Routing Back to C [Part 2]: A Linked List, One Node at a Time

Part 1 set the stage: stack, heap, pointers, and what it means to manage memory by hand. This post picks the smallest structure that actually needs the heap, and builds it from zero.

A linked list is first because every other pointer-based structure (hash tables, trees, graphs) is a twist on the same idea: nodes on the heap, linked by pointers.

By the end we’ll put it all together in a real list.c, compile it, and run it under valgrind to prove there are no leaks. The theory comes first; the practical build is at the end.

The Shape

A linked list is a chain of nodes. Each node holds a value and a pointer to the next node. The last node points at nothing.

head
 │
 ▼
┌───────┐     ┌───────┐     ┌───────┐
│  10   │     │  20   │     │  30   │
├───────┤     ├───────┤     ├───────┤
│ next ─┼──→  │ next ─┼──→  │ next ─┼──→ NULL
└───────┘     └───────┘     └───────┘

Three things to notice:

Each node is its own heap allocation. The boxes are not neighbours in memory. They could be 4 bytes apart or 4 megabytes apart. The next pointer is the only thing that knows.
There is no “list” object. The list is the head pointer plus whatever you reach by following next.
NULL is the stopper. No sentinel, no length field. You walk until you hit NULL.

The Node

typedef struct node {
    int value;
    struct node *next;
} node;

One 4-byte int and one 8-byte pointer. On 64-bit with the usual alignment, the compiler lays it out like this:

       0         4         8                  16
       ├─────────┼─────────┼──────────────────┤
bytes: │  value  │ padding │      next        │
       │  (4 B)  │  (4 B)  │      (8 B)       │
       └─────────┴─────────┴──────────────────┘
       ▲
   start of node

   total: 16 bytes

The 4 bytes of padding are there so next lands on an 8-byte boundary. The value is 4 useful bytes out of 16, the rest is pointer plus padding.

Why the Padding Here?

Different padding from the malloc(1) kind in Part 1. This one is set by the compiler, not the allocator. The rule in one line:

A type of size N wants to sit at an address divisible by N.

So:

int = 4 B → address divisible by 4
pointer on 64-bit = 8 B → address divisible by 8
pointer on 32-bit = 4 B → address divisible by 4

“Pointer on 64-bit wants 8-byte alignment” means the address where the pointer itself lives must be a multiple of 8, so the CPU can load it in one memory transaction. The address the pointer points to has nothing to do with it.

aligned load (1 transaction):        unaligned load (2 transactions):

│────────│────────│────────│         │────────│────────│────────│
0        8        16       24        0        8        16       24
         ▲                               ▲
         address 8, divisible by 8       address 4, straddles words
         CPU grabs one word ✓            CPU reads two, stitches ✗ slow

Lay the fields out naively in declaration order:

offset:  0         4         ?
         ├─────────┼─────────┼
         │  value  │  next?  │   ← next at offset 4
         │  (4 B)  │  (8 B)  │      NOT a multiple of 8
         └─────────┴─────────┘
                    ✗ misaligned

Offset 4 isn’t divisible by 8, so next can’t live there. Compiler slides it to offset 8 and fills the gap with dead bytes, which is the layout you saw above.

Why alignment matters in the first place:

CPU speed. Aligned 8-byte loads are one memory transaction. Unaligned loads can span two cache lines and double the traffic. x86 tolerates it and pays the tax; older ARM would just trap.
Atomics. An atomic load or store on a pointer needs alignment. Unaligned means either illegal or silently split into two operations, which breaks atomicity.
Arrays still work. Struct size rounds up to its largest alignment (here, 8). So node arr[10] places element 1 at offset 16, element 2 at offset 32, each with next on an 8-byte boundary. Without that, every other element would be misaligned.

Flipping the field order doesn’t save bytes:

struct node {
    struct node *next;  // offset 0, 8 B
    int value;          // offset 8, 4 B
};                      // size still 16 B (4 B tail padding)

The padding moves to the tail, but the struct is still 16 B, because the next element in an array still has to start on an 8-byte boundary.

Rule of thumb: sort fields largest-first to push padding to the tail. For mixed sizes, reorder until the total shrinks. Some projects use #pragma pack to disable padding entirely, trading speed for size (network protocols, on-disk layouts).

First cost surfaces already: storing a 4-byte number in a linked list takes 16 bytes of node, plus whatever the allocator adds on top.

An Empty List

An empty list is a null head pointer. No heap, no nodes, nothing to free.

node *head = NULL;

head
 │
 ▼
NULL

head is a stack variable, 8 bytes in the caller’s frame, holding the address 0. NULL is how C says “this pointer points at nothing”. Dereference a NULL pointer and the OS traps you with a segfault.

Creating One Node

One node is one malloc. Ask the allocator for sizeof(node) bytes, set the fields, return the pointer.

node *new_node(int value) {
    node *n = malloc(sizeof(node));
    if (n == NULL) return NULL;   // out of memory
    n->value = value;
    n->next  = NULL;
    return n;
}

Step by step when you call new_node(10):

1. malloc carves a chunk off the heap.

STACK                       HEAP
┌──────────┐               ┌──────┬──────────────────┐
│    n     │               │header│  16 B for node   │
│ (8 B ptr)│─────────────→ │      │ ?? | ?? | ?? ... │
└──────────┘               └──────┴──────────────────┘

The 16 bytes are uninitialised. They hold whatever garbage was last there.

2. Assign fields.

STACK                       HEAP
┌──────────┐               ┌──────┬──────┬──────┬──────┐
│    n     │               │header│  10  │ pad  │ NULL │
│          │─────────────→ │      │ (4 B)│ (4 B)│ (8 B)│
└──────────┘               └──────┴──────┴──────┴──────┘

3. Return the pointer. The stack slot n goes away when new_node returns, but the heap chunk stays. The caller now holds the only pointer to it.

-> is shorthand: n->value is the same as (*n).value. “Follow the pointer, then read the field.” C programmers use -> because writing (*n).value everywhere is ugly.

What `sizeof` Knows, What `malloc` Adds

Two layers of padding stack up around every node, and it helps to keep them separate:

Inside sizeof(node) = 16 B: no new padding from malloc. The 4 B gap between value and next is already baked into sizeof. The allocator doesn’t know fields exist, it just sees a flat “give me 16 bytes” request.

Around your 16 B: yes, malloc adds its own padding. Two kinds:

Header before the chunk (8 B on 64-bit glibc) for size bookkeeping.
Trailing slack to hit the minimum chunk size (32 B on 64-bit glibc) and the 16-B alignment multiple.

 ┌──────────┬──────────────────────────────┬──────────┐
 │ header   │  value | pad | next          │  slack   │
 │  (8 B)   │  4 B   | 4 B | 8 B           │  (8 B)   │
 └──────────┴──────────────────────────────┴──────────┘
 ▲           ▲                               ▲
 chunk       n (what malloc returns)         padding to
 start                                       reach 32 B min

8 B header + 16 B user + 8 B trailing slack = 32 B chunk for a 16 B struct.
A 24 B struct would land exactly on 32 B with no trailing slack.
A 40 B struct would push the chunk to 48 B (next multiple of 16 after header + user).

Values: malloc never touches them. The bytes it returns hold whatever was there from the last chunk in that slot. Reading a field before you set it is undefined behaviour. Use calloc(1, sizeof(node)) if you want zeros:

malloc(sizeof(node))                calloc(1, sizeof(node))
   ┌─────┬─────┬─────┐                 ┌─────┬─────┬─────┐
   │ ??? │ ??? │ ??? │                 │  0  │  0  │ NULL│
   └─────┴─────┴─────┘                 └─────┴─────┴─────┘
   read before set → UB                safe to read

For a node, you’re assigning every field before reading anyway, so malloc is fine. calloc earns its keep when you allocate structs with many fields you don’t always touch, or large buffers you’d otherwise have to memset yourself.

Short version of the whole padding story:

sizeof = compiler padding baked in.
malloc adds header + trailing slack outside your sizeof.
Neither initialises your bytes. Set every field, or use calloc.

Insert at the Head

The fastest insert: make a new node, point it at the old head, make it the new head. Constant time, no walking the list.

void push(node **head, int value) {
    node *n = new_node(value);
    n->next = *head;
    *head   = n;
}

Three things happen. Watch each one on [10 → 20] with push(&head, 5).

State before:

head ─→ ┌────┐     ┌────┐
        │ 10 │──→  │ 20 │──→ NULL
        └────┘     └────┘

Step 1. new_node(5) allocates a fresh node with next = NULL:

head ─→ ┌────┐     ┌────┐
        │ 10 │──→  │ 20 │──→ NULL
        └────┘     └────┘

n   ─→  ┌────┐
        │  5 │──→ NULL
        └────┘

Step 2. n->next = *head;, point the new node at the current head:

head ─→ ┌────┐     ┌────┐
        │ 10 │──→  │ 20 │──→ NULL
        └────┘     └────┘
          ▲
          │
        ┌─┴──┐
n   ─→  │  5 │
        └────┘

5 → 10 → 20 → NULL is now a valid chain, but head still points at 10.

Step 3. *head = n;, move the head pointer to the new node:

head ─→ ┌────┐     ┌────┐     ┌────┐
        │  5 │──→  │ 10 │──→  │ 20 │──→ NULL
        └────┘     └────┘     └────┘

Done. Order matters: if you wrote Step 3 before Step 2, *head = n; n->next = *head; would set n->next to n itself, an infinite loop waiting to happen.

Why `node **`?

push takes a node **head, not a node *head. A pointer to a pointer. To see why, first get the three levels straight.

The mental model: each * walks one step deeper into memory.

& = “give me the address of” (adds a level of indirection)
* = “follow the address, give me what’s there” (removes a level)

So inside push:

head          a pointer to a pointer
  │
  │  *  (follow once)
  ▼
*head         a pointer to a node
  │
  │  *  (follow again)
  ▼
**head        the node itself — value and next fields

Concrete with made-up addresses:

Expression	Type	Value	In words
`head`	`node **`	`0x7FFF0008`	address of the caller’s head variable
`*head`	`node *`	`0x55001000`	address of the first node
`**head`	`node`	`{10, 0x5500...}`	the first node’s bytes (not an address)

* and & are opposites: *(&x) == x, &(*p) == p.

Now the “why **” part.

If you passed node *head, the function would get a copy of the pointer. Changes to the copy don’t touch the caller’s variable.

caller                       push  (takes node *)
┌──────┐                     ┌──────┐
│ head │─→ (old head)        │ head │─→ (old head)   ← same target
└──────┘                     └──────┘

push reassigns its local copy:
┌──────┐                     ┌──────┐
│ head │─→ (old head)        │ head │─→ (new node)   ← only the copy moves
└──────┘                     └──────┘

The caller’s head is unchanged. To change it, push needs its address, not its value. That’s what ** gives you.

caller                       push  (takes node **)
┌──────┐                     ┌───────┐
│ head │  ←──────────────────┤  head │     param holds caller's address
└──┬───┘                     └───┬───┘
   │                             │  *
   ▼                             ▼
 (node)                       *head = n;   ← writes to caller's slot

Inside push, *head = n means: follow the parameter one step to reach the caller’s head variable, and overwrite it with n. The caller’s pointer now points at the new node.

Quick reference for working with node **head:

Change which node head points to → *head = new_node;
Read or change the first node’s fields → (*head)->value, (*head)->next
-> auto-dereferences once, so (*head)->value is the same as (**head).value, just cleaner.
You rarely write **head directly. -> covers the common case.

This pattern shows up every time a C function needs to mutate a caller’s pointer.

Traverse

Walking the list is a for loop with a cursor.

void print_list(node *head) {
    for (node *cur = head; cur != NULL; cur = cur->next) {
        printf("%d ", cur->value);
    }
    printf("\n");
}

cur starts at head and hops along next until it lands on NULL.

iter 1:  cur ─→ [ 5]──→[10]──→[20]──→ NULL    print 5
iter 2:         [ 5]──→[10]──→[20]──→ NULL    print 10
                 cur ───┘
iter 3:         [ 5]──→[10]──→[20]──→ NULL    print 20
                         cur ───┘
iter 4:  cur == NULL → loop exits

No indexing, no length. You only know the list is done when cur hits NULL. This is also why print_list is O(n) while arr[5] is O(1), the array knows where element 5 lives without walking.

Delete a Node by Value

Remove the first node whose value matches. Two pointers: cur for the node under inspection, prev for the one before it. You need prev to re-link around the node you’re about to free.

void delete_value(node **head, int value) {
    node *cur  = *head;
    node *prev = NULL;

    while (cur != NULL && cur->value != value) {
        prev = cur;
        cur  = cur->next;
    }

    if (cur == NULL) return;        // not found

    if (prev == NULL) {
        *head = cur->next;          // removing the head
    } else {
        prev->next = cur->next;     // skip cur
    }

    free(cur);
}

Walk delete_value(&head, 20) on [5 → 10 → 20 → 30].

Start. prev = NULL, cur = head.

prev = NULL
cur ─→ ┌────┐     ┌────┐     ┌────┐     ┌────┐
       │  5 │──→  │ 10 │──→  │ 20 │──→  │ 30 │──→ NULL
       └────┘     └────┘     └────┘     └────┘

Advance until cur->value == 20:

              prev         cur
               │            │
               ▼            ▼
       ┌────┐     ┌────┐     ┌────┐     ┌────┐
       │  5 │──→  │ 10 │──→  │ 20 │──→  │ 30 │──→ NULL
       └────┘     └────┘     └────┘     └────┘

Re-link. prev->next = cur->next;, point 10 directly at 30, skipping 20:

              prev         cur
               │            │
               ▼            ▼
       ┌────┐     ┌────┐     ┌────┐     ┌────┐
       │  5 │──→  │ 10 │──┐  │ 20 │──→  │ 30 │──→ NULL
       └────┘     └────┘  │  └────┘     └────┘
                          │              ▲
                          └──────────────┘

Free. 20 is now unreachable from head. Release its chunk back to the allocator.

       ┌────┐     ┌────┐              ┌────┐
       │  5 │──→  │ 10 │───────────→  │ 30 │──→ NULL
       └────┘     └────┘              └────┘

Two edge cases the code handles:

Value not found. Loop exits with cur == NULL. Return early, no free.
Removing the head. prev stays NULL. Write *head = cur->next to move the head forward before freeing cur.

Order inside the function matters: always read cur->next before free(cur). Once the chunk is freed, any read through cur is undefined.

Freeing the Whole List

Every node was one malloc. Every node needs one free. The tricky part: you can’t free a node and then read its next.

void free_list(node *head) {
    node *cur = head;
    while (cur != NULL) {
        node *next = cur->next;   // read BEFORE free
        free(cur);
        cur = next;
    }
}

Two pointers: cur is the node to free right now, next is the one to jump to. The order of the three lines inside the loop is load-bearing.

Trace on [5 → 10 → 20]:

iter 1:  cur=[ 5]   next=[10]    free( 5)   cur ← [10]
iter 2:  cur=[10]   next=[20]    free(10)   cur ← [20]
iter 3:  cur=[20]   next=NULL    free(20)   cur ← NULL
iter 4:  cur == NULL → exit

What if you swap the order and write free(cur); cur = cur->next;?

free(cur)                      cur->next  (dangling read)
   │                                │
   ▼                                ▼
┌───────┐                       ┌─────────┐
│ freed │  ← chunk released     │ ??????? │  ← reading a freed
└───────┘                       └─────────┘    chunk. undefined.

The chunk has been handed back to the allocator. Reading cur->next now pulls whatever bytes happen to live there, allocator bookkeeping, a zero, another object’s data. In a release build it might even appear to work for a while, then crash somewhere unrelated days later. This is the class of bug that defines C careers.

Setup

If you don’t already have a C toolchain and valgrind, install them from the official sources for your OS:

Linux (gcc): gcc.gnu.org/install or your distro’s package manager.
macOS (clang): Xcode Command Line Tools.
Windows (MSYS2 / MinGW-w64): msys2.org.
Windows (WSL): Install WSL, then follow the Linux instructions inside it.
valgrind: valgrind.org/downloads (Linux and WSL only; on macOS use leaks or AddressSanitizer via -fsanitize=address).

Files You’ll Write

One source file, no headers. Create an empty directory and drop a single list.c inside it:

linked-list/
└── list.c

Everything in this post, the node struct, new_node, push, print_list, delete_value, free_list, and main, goes into that one file, in that order. At the top of list.c, add the two standard-library includes the code uses:

#include <stdio.h>     // printf
#include <stdlib.h>    // malloc, free

// struct + functions + main go here, in the order they appear in the post

Then open a shell inside the linked-list/ folder and compile. On Windows with MSYS2 (recommended in the setup list above), open a regular Command Prompt or PowerShell, cd into your folder, and launch MSYS2 right there:

> cd D:\path\to\linked-list
> C:\msys64\msys2_shell.cmd -ucrt64 -here

The -ucrt64 picks the UCRT64 environment (where gcc lives after you installed the toolchain), and -here tells MSYS2 to open the new shell in the current directory instead of your home folder. Adjust C:\msys64 if you installed MSYS2 somewhere else.

On macOS or Linux, right-click the folder in Finder/Files and pick “Open in Terminal”, or cd into it from any terminal. Once you’re in the right directory (ls should show list.c), compile and run:

$ cc -Wall -Wextra -o list list.c
$ ./list

No build system, no Makefile. One file in, one binary out. If you want to split into list.h and list.c later, that’s a natural exercise once the whole thing runs.

Putting It Together

int main(void) {
    node *head = NULL;

    push(&head, 30);
    push(&head, 20);
    push(&head, 10);

    print_list(head);          // 10 20 30

    delete_value(&head, 20);
    print_list(head);          // 10 30

    free_list(head);
    return 0;
}

Build and run:

$ cc -Wall -Wextra -o list list.c
$ ./list
10 20 30
10 30

Then run it under valgrind:

$ valgrind --leak-check=full ./list
...
All heap blocks were freed -- no leaks are possible
ERROR SUMMARY: 0 errors from 0 contexts

If valgrind reports “definitely lost: N bytes”, you missed a free. If it reports “invalid read”, you used a pointer after freeing it. There is no hiding.

What It Cost

Every operation visible, nothing hidden:

Operation	What you wrote	What a runtime would do
Create	`malloc(sizeof(node))`	hidden in `new` or literal
Insert	re-point two pointers	same, but not in your face
Delete	re-link, then `free`	mark unreachable, GC later
Traverse	walk `next` until NULL	same
Free all	walk and `free` each node	sweep phase of GC

Python’s list.append calls a C malloc underneath. Java’s LinkedList.add allocates a node on the JVM heap. Go’s slice growth reallocates into a fresh array. The heap call is always there. C just makes you write it by name.

Reminder from Part 1: on 64-bit glibc, each malloc(16) reserves at least 32 bytes. A list of a million ints inside nodes costs ~32 MB of heap for ~4 MB of actual numbers. Eight times the data. Every linked-list-heavy hot loop has this bill attached.

What’s Next

Part 3: a hash table. Same malloc/free, same node discipline, but now with an array of buckets, a hash function, and a plan for when two keys land in the same slot. That’s where collision resolution, load factor, and the word “amortized” enter the picture.

Glossary

Node: one element of a linked list. Holds a value and a pointer to the next node.
Head: the first node in a list. Losing the head pointer leaks the whole list.
NULL: the address 0. C uses it to mean “this pointer points at nothing”.
->: follow a pointer and read a field. p->x is (*p).x.
Pointer to a pointer: the address of a pointer variable. Lets a function mutate the caller’s pointer.
Use-after-free: reading or writing a pointer already passed to free. Undefined behaviour.
valgrind: a tool that runs your program and reports memory errors and leaks.