Zero Pointer Dereference, Huh?

Consider the following code:

#include <stddef.h>  /* for NULL */

int main(void)
{
    struct s {
        int m0;
        int m1;
        int m2;
    };

    struct s *p = NULL;
    int a = (int) &((struct s *)p)->m2;  // crash ??
    int b = (int) &((struct s *)0)->m2;  // crash ??
    int c = (int) p->m2;  // definitely dead

    return 0;
}

So, do you think the two lines marked “crash ??” will crash the program? At first glance I thought they would, since they appear to dereference a null pointer. Take int b = (int) &((struct s *)0)->m2; for example: read literally, we first dereference the zero pointer to get the member m2, and then take its address. Right? That is how &p_struct->member reads on the page.

However, this is not how the compiler interprets it. The compiler is a very cool and knowledgeable guy who is assumed to know everything. He knows the offset of each member in a structure, and he says, “Well, why should I even care about the values of those members (dereferencing)? To obtain the address of a member in a structure, I just need to add the member’s byte offset to the address of the structure: &p_struct->member := (char *)p_struct + offset(member).”

With that ability, the compiler can generate the following assembly (x86-64, shown interleaved with the source lines):

6       {
   0x0000000000001129 <+0>:     endbr64
   0x000000000000112d <+4>:     push   %rbp
   0x000000000000112e <+5>:     mov    %rsp,%rbp

7           struct s {
8               int m0;
9               int m1;
10              int m2;
11          };
12
13          struct s *p = NULL;
   0x0000000000001131 <+8>:     movq   $0x0,-0x8(%rbp)

14          int a = (int) &((struct s *)p)->m2;  // okay
   0x0000000000001139 <+16>:    mov    -0x8(%rbp),%rax
   0x000000000000113d <+20>:    add    $0x8,%rax
   0x0000000000001141 <+24>:    mov    %eax,-0x14(%rbp)

15          int b = (int) &((struct s *)0)->m2;  // okay
   0x0000000000001144 <+27>:    movl   $0x8,-0x10(%rbp)

16          int c = (int) p->m2;  // dead
   0x000000000000114b <+34>:    mov    -0x8(%rbp),%rax
   0x000000000000114f <+38>:    mov    0x8(%rax),%eax # segfault, since it tries to access invalid memory address 0x8 (page 0)
   0x0000000000001152 <+41>:    mov    %eax,-0xc(%rbp)

17
18          return 0;
   0x0000000000001155 <+44>:    mov    $0x0,%eax

19      }
   0x000000000000115a <+49>:    pop    %rbp
   0x000000000000115b <+50>:    ret

We can see that there are no dereferences whatsoever when we try to get the address of a member in a structure, just the addition of the member offset to the address of the structure.

Therefore, in the special case where the structure address is zero, &((TYPE *)0)->MEMBER yields exactly the offset of the member within the structure, which is why the macro

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

works in the Linux kernel.
