Consider the following code:
```c
#include <stddef.h> /* for NULL */

int main()
{
    struct s {
        int m0;
        int m1;
        int m2;
    };

    struct s *p = NULL;
    int a = (int) &((struct s *)p)->m2; // crash ??
    int b = (int) &((struct s *)0)->m2; // crash ??
    int c = (int) p->m2;                // definitely dead

    return 0;
}
```
So, do you think the two lines marked `// crash ??` will crash the program? Well, at least to me, they will, since they appear to dereference null pointers. Take `int b = (int) &((struct s *)0)->m2;` for example: reading it literally, we first dereference the null pointer to get the member `m2`, and then take that member's address. Right? This is how we literally read `&p_struct->member`.
However, this is not how the compiler interprets it. The compiler is a very cool and knowledgeable guy who is assumed to know everything, including the offset of each member in a structure. So he says, “Well, why do I even care about the values of those members (dereferencing)? To obtain the address of a member in a structure, I just need to add the member's offset to the address of the structure: `&p_struct->member := (char *)p_struct + offset(member)`.” The offset is counted in bytes, which is why the structure pointer is treated as a `char *` in this formula.
With this ability, the compiler can generate the following assembly code (x86-64, unoptimized, shown interleaved with the source lines):

```
6   {
   0x0000000000001129 <+0>:   endbr64
   0x000000000000112d <+4>:   push   %rbp
   0x000000000000112e <+5>:   mov    %rsp,%rbp

7       struct s {
8           int m0;
9           int m1;
10          int m2;
11      };
12
13      struct s *p = NULL;
   0x0000000000001131 <+8>:   movq   $0x0,-0x8(%rbp)

14      int a = (int) &((struct s *)p)->m2; // okay
   0x0000000000001139 <+16>:  mov    -0x8(%rbp),%rax
   0x000000000000113d <+20>:  add    $0x8,%rax
   0x0000000000001141 <+24>:  mov    %eax,-0x14(%rbp)

15      int b = (int) &((struct s *)0)->m2; // okay
   0x0000000000001144 <+27>:  movl   $0x8,-0x10(%rbp)

16      int c = (int) p->m2; // dead
   0x000000000000114b <+34>:  mov    -0x8(%rbp),%rax
   0x000000000000114f <+38>:  mov    0x8(%rax),%eax   # segfault, since it tries to access invalid memory address 0x8 (page 0)
   0x0000000000001152 <+41>:  mov    %eax,-0xc(%rbp)

17
18      return 0;
   0x0000000000001155 <+44>:  mov    $0x0,%eax

19  }
   0x000000000000115a <+49>:  pop    %rbp
   0x000000000000115b <+50>:  ret
```
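We can see that there are no dereferences whatsoever when we take the address of a member in a structure, just the addition of the member's offset to the address of the structure. For `b`, whose base address is the constant 0, the compiler even folds the whole expression into the constant 8 (`movl $0x8`). Only `c`, which actually reads `p->m2`, loads from memory, and that load is what faults.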
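Therefore, in the special case where the structure address is zero, `&((TYPE *)0)->MEMBER` yields exactly the offset of the member within the structure, which explains why the macro

```c
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
```

works in the Linux kernel. Strictly speaking, ISO C does not guarantee this (applying `->` to a null pointer is undefined behavior); it works because compilers such as GCC and Clang simply emit the address arithmetic shown above. That is also why C provides a standard `offsetof` in `<stddef.h>`, and why GCC and Clang offer `__builtin_offsetof`, which the kernel's own `offsetof` relies on where available.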