Consider the following code:
#include <stddef.h> /* NULL */

int main()
{
    struct s {
        int m0;
        int m1;
        int m2;
    };
    struct s *p = NULL;
    int a = (int) &((struct s *)p)->m2; // crash ??
    int b = (int) &((struct s *)0)->m2; // crash ??
    int c = (int) p->m2; // definitely dead
    return 0;
}
So, do you think the two lines marked "crash ??" above will crash the program? At first I thought they would, since they appear to dereference a null pointer. Take int b = (int) &((struct s *)0)->m2; for example: read literally, we first dereference the zero pointer to reach the member m2, and then take its address. Right? That is how &p_struct->member reads on the page.
However, this is not how the compiler interprets it. The compiler knows the offset of every member in a structure, so it says, “Why should I care about the values of those members (that is, why dereference at all)? To obtain the address of a member, I just need to add the member's byte offset to the address of the structure: &p_struct->member := (char *)p_struct + offset(member).”
With this knowledge, the compiler can generate the following x86-64 assembly, shown here interleaved with the source lines:
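The address identity can be checked directly in C. The following is a minimal sketch, not part of the original program: the function names are made up for illustration, and it deliberately uses a real object instead of a null pointer so that the check stays within what the C standard defines.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

struct s {
    int m0;
    int m1;
    int m2;
};

/* Hypothetical helper: returns 1 if &p->m2 equals the structure's
 * address plus the byte offset of m2. Only addresses are compared;
 * the value of p->m2 is never read. */
int member_address_is_base_plus_offset(const struct s *p)
{
    const char *base   = (const char *)p;
    const char *member = (const char *)&p->m2;
    return member == base + offsetof(struct s, m2);
}

/* Hypothetical helper: runs the check on a local object. */
int check_identity(void)
{
    struct s obj = {0, 0, 0};
    assert(member_address_is_base_plus_offset(&obj));
    return 1;
}
```

Calling check_identity() from main should pass silently, since taking &p->m2 is pure pointer arithmetic on the structure's address.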
6 {
0x0000000000001129 <+0>: endbr64
0x000000000000112d <+4>: push %rbp
0x000000000000112e <+5>: mov %rsp,%rbp
7 struct s {
8 int m0;
9 int m1;
10 int m2;
11 };
12
13 struct s *p = NULL;
0x0000000000001131 <+8>: movq $0x0,-0x8(%rbp)
14 int a = (int) &((struct s *)p)->m2; // okay
0x0000000000001139 <+16>: mov -0x8(%rbp),%rax
0x000000000000113d <+20>: add $0x8,%rax
0x0000000000001141 <+24>: mov %eax,-0x14(%rbp)
15 int b = (int) &((struct s *)0)->m2; // okay
0x0000000000001144 <+27>: movl $0x8,-0x10(%rbp)
16 int c = (int) p->m2; // dead
0x000000000000114b <+34>: mov -0x8(%rbp),%rax
0x000000000000114f <+38>: mov 0x8(%rax),%eax # segfault, since it tries to access invalid memory address 0x8 (page 0)
0x0000000000001152 <+41>: mov %eax,-0xc(%rbp)
17
18 return 0;
0x0000000000001155 <+44>: mov $0x0,%eax
19 }
0x000000000000115a <+49>: pop %rbp
0x000000000000115b <+50>: ret
We can see that there are no dereferences whatsoever when we try to get the address of a member in a structure, just the addition of the member offset to the address of the structure.
Therefore, in the special case where the structure address is zero, &((TYPE *)0)->MEMBER yields exactly the offset of the member within the structure, which is why the macro
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
works in the Linux kernel.
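To see what such an offset is good for, here is a small sketch that reads a member through its offset instead of by name. It uses the standard offsetof from <stddef.h> rather than the hand-written macro, since that is the portable way to get the same number; the helper names read_int_at and read_m2_via_offset are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>   /* standard offsetof, size_t */

struct s {
    int m0;
    int m1;
    int m2;
};

/* Hypothetical helper: reads the int stored `offset` bytes from the
 * start of *base. The assert guards against offsets that would run
 * past the end of the structure. */
int read_int_at(const struct s *base, size_t offset)
{
    assert(offset + sizeof(int) <= sizeof(struct s));
    return *(const int *)((const char *)base + offset);
}

/* Hypothetical helper: fetches m2 via its offset rather than by name. */
int read_m2_via_offset(const struct s *p)
{
    return read_int_at(p, offsetof(struct s, m2));
}
```

For an object initialized as {10, 20, 30}, read_m2_via_offset returns 30, the same value as reading obj.m2 directly. The offset passed in must be that of an int member for the read to be meaningful.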
