eBPF Verifier

eBPF Verifier checks two things:

- A DAG check that disallows loops and performs other CFG validation, such as detecting unreachable instructions.
- A simulation of every possible path. Starting with the first instruction, the verifier descends all possible paths, simulates the execution of every instruction, and observes the state changes of registers and stack:
    - At program start, R1 contains a pointer to the program's context (for example, struct __sk_buff for socket filters), and the verifier marks it with type PTR_TO_CTX.
    - After a function call, R1-R5 are reset to unreadable.
    - R0 holds the function's return value, and the verifier records its return type.
    - R6-R9 are callee-saved, so their contents survive calls.
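The first check can be sketched as a depth-first search over the control-flow graph. This is a minimal Python sketch, not the kernel's implementation: the (kind, offset) instruction tuples are a hypothetical encoding, and jump targets follow the BPF "goto pc+off" convention (target = pc + 1 + off). A back-edge to an instruction still on the DFS stack means a loop; any instruction never visited is unreachable. It follows the classic rule described above (newer kernels accept some bounded loops).

```python
from typing import List, Tuple

# Hypothetical encoding: (kind, off), kind is "op", "ja", "jcond" or "exit".
Insn = Tuple[str, int]


def successors(prog: List[Insn], pc: int) -> List[int]:
    kind, off = prog[pc]
    if kind == "exit":
        return []
    if kind == "ja":                      # unconditional goto pc+off
        return [pc + 1 + off]
    if kind == "jcond":                   # conditional: fall through or jump
        return [pc + 1, pc + 1 + off]
    return [pc + 1]                       # ordinary instruction


def check_cfg(prog: List[Insn]) -> str:
    """Return "ok" if the CFG is a DAG with every insn reachable, else an error."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on DFS stack, finished
    color = [WHITE] * len(prog)

    def dfs(pc: int) -> str:
        color[pc] = GREY
        for nxt in successors(prog, pc):
            if not 0 <= nxt < len(prog):
                return f"jump out of range from insn {pc}"
            if color[nxt] == GREY:
                return f"back-edge from insn {pc} to insn {nxt}"
            if color[nxt] == WHITE:
                err = dfs(nxt)
                if err:
                    return err
        color[pc] = BLACK
        return ""

    err = dfs(0)
    if err:
        return err
    for pc, c in enumerate(color):
        if c == WHITE:
            return f"unreachable insn {pc}"
    return "ok"
```

For example, check_cfg([("op", 0), ("ja", -2)]) reports a back-edge from insn 1 to insn 0, because the jump targets 1 + 1 - 2 = 0.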

Consider the following examples:

The code snippet below would be rejected by the verifier because R2 was never written to, so R2 is unreadable at the start of the program:

bpf_mov R0 = R2
bpf_exit

The code below would be accepted. After the call to foo (bpf_call foo), R1-R5 are cleared, but R6-R9 are callee-saved, so R6 retains its value, and R0 holds the return value. If R6 were replaced with R1, however, the code would be rejected: R1 is cleared by the call to foo, so bpf_mov R0 = R1 would read an unreadable register.

bpf_mov R6 = 1
bpf_call foo
bpf_mov R0 = R6
bpf_exit
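Both examples can be replayed with a toy register-state simulator. This Python sketch works under simplifying assumptions: the instruction tuples are a hypothetical encoding, only NOT_INIT, SCALAR_VALUE and a few pointer states are modeled, every call is assumed to return a scalar, and the error strings are loosely modeled on the verifier's "!read_ok" message rather than copied from a specific kernel.

```python
from typing import Dict, List, Tuple

NOT_INIT, SCALAR = "NOT_INIT", "SCALAR_VALUE"


def initial_state() -> Dict[str, str]:
    regs = {f"R{i}": NOT_INIT for i in range(11)}
    regs["R1"] = "PTR_TO_CTX"     # context pointer at program start
    regs["R10"] = "PTR_TO_STACK"  # read-only frame pointer
    return regs


def simulate(prog: List[Tuple]) -> str:
    regs = initial_state()
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op == "mov_imm":                    # ("mov_imm", dst, imm)
            regs[insn[1]] = SCALAR
        elif op == "mov_reg":                  # ("mov_reg", dst, src)
            src = insn[2]
            if regs[src] == NOT_INIT:
                return f"insn {pc}: {src} !read_ok"
            regs[insn[1]] = regs[src]
        elif op == "call":                     # ("call", name)
            for i in range(1, 6):              # R1-R5 are clobbered; R6-R9 untouched
                regs[f"R{i}"] = NOT_INIT
            regs["R0"] = SCALAR                # assumed scalar return value
        elif op == "exit":                     # ("exit",)
            if regs["R0"] == NOT_INIT:
                return f"insn {pc}: R0 !read_ok"
            return "accepted"
    return "fell off the end of the program"
```

Running the first example, simulate([("mov_reg", "R0", "R2"), ("exit",)]), reports that R2 is not readable at insn 0, while the R6 version of the second example is accepted.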

Load/Store Verification

For loads through a context pointer, the verifier calls the program type's is_valid_access() callback (part of struct bpf_verifier_ops), which checks that the load targets a memory location that can be accessed for reading.

bpf_ld R0 = *(u32 *)(R6 + 8)

If R6 is PTR_TO_CTX, the load is valid only if the is_valid_access() callback confirms that 4 bytes at offset 8 of the context can be read. If R6 is PTR_TO_STACK, the load is valid only if the access is aligned and within stack bounds.

The stack bounds are [-MAX_BPF_STACK, 0), where MAX_BPF_STACK is 512 bytes.
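The two load rules can be sketched as a single dispatch on the pointer type. In this Python sketch the context layout (HYPOTHETICAL_CTX_FIELDS) is invented for illustration, is_valid_ctx_access() is only a stand-in for a real is_valid_access() callback, off is assumed to be the total offset from the frame pointer for stack accesses, and the extra rule that stack slots must have been written before they are read is not modeled.

```python
MAX_BPF_STACK = 512  # bytes

# Hypothetical context layout for illustration: readable field offset -> size.
HYPOTHETICAL_CTX_FIELDS = {0: 4, 4: 4, 8: 4, 76: 4, 80: 4}


def is_valid_ctx_access(off: int, size: int) -> bool:
    """Stand-in for a program type's is_valid_access() callback."""
    return HYPOTHETICAL_CTX_FIELDS.get(off) == size


def stack_access_ok(off: int, size: int) -> bool:
    """An access of `size` bytes at R10 + off must be aligned and inside [-512, 0)."""
    if size not in (1, 2, 4, 8):
        return False
    if off % size != 0:            # alignment (Python % is non-negative here)
        return False
    return off >= -MAX_BPF_STACK and off + size <= 0


def check_load(reg_type: str, off: int, size: int) -> bool:
    if reg_type == "PTR_TO_CTX":
        return is_valid_ctx_access(off, size)
    if reg_type == "PTR_TO_STACK":
        return stack_access_ok(off, size)
    return False                   # other pointer types are not modeled here
```

With this layout, the example load *(u32 *)(R6 + 8) passes when R6 is PTR_TO_CTX, while an 8-byte read at the same offset is rejected.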

Function calls

Allowed function calls are customized with bpf_verifier_ops->get_func_proto(). The verifier checks that the registers passed to a call match the function's argument constraints.

- After the call, register R0 is set to the return type of the function.

Function calls are the main mechanism for making eBPF extensible.

Each program type supplies its own get_func_proto() callback, so socket filters may allow programs to call one particular set of functions, while tracing filters may permit calls to an entirely different set.

The verifier guarantees that functions are called with the proper arguments by validating the argument constraints.
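The per-program-type lookup and the argument check can be sketched together. In this Python sketch the helper names (sock_helper, trace_helper), the program-type keys and the PROTOS table are hypothetical; ARG_PTR_TO_CTX, ARG_ANYTHING and RET_INTEGER are borrowed from the kernel's argument and return type names, but their semantics are simplified here.

```python
from typing import Dict, Optional, Tuple

Proto = Tuple[Tuple[str, ...], str]   # (argument types, return type)

# Hypothetical helper tables: each program type exposes a different set.
PROTOS: Dict[str, Dict[str, Proto]] = {
    "socket_filter": {"sock_helper": (("ARG_PTR_TO_CTX", "ARG_ANYTHING"), "RET_INTEGER")},
    "tracing": {"trace_helper": (("ARG_ANYTHING",), "RET_INTEGER")},
}


def get_func_proto(prog_type: str, name: str) -> Optional[Proto]:
    """Toy analogue of bpf_verifier_ops->get_func_proto()."""
    return PROTOS[prog_type].get(name)


def check_call(prog_type: str, name: str, regs: Dict[str, str]) -> str:
    proto = get_func_proto(prog_type, name)
    if proto is None:
        return f"unknown func {name}"
    arg_types, _ret_type = proto
    for i, arg_type in enumerate(arg_types, start=1):
        reg = f"R{i}"                         # arguments are passed in R1-R5
        state = regs.get(reg, "NOT_INIT")
        if state == "NOT_INIT":
            return f"{reg} !read_ok"
        if arg_type == "ARG_PTR_TO_CTX" and state != "PTR_TO_CTX":
            return f"{reg} type={state} expected=ctx"
    for i in range(1, 6):                     # R1-R5 are clobbered by the call
        regs[f"R{i}"] = "NOT_INIT"
    regs["R0"] = "SCALAR_VALUE"               # RET_INTEGER -> scalar in R0
    return "ok"
```

The same helper name resolves for one program type and fails for another, which is the effect of each program type supplying its own get_func_proto().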

The eBPF verifier must track the range of possible values in each register and stack slot to determine whether an eBPF program is safe.

This is done with struct bpf_reg_state, defined in include/linux/bpf_verifier.h, which unifies the tracking of scalar and pointer values.

- A register's state is one of the following:
    - NOT_INIT
    - SCALAR_VALUE
    - A pointer type (e.g., PTR_TO_CTX)
        - see the kernel documentation for the full list
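The "unified" record can be pictured as one structure whose fields apply to scalars and pointers alike. This is a simplified Python mirror, not the kernel struct: the field names loosely follow struct bpf_reg_state (type, off, var_off, smin_value, smax_value, umin_value, umax_value, id, range), var_off is split into two ints, and only a handful of pointer types are listed.

```python
from dataclasses import dataclass
from enum import Enum

U64_MAX = (1 << 64) - 1
S64_MIN, S64_MAX = -(1 << 63), (1 << 63) - 1


class RegType(Enum):
    NOT_INIT = "NOT_INIT"
    SCALAR_VALUE = "SCALAR_VALUE"
    PTR_TO_CTX = "PTR_TO_CTX"
    PTR_TO_STACK = "PTR_TO_STACK"
    PTR_TO_PACKET = "PTR_TO_PACKET"
    PTR_TO_PACKET_END = "PTR_TO_PACKET_END"


@dataclass
class RegState:
    """One record for scalars and pointers alike; defaults mean 'nothing known'."""
    type: RegType = RegType.NOT_INIT
    off: int = 0                  # fixed offset (pointers)
    var_value: int = 0            # tnum value: bits known to be 1
    var_mask: int = U64_MAX       # tnum mask: bits whose value is unknown
    smin_value: int = S64_MIN
    smax_value: int = S64_MAX
    umin_value: int = 0
    umax_value: int = U64_MAX
    id: int = 0                   # shared by pointers with the same variable offset
    range: int = 0                # bytes known to be accessible (packet pointers)


def const_scalar(v: int) -> RegState:
    """A SCALAR_VALUE known exactly, for 0 <= v <= S64_MAX."""
    return RegState(type=RegType.SCALAR_VALUE, var_value=v, var_mask=0,
                    smin_value=v, smax_value=v, umin_value=v, umax_value=v)
```

An immediate such as 14 becomes const_scalar(14): every bound equals 14 and no bit is unknown.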

Register Value Tracking

A pointer can point to the base of an object (e.g., the base of PTR_TO_CTX) or to a fixed or variable offset from it. Fixed offsets are used when an exactly known value (e.g., an immediate operand) is added to a pointer. Variable offsets are used when the exact value is not known.

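Verifier's knowledge of unknown/variable offsets

For both pointers and scalars, the verifier tracks:

- min/max values, as both signed and unsigned
- knowledge of the values of individual bits, in the form of a "tnum": a u64 mask and a u64 value, where 1s in the mask mark bits whose value is unknown and 1s in the value mark bits known to be 1 (see docs)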
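A tnum's two words already imply unsigned bounds: setting every unknown bit to 0 gives the minimum and setting every unknown bit to 1 gives the maximum. This minimal Python sketch assumes the invariant value & mask == 0 and models only these derived quantities, not the kernel's tnum arithmetic.

```python
from dataclasses import dataclass

U64 = (1 << 64) - 1


@dataclass(frozen=True)
class Tnum:
    value: int  # 1 bits: known to be 1
    mask: int   # 1 bits: unknown

    @staticmethod
    def const(v: int) -> "Tnum":
        return Tnum(v & U64, 0)

    def umin(self) -> int:
        return self.value                # every unknown bit taken as 0

    def umax(self) -> int:
        return self.value | self.mask    # every unknown bit taken as 1

    def contains(self, x: int) -> bool:
        """x is a possible value iff it agrees with all known bits."""
        return (x & ~self.mask & U64) == self.value
```

A byte loaded from memory is Tnum(0, 0xff): anything from 0 to 255, with bits 8 and above known to be 0.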

PTR_TO_PACKETS

PTR_TO_PACKET, when using a variable offset, has an ‘ID’ which is the same for all pointers sharing that same variable offset.

This allows for packet range checking:

Consider an example where a variable offset is added to register A, which holds a PTR_TO_PACKET.

If A is then copied to B, and a constant 4 is added to A, both registers keep the same ID because they share the same variable offset. However, A now also has a fixed offset of 4.
If A is then bounds-checked and found to be less than PTR_TO_PACKET_END, register B is known to have a safe range of at least 4 bytes.
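The A/B scenario can be sketched as ID-based propagation of a safe range. In this Python sketch the PktPtr type and both functions are hypothetical simplifications: range is measured from the shared variable base, so a pointer with fixed offset off may safely access range - off bytes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PktPtr:
    id: int         # same id => derived from the same variable offset
    off: int        # fixed offset on top of the variable part
    range: int = 0  # bytes (from the variable base) known to be in bounds


def mark_safe_after_check(regs: Dict[str, PktPtr], checked: PktPtr) -> None:
    """Fall-through of `if checked > pkt_end goto ...`: every pointer sharing
    checked.id learns that checked.off bytes past the variable base are safe."""
    for reg in regs.values():
        if reg.id == checked.id:
            reg.range = max(reg.range, checked.off)


def safe_bytes(p: PktPtr) -> int:
    """Bytes that may be accessed starting at p itself."""
    return max(0, p.range - p.off)
```

Usage: copy A into B, add 4 to A, then call mark_safe_after_check(regs, regs["A"]). B now has 4 safe bytes, A has none, and a pointer with a different id is unaffected.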

Example:

1:  r4 = *(u32 *)(r1 +80)  /* load skb->data_end */
2:  r3 = *(u32 *)(r1 +76)  /* load skb->data */
3:  r5 = r3
4:  r5 += 14
5:  if r5 > r4 goto pc+16
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6:  r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */

Lines 1 and 2 are two 4-byte (u32) loads from the context that r1 points to. The offsets are byte offsets, not indices:
- r4 = *(u32 *)(r1 +80) reads the 4 bytes at offset 80 of the context, skb->data_end, into r4.
- r3 = *(u32 *)(r1 +76) reads the 4 bytes at offset 76 of the context, skb->data, into r3.
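The byte offsets 76 and 80 follow from the layout of struct __sk_buff in include/uapi/linux/bpf.h, whose leading fields are all __u32. This Python ctypes sketch reproduces only that leading prefix (written from the uapi header as I recall it, so check it against your kernel's copy) and lets ctypes compute the offsets.

```python
import ctypes


class SkBuffPrefix(ctypes.Structure):
    # Leading __u32 fields of struct __sk_buff, up to data_end.
    _fields_ = [
        ("len", ctypes.c_uint32),
        ("pkt_type", ctypes.c_uint32),
        ("mark", ctypes.c_uint32),
        ("queue_mapping", ctypes.c_uint32),
        ("protocol", ctypes.c_uint32),
        ("vlan_present", ctypes.c_uint32),
        ("vlan_tci", ctypes.c_uint32),
        ("vlan_proto", ctypes.c_uint32),
        ("priority", ctypes.c_uint32),
        ("ingress_ifindex", ctypes.c_uint32),
        ("ifindex", ctypes.c_uint32),
        ("tc_index", ctypes.c_uint32),
        ("cb", ctypes.c_uint32 * 5),
        ("hash", ctypes.c_uint32),
        ("tc_classid", ctypes.c_uint32),
        ("data", ctypes.c_uint32),
        ("data_end", ctypes.c_uint32),
    ]


print(SkBuffPrefix.data.offset, SkBuffPrefix.data_end.offset)
```

Seventeen 4-byte slots precede data_end (cb counts as five), so data sits at 19 * 4 = 76 and data_end at 80.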

Next, r3 is copied to r5 and 14 is added to r5, so r5 has a fixed offset of 14. The bounds check is r5 > r4. Recall that r4 is data_end: if r5 > r4, the packet starting at r3 does NOT have 14 directly accessible bytes, and the program jumps away. If r5 <= r4 (the fall-through path), then at least 14 bytes starting at data are directly accessible.

After the check, r3 has the accessible range [r3, r3 + 14), while r5 has the range [r5, r5 + 14 - 14), which is empty, because r5 already carries off=14 (a constant offset of 14 was added to it). This is why line 6 may read bytes 12 and 13 through r3.
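The range check on the load in line 6 can be sketched as a single comparison. This Python function is an illustrative simplification: reg_off is the pointer's fixed offset, reg_range is the r value from the verifier state line, and the access [reg_off + insn_off, reg_off + insn_off + size) must lie inside [0, range).

```python
def packet_access_ok(reg_off: int, reg_range: int, insn_off: int, size: int) -> bool:
    """True if the bytes touched by the access lie inside [0, reg_range)."""
    start = reg_off + insn_off
    return start >= 0 and start + size <= reg_range
```

With R3=pkt(off=0,r=14), the u16 load at r3 + 12 touches bytes 12 and 13 and passes; the same load at r3 + 13 would need byte 14 and fails. Through R5=pkt(off=14,r=14) no forward access passes.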

PTR_TO_PACKETS (Cont’d)

Looking at the code above, it is useful to note that, on most systems, the packet pointer is 2 bytes past a 4-byte boundary. If a program adds 14 bytes to jump over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting pointer will have a variable offset known to be 4n + 2 for some n.

Adding the two bytes of NET_IP_ALIGN to that gives 4-byte alignment, making word-sized accesses through that pointer safe.

IHL stands for Internet Header Length and gives the number of 4-byte words in the IPv4 header. IHL is a 4-bit field, so it ranges from 0 to 15 (valid headers use 5 to 15), and IHL * 4 is always a multiple of 4. The Linux kernel uses NET_IP_ALIGN to keep the L3 (IP) header aligned after the Ethernet header; on most systems this is 2 bytes. The packet pointer is skb->data, which points to the start of the Ethernet header. NET_IP_ALIGN places 2 bytes of padding before it, so skb->data sits 2 bytes past a 4-byte boundary and the IP header, 14 bytes later, starts on a 4-byte boundary.
Discrete mathematics: the IHL value specifies the number of 4-byte blocks in the IPv4 header. Multiplying that value by 4 converts it to a length in bytes that is guaranteed to be a multiple of 4, because IHL simply counts 32-bit (4-byte) words.
If IHL = 5, then 5 * 4 = 20 bytes, the minimum IP header size. IHL = 15 gives 15 * 4 = 60 bytes, the maximum IP header size.
Since IHL * 4 is always a multiple of 4, advancing the pointer by it never breaks 4-byte alignment. Note: IHL * 4 preserves the 4-byte alignment; it does not create it.
So skb->data points to the Ethernet header (start of L2), but because of NET_IP_ALIGN, skb->data is typically not 4-byte aligned on most systems. NET_IP_ALIGN reserves headroom, usually 2 bytes. A 14-byte jump then skips the Ethernet header and lands on the IP header.

The kernel sets up skb->data so that IP headers land on good boundaries after skipping L2. This is expressed with NET_IP_ALIGN, whose generic default is 2 but which some architectures define as 0 (x86, which handles unaligned accesses cheaply, uses 0).

Consequently, the verifier treats this base (skb->data) as 2 bytes past a 4-byte boundary: 2 bytes past because of NET_IP_ALIGN, and a 4-byte boundary because that is the alignment the IP header needs. When it enforces strict alignment, the verifier's packet alignment check uses an unconditional IP align value of 2, even on systems where NET_IP_ALIGN is 0. Thus skb->data ≡ 2 (mod 4), i.e., it has the form 4n + 2 (dividing 2 by 4 gives quotient 0 and remainder 2).
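The modular arithmetic can be checked exhaustively with concrete numbers. This Python sketch assumes, as above, that skb->data is 2 (mod 4) and that IHL is in the valid range 5 to 15; ETH_HLEN is the Ethernet header length of 14 bytes.

```python
ETH_HLEN = 14      # Ethernet header length in bytes
NET_IP_ALIGN = 2   # padding assumed before the Ethernet header


def transport_header_aligned(base: int, ihl: int) -> bool:
    """True if both the IP header and the transport header are 4-byte aligned."""
    ip_hdr = base + ETH_HLEN
    transport_hdr = ip_hdr + ihl * 4
    return ip_hdr % 4 == 0 and transport_hdr % 4 == 0


# skb->data is 2 (mod 4): try many such bases and every valid IHL (5..15).
bases = [4 * n + NET_IP_ALIGN for n in range(1000)]
all_aligned = all(transport_header_aligned(b, ihl) for b in bases for ihl in range(5, 16))
```

all_aligned comes out True. Without the 2-byte padding (a base of 0 mod 4), the IP header lands at 14 ≡ 2 (mod 4) and the check fails, which is the problem NET_IP_ALIGN exists to solve.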

Advancing the pointer

P = skb->data;   // NET_IP_ALIGN: P is 2 bytes past a 4-byte boundary (P ≡ 2 mod 4)
P += 14;         // skip the Ethernet header to reach the IP header: 2 + 14 = 16 ≡ 0 (mod 4), so P is now 4-byte aligned

// IHL holds the count of 4-byte (32-bit) words in the IP header; min = 5, max = 15.
// Multiplying that count by 4 preserves the 4-byte alignment and skips the whole IPv4 header.

P += IHL * 4;    // advancing past the IPv4 header moves P to the transport header
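The verifier does not enumerate values the way the arithmetic above does; it reasons about the unknown IHL * 4 as a tnum. This Python sketch ports the kernel's tnum_add() carry logic plus simple tnum_lshift() and tnum_is_aligned() helpers, and models the strict packet alignment rule as "var_off + (2 + fixed offset + insn offset) must be aligned to the access size, byte loads always allowed". The function pkt_access_aligned() and the modeling of IHL as a value with its low 4 bits unknown are illustrative assumptions.

```python
from dataclasses import dataclass

U64 = (1 << 64) - 1
IP_ALIGN = 2  # strict packet alignment checks assume NET_IP_ALIGN == 2


@dataclass(frozen=True)
class Tnum:
    value: int  # bits known to be 1
    mask: int   # bits whose value is unknown


def tnum_const(v: int) -> Tnum:
    return Tnum(v & U64, 0)


def tnum_add(a: Tnum, b: Tnum) -> Tnum:
    # Port of the kernel's tnum_add(): unknown bits can spread upward via carries.
    sm = (a.mask + b.mask) & U64
    sv = (a.value + b.value) & U64
    sigma = (sm + sv) & U64
    chi = sigma ^ sv
    mu = chi | a.mask | b.mask
    return Tnum(sv & ~mu & U64, mu)


def tnum_lshift(a: Tnum, n: int) -> Tnum:
    return Tnum((a.value << n) & U64, (a.mask << n) & U64)


def tnum_is_aligned(a: Tnum, size: int) -> bool:
    return ((a.value | a.mask) & (size - 1)) == 0


def pkt_access_aligned(var_off: Tnum, fixed_off: int, insn_off: int, size: int) -> bool:
    if size == 1:
        return True
    reg_off = tnum_add(var_off, tnum_const(IP_ALIGN + fixed_off + insn_off))
    return tnum_is_aligned(reg_off, size)


# IHL = (byte & 0x0f): low 4 bits unknown. IHL * 4 == IHL << 2.
ihl_times_4 = tnum_lshift(Tnum(0, 0x0F), 2)
```

With P = data + 14 + IHL * 4, a 4-byte load at P + 0 passes (2 + 14 = 16 is a multiple of 4 and IHL * 4 only has unknown bits 2 to 5), while a 4-byte load at P + 2 is rejected because bit 1 becomes known to be 1.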

Revisiting Simple Direct packet access

1:  r4 = *(u32 *)(r1 +80)  /* load skb->data_end */
2:  r3 = *(u32 *)(r1 +76)  /* load skb->data */
3:  r5 = r3
4:  r5 += 14
5:  if r5 > r4 goto pc+16
R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
6:  r0 = *(u16 *)(r3 +12) /* access 12 and 13 bytes of the packet */

Register r4 holds skb->data_end, which points one byte past the last byte of packet data, and register r3 holds skb->data, the start of the packet (the Ethernet header). r3 is copied to r5, and r5 is advanced by 14 bytes so that it points to the IP header. The check r5 > r4 verifies that the jump over the Ethernet header does not land beyond skb->data_end. If r5 is greater than r4, the packet is shorter than 14 bytes, so the program branches away instead of reading the header.
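At run time the same check is just integer comparison on addresses. In this Python sketch indices into a bytes object stand in for the data and data_end pointers, and the function returns the raw bytes 12 and 13 (the EtherType) rather than a u16, to sidestep byte order; read_ethertype_bytes() is a hypothetical name.

```python
from typing import Optional

ETH_HLEN = 14


def read_ethertype_bytes(pkt: bytes) -> Optional[bytes]:
    data, data_end = 0, len(pkt)      # r3 = skb->data, r4 = skb->data_end
    r5 = data + ETH_HLEN              # r5 = r3; r5 += 14
    if r5 > data_end:                 # if r5 > r4 goto ... (packet too short)
        return None
    return pkt[data + 12:data + 14]   # r0 = *(u16 *)(r3 +12): bytes 12 and 13
```

A packet of exactly 14 bytes passes the check (r5 == r4), while a 13-byte packet takes the branch and nothing is read.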

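r4 = *(u8 *)(r3 + 12)   // r4=inv(umax_value=255, var_off=(0x0; 0xff))
r4 *= 14                // r4=inv(umax_value=3570, var_off=(0x0; 0xfffe))

The u8 load leaves r4 as an unknown scalar ("inv" in older verifier logs) in the range [0, 255] with its low 8 bits unknown. r4 is already a SCALAR_VALUE after the load, and multiplying by 14 keeps it one. Since 14 = 0b1110 = 0xe is even, the product is even, so bit 0 is known to be 0 and the logged var_off mask is 0xfffe; the unsigned maximum becomes 255 * 14 = 3570.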
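The logged state after r4 *= 14 can be checked by brute force: it must cover every product the program could really compute. This Python sketch enumerates all 256 possible inputs; umax_after_mul() and fits_tnum() are hypothetical helpers, and the sketch checks soundness of the logged bounds rather than reproducing the kernel's multiplication code.

```python
def umax_after_mul(umax: int, k: int) -> int:
    """Unsigned upper bound of x * k when 0 <= x <= umax (no overflow here)."""
    return umax * k


def fits_tnum(x: int, value: int, mask: int) -> bool:
    """True if x matches every bit the tnum (value; mask) claims to know."""
    return (x & ~mask) == value


products = [v * 14 for v in range(256)]   # every possible result of r4 *= 14
```

Every product is even and fits var_off=(0x0; 0xfffe), and the largest is exactly 3570, so the logged umax_value is tight.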
