btf: Take instruction size into account when handling poison relocation #1351

dylandreimerink · 2024-02-19T13:38:07Z

A CO-RE relocation might be poisoned; in such cases we write a "bogus" instruction to the relocation target. This can still result in working code if that instruction turns out to be in a dead code branch due to other CO-RE logic.

Currently this bogus instruction is always a function call to a non-existent function. This is a problem when the original instruction is a dword load immediate, which takes two instructions' worth of space. When we replace it with a function call, we shrink the instruction stream, which throws off offsets and instruction metadata.

So this commit makes the CO-RE logic check whether we are dealing with a dword load immediate and, if so, replace it with a dword immediate load of 0xbad2310 into R10, which is illegal and will trip the verifier if evaluated.
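The size-preserving choice described above can be sketched roughly as follows. This is a simplified model and not the cilium/ebpf code: the `Insn` type, its `Slots` field and the `poison` helper are hypothetical names introduced for illustration. Only the `0xbad2310` marker and the R10 destination come from the description; reusing the marker as the call target is an assumption of the sketch.

```go
package main

import "fmt"

// Insn is a hypothetical, simplified model of a BPF instruction.
// Slots is the number of 8-byte slots it occupies in the raw stream:
// 2 for a dword (64-bit) immediate load, 1 for everything else.
type Insn struct {
	Op    string
	Dst   string
	Imm   int64
	Slots int
}

// badRelo is the marker value named in the description above.
const badRelo = 0xbad2310

// poison returns a bogus replacement that occupies exactly as many
// slots as the original, so later offsets and metadata stay valid.
func poison(orig Insn) Insn {
	if orig.Slots == 2 {
		// Dword immediate load into R10. R10 is the read-only frame
		// pointer, so the verifier rejects this if it is reachable.
		return Insn{Op: "lddw", Dst: "r10", Imm: badRelo, Slots: 2}
	}
	// Single-slot case: call to a non-existent function.
	// (Using badRelo as the call target is an assumption here.)
	return Insn{Op: "call", Imm: badRelo, Slots: 1}
}

func main() {
	for _, in := range []Insn{
		{Op: "lddw", Dst: "r1", Imm: 1, Slots: 2},
		{Op: "ldxw", Dst: "r1", Slots: 1},
	} {
		p := poison(in)
		fmt.Printf("%s (%d slots) -> %s (%d slots)\n", in.Op, in.Slots, p.Op, p.Slots)
	}
}
```

The point of the sketch is the invariant, not the exact encoding: whatever replaces a poisoned instruction has to take up the same number of slots as the instruction it replaces.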

Fixes: #1348

lmb · 2024-02-20T14:55:27Z

Thanks for diagnosing this!

Does libbpf do the same write into R10 trick, or something different?
Can you add a test?

P.S. The reason we don't adjust for this automatically is that the offset is encoded in the instruction stream. Does the jump instruction in the ELF carry a relocation to an LBB_ block? We strip those here:

ebpf/elf_reader.go

Lines 258 to 260 in e4ec617

    
    // LLVM emits LBB_ (Local Basic Block) symbols that seem to be jump
    // targets within sections, but BPF has no use for them.
    if symType == elf.STT_NOTYPE && elf.ST_BIND(symbol.Info) == elf.STB_LOCAL &&

dylandreimerink · 2024-02-20T15:27:44Z

> Does libbpf do the same write into R10 trick, or something different?

I came up with this myself; I didn't think to look at libbpf's approach. They seem to emit two call instructions in this case:

if (res->poison) {
poison:
	/* poison second part of ldimm64 to avoid confusing error from
	 * verifier about "unknown opcode 00"
	 */
	if (is_ldimm64_insn(insn))
		bpf_core_poison_insn(prog_name, relo_idx, insn_idx + 1, insn + 1);
	bpf_core_poison_insn(prog_name, relo_idx, insn_idx, insn);
	return 0;
}

We could just copy that.

> Can you add a test?

Yeah, of course.

> Does the jump instruction in the ELF carry a relocation to an LBB_ block?

It does, but I think the current fix is good enough?

bpf_bpf.o:      file format elf64-bpf

Disassembly of section kprobe/tcp_close:

0000000000000000 <kprobe_tcp_close>:
       0:       18 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x1 ll
       2:       15 01 03 00 00 00 00 00 if r1 == 0x0 goto +0x3 <LBB0_2>
       3:       18 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x1 ll
       5:       63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 0x4) = r1

0000000000000030 <LBB0_2>:
       6:       b7 00 00 00 00 00 00 00 r0 = 0x0
       7:       95 00 00 00 00 00 00 00 exit
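To make the offset problem concrete, here is a rough sketch that resolves the `goto +0x3` from the disassembly above, first with the original instruction sizes and then with one dword load shrunk to a single slot. It assumes, purely for illustration, that the load at slot 3 is the poisoned one; `jumpTarget` and the `sizes` slices are hypothetical helpers, not part of cilium/ebpf.

```go
package main

import "fmt"

// jumpTarget returns the index (into sizes) of the instruction that a
// jump at instruction index `from` with raw offset `off` lands on.
// sizes[i] is the number of 8-byte slots instruction i occupies.
// BPF jump offsets are relative to the slot following the jump.
func jumpTarget(sizes []int, from, off int) (int, bool) {
	// Raw slot at which the jump itself starts.
	raw := 0
	for i := 0; i < from; i++ {
		raw += sizes[i]
	}
	want := raw + sizes[from] + off

	// Find the instruction that starts at the wanted raw slot.
	pos := 0
	for i, s := range sizes {
		if pos == want {
			return i, true
		}
		pos += s
	}
	return 0, false
}

func main() {
	// Instructions: lddw, jeq +3, lddw, stxw, mov r0 = 0, exit.
	before := []int{2, 1, 2, 1, 1, 1}
	// Same program with the second lddw replaced by a 1-slot call.
	after := []int{2, 1, 1, 1, 1, 1}

	b, _ := jumpTarget(before, 1, 3)
	a, _ := jumpTarget(after, 1, 3)
	fmt.Println("target before:", b)
	fmt.Println("target after: ", a)
}
```

With the original sizes the jump resolves to index 4 (`r0 = 0x0`, the start of `LBB0_2`). After shrinking, the same raw offset resolves to index 5 (`exit`), so the branch skips the instruction that sets the return value.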

lmb · 2024-02-20T15:43:09Z

> We could just copy that.

Yeah let's do that. The call instruction is already confusing enough.

A CO-RE relocation might be poisoned; in such cases we write a "bogus" instruction to the relocation target. This can still result in working code if that instruction turns out to be in a dead code branch due to other CO-RE logic.

Currently this bogus instruction is always a function call to a non-existent function. This is a problem when the original instruction is a dword load immediate, which takes two instructions' worth of space. When we replace it with a function call, we shrink the instruction stream, which throws off offsets and instruction metadata.

So this commit makes the CO-RE logic check whether we are dealing with a dword load immediate and, if so, replace it with a dword immediate load of 0xbad2310 into R10, which is illegal and will trip the verifier if evaluated.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

Added a test which makes sure we do not mess up offsets when replacing a ld64imm with poisoned instructions. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>

lmb

Thanks!

lmb · 2024-02-21T15:38:25Z

From discussion on Slack: going with the write to RFP trick, since that means we can avoid breaking API.

dylandreimerink requested a review from a team as a code owner February 19, 2024 13:38

dylandreimerink added the bug Something isn't working label Feb 19, 2024

dylandreimerink force-pushed the feature/fix-1348 branch 2 times, most recently from e43a484 to 36e4a6b Compare February 21, 2024 11:18

dylandreimerink added 2 commits February 21, 2024 16:23

btf: Make test for ld64imm CO-RE relocations

fe3cc27


dylandreimerink force-pushed the feature/fix-1348 branch from 36e4a6b to fe3cc27 Compare February 21, 2024 15:24

lmb approved these changes Feb 21, 2024


lmb merged commit b24722c into cilium:main Feb 21, 2024
15 checks passed
