add `flush_broadcast` and `tlbsync` functions #403

Freax13 · 2023-02-11T14:02:54Z

flush_broadcast can be used to broadcast TLB flushes to all CPU cores and tlbsync can be used to wait for those flushes to be done.

Freax13 · 2023-02-11T14:05:56Z

src/instructions/tlb.rs

+
+/// Invalidates TLB entry(s) with Broadcast.
+///
+/// # Safety
+///
+/// This function is unsafe as it requires CPUID.(EAX=8000_0008H, ECX=0H):EBX.INVLPGB
+/// to be 1 and count to be less than or equal to CPUID.(EAX=8000_0008H, ECX=0H):EDX[0..=15].
+#[inline]
+pub unsafe fn flush_broadcast<S>(
+    va_and_count: Option<(Page<S>, u16)>,
+    pcid: Option<Pcid>,
+    asid: Option<u16>,
+    include_global: bool,
+    final_translation_only: bool,
+    include_nested_translations: bool,
+) where
+    S: NotGiantPageSize,


Do we want to use the builder pattern for this? There are an awful lot of parameters and this doesn't seem very future proof in case AMD adds more options.

I agree that we want some sort of builder or #[non_exhaustive] struct to be used for this.

A builder pattern would also better reflect:

What reasonable defaults should be

What parameters are optional

This can also be used to make some parameters unsafe.

tlendacky · 2023-02-11T14:48:03Z

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

Freax13 · 2023-02-11T15:01:40Z

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

Ah, good to know, I was wondering about that, thank you!

I don't think we want to execute cpuid inside that function to figure out how often we have to loop though: There is precedence for relying on the caller to ensure an instruction is valid to execute (e.g. see flush_pcid in this crate or _rdrand64_step inside the standard library). Also in my particular use-case, I want to execute these instructions inside an SEV-SNP VM where getting cpuid values is not trivial (the cpuid instruction still exists, but may be slow).

tlendacky · 2023-02-12T14:59:48Z

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

Ah, good to know, I was wondering about that, thank you!

I don't think we want to execute cpuid inside that function to figure out how often we have to loop though: There is precedence for relying on the caller to ensure an instruction is valid to execute (e.g. see flush_pcid in this crate or _rdrand64_step inside the standard library). Also in my particular use-case, I want to execute these instructions inside an SEV-SNP VM where getting cpuid values is not trivial (the cpuid instruction still exists, but may be slow).

Yes, you definitely don't want to be executing CPUID each time. In an SEV-SNP VM it will generate a #VC exception, which will then use the CPUID page to satisfy the instruction without exiting to the hypervisor, at least.

I haven't looked closely at the crate, is there any way to set/save some kind of initialization state that can be referenced?

Freax13 · 2023-02-12T18:03:25Z

If count is greater than the CPUID value (0x80000008_EDX[15:0]), you can issue the instruction in a loop and then perform a single TLBSYNC.

Ah, good to know, I was wondering about that, thank you!
I don't think we want to execute cpuid inside that function to figure out how often we have to loop though: There is precedence for relying on the caller to ensure an instruction is valid to execute (e.g. see flush_pcid in this crate or _rdrand64_step inside the standard library). Also in my particular use-case, I want to execute these instructions inside an SEV-SNP VM where getting cpuid values is not trivial (the cpuid instruction still exists, but may be slow).

Yes, you definitely don't want to be executing CPUID each time. In an SEV-SNP VM it will generate a #VC exception, which will then use the CPUID page to satisfy the instruction without exiting to the hypervisor, at least.

I haven't looked closely at the crate, is there any way to set/save some kind of initialization state that can be referenced?

We do something kind of like that for the rdrand instruction:

x86_64/src/instructions/random.rs

Lines 3 to 19 in 642956d

    
           #[derive(Copy, Clone, Debug)] 
        
           /// Used to obtain random numbers using x86_64's RDRAND opcode 
        
           pub struct RdRand(()); 
        
           impl RdRand { 
        
               /// Creates Some(RdRand) if RDRAND is supported, None otherwise 
        
               #[inline] 
        
               pub fn new() -> Option<Self> { 
        
                   // RDRAND support indicated by CPUID page 01h, ecx bit 30 
        
                   // https://en.wikipedia.org/wiki/RdRand#Overview 
        
                   let cpuid = unsafe { core::arch::x86_64::__cpuid(0x1) }; 
        
                   if cpuid.ecx & (1 << 30) != 0 { 
        
                       Some(RdRand(())) 
        
                   } else { 
        
                       None 
        
                   } 
        
               }

For rdrand we cache that the processor is able to execute it by creating an instance of a type that can only be created when the processor supports rdrand, if the user keeps that instance around they don't need to check again. We could do the same for invlpgb and tlbsync.

Freax13 · 2023-02-12T20:27:34Z

Implementing the loop for this turns out to be less trivial than I initially thought. Two things I'm not completely certain about:

using invlpgb with a count of zero still flushes one page. Can the max count reported in the cpuid leaf also be zero? I tried looking in the PPRs, but most processor don't seem to support the instructions. EPYC Milan supports it and always has a max count of 7 according to the PPR. The PPR for Family 19h Model 11h Revision B1 also has a fixed max count of 7 (I'm not sure which processors that actually applies to, but I think something with the either Zen3+ or Zen4 microarchitecture).
what happens if we flush an address right before the gap in the address space with a count of more than one i.e.0x7fff_ffff_f000 with a count of two? I assume that it only flushes the address in the lower half in the address space and doesn't automatically jump the gap in the address space. Either way we need to be careful.

tlendacky · 2023-02-13T16:41:35Z

Implementing the loop for this turns out to be less trivial than I initially thought. Two things I'm not completely certain about:

using invlpgb with a count of zero still flushes one page. Can the max count reported in the cpuid leaf also be zero? I tried looking in the PPRs, but most processor don't seem to support the instructions. EPYC Milan supports it and always has a max count of 7 according to the PPR. The PPR for Family 19h Model 11h Revision B1 also has a fixed max count of 7 (I'm not sure which processors that actually applies to, but I think something with the either Zen3+ or Zen4 microarchitecture).

fam19h/model11h is Genoa.

Yes, theoretically, the CPUID could report 0 for the maximum count, in which case you would only be able to do a single page at a time (#GP if the ECX[15:0] > maximum page count supported).

what happens if we flush an address right before the gap in the address space with a count of more than one i.e.0x7fff_ffff_f000 with a count of two? I assume that it only flushes the address in the lower half in the address space and doesn't automatically jump the gap in the address space. Either way we need to be careful.

I believe that the address would not be canonicalized, and would then just be a miss in the TLB since no addresses would match.

src/instructions/tlb.rs

josephlr · 2023-02-14T02:21:28Z

Also, for the ASID related instructions, it seems like we would want some sort of common Asid abstraction (like what we currently do for Pcid). This would also allow us to support something like INVLPGA in the future.

Freax13 · 2023-02-14T09:30:46Z

Also, for the ASID related instructions, it seems like we would want some sort of common Asid abstraction (like what we currently do for Pcid). This would also allow us to support something like INVLPGA in the future.

AFAICT the reason that we have a Pcid type is that Pcid always has to be a 12-bit integer and Rust doesn't provide that. For ASID the limit is not a fixed constant: The maximum supported value is reported CPUID_Fn8000000A_EBX as a 32-bit integer, INVLPGA also takes a 32-bit ASID in ecx and the ASID field in the VMCB is also 32-bit. INVLPGB is the outlier here as it only takes a 16-bit ASID. In pratice a common maximum supported value seems to be 0x8000. There don't seem to be any bounds checks for INVLPGA, but INVLPGB will cause a #GP if the ASID is out of range.

With that said I'm not sure what size the ASID type should be and how we should enforce the limit given by the CPUID function.

Perhaps we should start keeping these limits in global variables so that we can properly enforce them without executing cpuid every time.

This allows us to these functions internally without requiring the nightly `step_trait` feature.

phil-opp

Thanks a lot for tackling this! I left some superficial comments on the API. I didn't have time to look through the flush_broadcast implementation, but the rest looks good to me.

src/addr.rs

src/structures/paging/page.rs

src/instructions/tlb.rs

This avoids any confusion between the trait methods and inherent methods.

This works for all methods except `pages` which changes the type.

Freax13 added 2 commits February 11, 2023 15:00

add flush_broadcast

72a28bd

add tlbsync

54190d5

Freax13 requested a review from phil-opp February 11, 2023 14:02

Freax13 commented Feb 11, 2023

View reviewed changes

Freax13 requested a review from josephlr February 11, 2023 14:06

Freax13 marked this pull request as draft February 13, 2023 09:43

josephlr reviewed Feb 14, 2023

View reviewed changes

src/instructions/tlb.rs Outdated Show resolved Hide resolved

use nostack instead of nomem

c23357f

Freax13 force-pushed the feature/new-flush branch 3 times, most recently from 8f3a412 to 5e5e328 Compare February 14, 2023 11:08

Freax13 added 2 commits February 14, 2023 12:10

move parts of the Step impl into normal functions

83be8d7

This allows us to these functions internally without requiring the nightly `step_trait` feature.

move to a builder pattern

bae6119

Freax13 force-pushed the feature/new-flush branch from 5e5e328 to bae6119 Compare February 14, 2023 11:10

Freax13 marked this pull request as ready for review February 23, 2023 13:09

Freax13 requested a review from josephlr March 8, 2023 11:59

phil-opp approved these changes Mar 12, 2023

View reviewed changes

src/addr.rs Outdated Show resolved Hide resolved

src/structures/paging/page.rs Outdated Show resolved Hide resolved

src/instructions/tlb.rs Outdated Show resolved Hide resolved

src/instructions/tlb.rs Outdated Show resolved Hide resolved

src/instructions/tlb.rs Show resolved Hide resolved

Freax13 added 5 commits March 14, 2023 12:49

rename the inherent step methods

ee10753

This avoids any confusion between the trait methods and inherent methods.

use mutable references in InvlpgbFlushBuilder

28f9511

This works for all methods except `pages` which changes the type.

fix off-by-one for ASID check

861bbbf

add Display impl for AsidOutOfRangeError

3ecd11c

make fields public and add documentation

0d55f01

add reference to flush_broadcast

5cf5f86

Freax13 requested a review from phil-opp March 14, 2023 12:33

Freax13 merged commit 8f0c4c0 into rust-osdev:master Sep 15, 2023

Freax13 mentioned this pull request Sep 15, 2023

Release v0.14.11 #437

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add `flush_broadcast` and `tlbsync` functions #403

add `flush_broadcast` and `tlbsync` functions #403

Freax13 commented Feb 11, 2023 •

edited

Freax13 Feb 11, 2023

josephlr Feb 14, 2023

Freax13 Feb 14, 2023

tlendacky commented Feb 11, 2023

Freax13 commented Feb 11, 2023

tlendacky commented Feb 12, 2023

Freax13 commented Feb 12, 2023

Freax13 commented Feb 12, 2023

tlendacky commented Feb 13, 2023

josephlr commented Feb 14, 2023

Freax13 commented Feb 14, 2023

phil-opp left a comment

add flush_broadcast and tlbsync functions #403

add flush_broadcast and tlbsync functions #403

Conversation

Freax13 commented Feb 11, 2023 • edited

Freax13 Feb 11, 2023

Choose a reason for hiding this comment

josephlr Feb 14, 2023

Choose a reason for hiding this comment

Freax13 Feb 14, 2023

Choose a reason for hiding this comment

tlendacky commented Feb 11, 2023

Freax13 commented Feb 11, 2023

tlendacky commented Feb 12, 2023

Freax13 commented Feb 12, 2023

Freax13 commented Feb 12, 2023

tlendacky commented Feb 13, 2023

josephlr commented Feb 14, 2023

Freax13 commented Feb 14, 2023

phil-opp left a comment

Choose a reason for hiding this comment

add `flush_broadcast` and `tlbsync` functions #403

add `flush_broadcast` and `tlbsync` functions #403

Freax13 commented Feb 11, 2023 •

edited