Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra bound checks are not eliminated for pinning #35748

Open
EgorBo opened this issue May 2, 2020 · 11 comments
Open

Extra bound checks are not eliminated for pinning #35748

EgorBo opened this issue May 2, 2020 · 11 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented May 2, 2020

The following "array pinning" syntax

public static void Test(int[] array)
{
    fixed (int* p = array)
        DoWork(p);
}

is a С# 7(?) syntax sugar for:

public static void Test(int[] array)
{
    if (array != null && array.Length > 0)
    {
        fixed (int* p = &array[0]) // array is always not empty at this point
            DoWork(p);
    }
}

As you can see, there is no way this code can throw an out of bounds exception.

Codegen (master):

; Method Test(System.Int32[])
G_M31592_IG01:
       4883EC28             sub      rsp, 40
       33C0                 xor      rax, rax
       4889442420           mov      qword ptr [rsp+20H], rax
G_M31592_IG02:
       48894C2420           mov      gword ptr [rsp+20H], rcx
       4885C9               test     rcx, rcx
       740B                 je       SHORT G_M31592_IG04
G_M31592_IG03:
       488B4C2420           mov      rcx, gword ptr [rsp+20H]
       83790800             cmp      dword ptr [rcx+8], 0    ;; we already check Length here
       7504                 jne      SHORT G_M31592_IG05
G_M31592_IG04:
       33C9                 xor      rcx, rcx
       EB14                 jmp      SHORT G_M31592_IG06
G_M31592_IG05:
       488B4C2420           mov      rcx, gword ptr [rsp+20H]
       83790800             cmp      dword ptr [rcx+8], 0     ;; redundant
       761A                 jbe      SHORT G_M31592_IG08      ;; redundant
       488B4C2420           mov      rcx, gword ptr [rsp+20H]
       4883C110             add      rcx, 16
G_M31592_IG06:
       E8EB60FFFF           call     EgorClass:DoWork(long)
       33C0                 xor      rax, rax
       4889442420           mov      gword ptr [rsp+20H], rax
G_M31592_IG07:
       4883C428             add      rsp, 40
       C3                   ret      
G_M31592_IG08:   ;; not needed
       E872C0485F           call     CORINFO_HELP_RNGCHKFAIL   ;; redundant
       CC                   int3                               ;; redundant
; Total bytes of code: 79

Unfortunately, Assertion prop doesn't optimize the bound check away.

category:cq
theme:pinning
skill-level:expert
cost:extra-large
impact:medium

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels May 2, 2020
@jkotas
Copy link
Member

jkotas commented May 2, 2020

The JIT should be able to figure this out, even without the uint hack. I do not think we would want to teach Roslyn how to do this.

@EgorBo
Copy link
Member Author

EgorBo commented May 3, 2020

Updated the description. Even if I help JIT to convert array.Length != 0 to 0 < (uint)array.Length in morph it doesn't optimize the range check.

@EgorBo
Copy link
Member Author

EgorBo commented May 3, 2020

Example:

    public static void Test1(int[] array)
    {
        fixed (int* p = array)
        {
        }
    }

IR for OptimizeRangeChecks phase:

*************** Starting PHASE Optimize index checks
*************** In OptimizeRangeChecks()
Blocks/trees before phase

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        [000..005)-> BB03 ( cond )                     i label target 
BB02 [0001]  1       BB01                  0.50     [005..00A)-> BB04 ( cond )                     i idxlen 
BB03 [0002]  2       BB01,BB02             0.50     [00A..00F)-> BB05 (always)                     i label target 
BB04 [0003]  1       BB02                  0.50     [00F..018)                                     i label target idxlen 
BB05 [0004]  2       BB03,BB04             1        [018..01B)        (return)                     i label target 
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..005) -> BB03 (cond), preds={} succs={BB02,BB03}

***** BB01
STMT00000 (IL 0x000...0x002)
N003 (  1,  3) [000003] -A------R---              *  ASG       ref    $80
N002 (  1,  1) [000002] D------N----              +--*  LCL_VAR   ref    V02 loc1         
N001 (  1,  1) [000001] ------------              \--*  LCL_VAR   ref    V00 arg0         u:1 $80

***** BB01
STMT00001 (IL   ???...0x003)
N004 (  5,  5) [000006] ------------              *  JTRUE     void  
N003 (  3,  3) [000005] J------N----              \--*  EQ        int    $100
N001 (  1,  1) [000000] ------------                 +--*  LCL_VAR   ref    V00 arg0         u:1 (last use) $80
N002 (  1,  1) [000004] ------------                 \--*  CNS_INT   ref    null $VN.Null

------------ BB02 [005..00A) -> BB04 (cond), preds={BB01} succs={BB03,BB04}

***** BB02
STMT00005 (IL 0x005...0x008)
N005 (  7,  7) [000019] ---X--------              *  JTRUE     void  
N004 (  5,  5) [000018] J--X---N----              \--*  NE        int    $103
N002 (  3,  3) [000016] ---X--------                 +--*  ARR_LENGTH int    $101
N001 (  1,  1) [000015] ------------                 |  \--*  LCL_VAR   ref    V02 loc1          $140
N003 (  1,  1) [000017] ------------                 \--*  CNS_INT   int    0 $40

------------ BB03 [00A..00F) -> BB05 (always), preds={BB01,BB02} succs={BB05}

------------ BB04 [00F..018), preds={BB02} succs={BB05}

***** BB04
STMT00006 (IL 0x00F...0x017)
N004 (  8, 11) [000035] ---X--------              *  ARR_BOUNDS_CHECK_Rng void   $1c6
N001 (  1,  1) [000021] ------------              +--*  CNS_INT   int    0 $40
N003 (  3,  3) [000034] ---X--------              \--*  ARR_LENGTH int    $104
N002 (  1,  1) [000020] ------------                 \--*  LCL_VAR   ref    V02 loc1          $141

------------ BB05 [018..01B) (return), preds={BB03,BB04} succs={}

***** BB05
STMT00003 (IL 0x018...0x019)
N003 (  1,  3) [000013] -A------R---              *  ASG       ref    $VN.Null
N002 (  1,  1) [000012] D------N----              +--*  LCL_VAR   ref    V02 loc1         
N001 (  1,  1) [000011] ------------              \--*  CNS_INT   ref    null $VN.Null

***** BB05
STMT00004 (IL 0x01A...0x01A)
N001 (  0,  0) [000014] ------------              *  RETURN    void   $200

@AndyAyersMS any idea why ARR_LENGTH (and underlying local) has different VN in BB02 and BB04 ? it should be the same as far as I understand. I guess Roslyn could avoid generating an additional variable for the array.

@AndyAyersMS
Copy link
Member

Suspect this happens because pinned locals are untracked, untracked locals can't be in SSA, and VN relies on SSA for locals. Thus VN doesn't know that the two appearances of V02 refer to the same value.

Unwinding this a bit:

  • Untracked locals can't be in SSA because some of them might be aliased.
  • Pinned locals are untracked because the CLR GC reporting for pinned locals requires that (for gc purposes) they be untracked gc refs (see lvaSortByRefCount).

We could probably unblock optimization by allowing pinned locals to be tracked for SSA purposes but then reported as untracked for GC purposes. Not sure how hard this would be. We'd also need to be careful not to dead store any nulling writes to pinned locals.

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label May 6, 2020
@AndyAyersMS AndyAyersMS added this to the Future milestone May 6, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 14, 2020
@VSadov
Copy link
Member

VSadov commented Feb 28, 2024

Pinned locals are untracked because the CLR GC reporting for pinned locals requires that (for gc purposes) they be untracked gc refs (see lvaSortByRefCount).

@AndyAyersMS - what part of GC reporting is missing for pinned registers?

  • I think GCInfo is expressive enough to record pinned reg slots. At least the non-x86 kind.
  • The interface with GC does not even know how the variable is stored. In fact we report some registers on Unix as pinned in rare cases due to pal limitations (see ReportRegisterToGC) - to tolerate double-reporting (double-marking is ok, but double-relocating is not, so pinning makes potential double-relocating nonissue).

There could be some other parts that I could be missing.

lvaSortByRefCount says that on non-x86 it may be ok to enregister a pinning ref. Are there other reasons maybe?

@jkotas
Copy link
Member

jkotas commented Feb 28, 2024

There should not be any restrictions in non-x86 GC info encoding that get in the way of optimizing pinned locals.

The problem with pinned locals is that their lifetime has to be extended till end of method or till they are explicitly cleared (or assigned some other value that is pinned implicitly). The JIT takes a shortcut to achieve this lifetime extension by marking them as untracked.

@VSadov
Copy link
Member

VSadov commented Feb 28, 2024

Interestingly a pinning local behaves like an address-exposed (writes to it are globally sideeffecting), although there could be nothing referencing it.

@AndyAyersMS
Copy link
Member

There are various of comments in the code about this restriction -- if those are wrong-headed we should clean them up.

// Note for untracked locals the flags allowed are "pinned" and "byref"
// and for tracked locals the flags allowed are "this" and "byref"
// Note that these definitions should also match the definitions of
// GC_CALL_INTERIOR and GC_CALL_PINNED in VM/gc.h
//
const unsigned byref_OFFSET_FLAG = 0x1; // the offset is an interior ptr
const unsigned pinned_OFFSET_FLAG = 0x2; // the offset is a pinned ptr
#if !defined(TARGET_X86) || !defined(FEATURE_EH_FUNCLETS)
const unsigned this_OFFSET_FLAG = 0x2; // the offset is "this"
#endif

#ifdef DEBUG
if (mode == MAKE_REG_PTR_MODE_ASSIGN_SLOTS)
{
// Tracked variables can't be pinned, and the encoding takes
// advantage of that by using the same bit for 'pinned' and 'this'
// Since we don't track 'this', we should never see either flag here.
// Check it now before we potentially add some pinned flags.
for (varPtrDsc* varTmp = gcVarPtrList; varTmp != nullptr; varTmp = varTmp->vpdNext)
{
const unsigned flags = varTmp->vpdVarNum & OFFSET_MASK;
assert((flags & pinned_OFFSET_FLAG) == 0);
assert((flags & this_OFFSET_FLAG) == 0);
}
}
#endif // DEBUG

From my understanding there are various issues here:

  • whether the pinned local is tracked/untracked for jit analysis purposes
  • whether the pinned local is tracked/untracked for gc reporting purposes
  • whether the pinned local can live in a register or must be on the stack

Today the two notions of tracked largely coincide, but enabling a subset of JIT optimizations only requires the first. In many/most cases the pinned local ends up not having many appearances, so one might not be giving up much by requiring the pinned local to be on the stack and untracked for gc purposes, in which case the current reporting scheme with the restrictions above is sufficient.

The JIT can't reason about the liveness of the pinned local very well right now. Instead of trying to do this, perhaps it can model these locals as jit-tracked and always live (disabling dead stores and possibly other optimizations that would remove defs, similar in some ways to how we handle EH write thru), and gc-liveness-untracked (that is, forcibly initialized to null in the prolog, and on the stack). This could unblock optimizations based on the current value like bounds check elimination.

Or perhaps the JIT could figure out the true liveness extents and add keep-alives or similar in the proper places and just treat these locals as (mostly) a normal tracked locals. We'd still need some kind of restriction on removing defs (eg copy prop, forward sub) so we don't lose the pin all together since it is tied to the specific local. I think this analysis is hard. We'd need to determine the maximum extent of any non-GC use based off of each pinned def.

cc @dotnet/jit-contrib

@jkotas
Copy link
Member

jkotas commented Feb 29, 2024

There are various of comments in the code about this restriction -- if those are wrong-headed we should clean them up.

#99092

@markples
Copy link
Member

markples commented Mar 4, 2024

The problem with pinned locals is that their lifetime has to be extended till end of method or till they are explicitly cleared (or assigned some other value that is pinned implicitly). The JIT takes a shortcut to achieve this lifetime extension by marking them as untracked.

To elaborate on "assigned some other value", I suspect that "implicitly" means a location that is pinned by definition (e.g., pinned_var = &local), but explicitly is also a (complicated) possibility:

pinned_1 = &location;
pinned_2 = pinned_1; // or &location
pinned_2 = null;
pinned_1 = null;

pinned_2 does not need to be pinned in this case since the pinning of location is covered by pinned_1. It is necessary that the lifetime of pinned_1 extends past that of pinned_2. I think this is the same problem as "RC subsumption" for reference counting GC (see work by Pramod Joisha).

It's also convenient to think about pinned = value as distinctly unpinning the old value and pinning the new value. Then certain operations become clear - for example, if the new value doesn't need to be pinned, then the assignment could be replaced by an assignment of null (if untracked, or a keepalive/end-of-liveness if tracked).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

7 participants