Skip to content

JIT: Transform 'CONST - x' to 'xor x, CONST' when possible#126529

Open
BoyBaykiller wants to merge 5 commits intodotnet:mainfrom
BoyBaykiller:transform-const-x-to-xor
Open

JIT: Transform 'CONST - x' to 'xor x, CONST' when possible#126529
BoyBaykiller wants to merge 5 commits intodotnet:mainfrom
BoyBaykiller:transform-const-x-to-xor

Conversation

@BoyBaykiller
Copy link
Copy Markdown
Contributor

@BoyBaykiller BoyBaykiller commented Apr 4, 2026

Given any CONST - x and range info for x obtained from IntegralRange::ForNode see if we can transform it to x ^ CONST. These changes motivate improvements in IntegralRange::ForNode and RangeOps (#126553). It also made me find a missing GT_NOT in asserprop to enable VN based folding.

Found by looking at diffs from #126442

Example:

int Byte255(byte x)
{
    return 255 - x;
}
;; ------ BASE
G_M64250_IG02:  ;; offset=0x0000
       movzx    rax, dl
       neg      eax
       add      eax, 255
						;; size=10 bbWeight=1 PerfScore 0.75
G_M64250_IG03:  ;; offset=0x000A
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00

;; ------ DIFF
G_M64250_IG02:  ;; offset=0x0000
       movzx    rax, dl
       xor      eax, 255
						;; size=8 bbWeight=1 PerfScore 0.50
G_M64250_IG03:  ;; offset=0x0008
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
int IntMin1(int x)
{
    return -1 - x;
}
;; ------ BASE
G_M50835_IG02:  ;; offset=0x0000
       mov      eax, edx
       neg      eax
       dec      eax
						;; size=6 bbWeight=1 PerfScore 0.75
G_M50835_IG03:  ;; offset=0x0006
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00
;; ------ DIFF
G_M50835_IG02:  ;; offset=0x0000
       mov      eax, edx
       not      eax
						;; size=4 bbWeight=1 PerfScore 0.50
G_M50835_IG03:  ;; offset=0x0004
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 4, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 4, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@BoyBaykiller BoyBaykiller changed the title JIT: Transform 'CONST - x' -> 'xor x, CONST' when possible JIT: Transform 'CONST - x' to 'xor x, CONST' when possible Apr 4, 2026
@BoyBaykiller BoyBaykiller marked this pull request as ready for review April 4, 2026 10:27
@BoyBaykiller
Copy link
Copy Markdown
Contributor Author

There is one type of regression where GetRangeFromAssertions doesn't handle XOR and we fail to remove useless JTRUE. Will fix that in a different PR.

@saucecontrol
Copy link
Copy Markdown
Member

saucecontrol commented Apr 5, 2026

As a generalization, the rule is that you can implement subtraction as xor any time you can be sure there won't be a borrow, because binary subtraction is defined as an xor followed by borrow.

This is always true when the left operand is a bit pattern of all ones and the right operand is less than or equal to the left. So naturally, it works for subtraction of any integral type if left has all bits set.

It also happens to work when left has all but the MSB set, because in that case, the only borrow could come from beyond the MSB. This is why the optimization works for int.MaxValue even if subtraction is of type uint (meaning right could be larger).

This also means, for example, that with a byte subtraction, 127 - x can always be implemented as xor, provided you then zero-extend the byte, because any borrow will propagate to all the high bits of the register.

The reason I mention this is that if you have a range assertion that can prove right is smaller for any reason, it can apply there too. We have examples where this optimization is done manually in the libraries code:

// LZCNT returns index starting from MSB, whereas BSR gives the index from LSB.
// 31 ^ BSR here is equivalent to 31 - BSR since the BSR result is always between 0 and 31.
// This saves an instruction, as subtraction from constant requires either MOV/SUB or NEG/ADD.
return 31 ^ (int)X86Base.BitScanReverse(value);

@BoyBaykiller
Copy link
Copy Markdown
Contributor Author

BoyBaykiller commented Apr 5, 2026

Thanks for the explanation! I was just about to look for the more general case to use in #126553.
Clang seems to catch it: https://godbolt.org/z/e8PjefYKx

Unfortunately I am not sure there is a standard way to get general range information (or LE assertion) in morph. And it's only run if fgGlobalMorph...

As for handling smaller types: I skipped that because I still need to see what's the proper & easy way to see through this IR that we create for small types and get the actual type as defined in the C# source:

CAST      int <- ubyte <- int
*  LCL_VAR   int    V01 arg1          (last use)

@saucecontrol
Copy link
Copy Markdown
Member

saucecontrol commented Apr 6, 2026

what's the proper & easy way to see through this IR that we create for small types

In IL, you'll always have a cast back to the small type after an arithmetic operation, because IL doesn't define those operations for anything smaller than int32. Within JIT, small types may be flagged as normalize on load or normalize on store, so JIT will force zero/sign extension as appropriate there.

Unfortunately I am not sure there is a standard way to get general range information (or LE assertion) in morph.

I'm not terribly familiar with that part of the code, but you can fetch assertions from morph. Here's an example that might lead you to the right parts.

// Small-typed arguments and aliased locals are normalized on load. Other small-typed locals are
// normalized on store. If this is one of the former, insert a narrowing cast on the load.
// ie. Convert: var-short --> cast-short(var-int)
//
if (fgGlobalMorph && lclNode->OperIs(GT_LCL_VAR) && varDsc->lvNormalizeOnLoad() &&
/* TODO-ASG: delete this zero-diff quirk */ lclNode->CanCSE())
{
var_types lclVarType = varDsc->TypeGet();
// Assertion prop can tell us to omit adding a cast here. This is useful when the local is a small-typed
// parameter that is passed in a register: in that case, the ABI specifies that the upper bits might be
// invalid, but the assertion guarantees us that we have normalized when we wrote it.
if (optLocalAssertionProp &&
optAssertionIsSubrange(lclNode, IntegralRange::ForType(lclVarType), apLocal) != NO_ASSERTION_INDEX)

And you can see some example assertions that could be interesting for this optimization if only they were more specific:

case NI_AVX2_LeadingZeroCount:
case NI_AVX2_TrailingZeroCount:
case NI_AVX2_X64_LeadingZeroCount:
case NI_AVX2_X64_TrailingZeroCount:
case NI_X86Base_PopCount:
case NI_X86Base_X64_PopCount:
// Note: No advantage in using a precise range for IntegralRange.
// Example: IntCns = 42 gives [0..127] with a non -precise range, [42,42] with a precise range.
return {SymbolicIntegerValue::Zero, SymbolicIntegerValue::ByteMax};

I'm not following exactly why the comment says there's no advantage to a more specific range, though.

* use 'IntegralRange::ForNode' for somewhat better range info (could be improved still)
* move to fgOptimizeAddition
@BoyBaykiller
Copy link
Copy Markdown
Contributor Author

I've generalized the transformation to catch all cases, given CNS - x and range info for x.
I've switched to IntegralRange::ForNode to attain this range info, but as you said its somewhat coarse. Apparently there was a TODO once to make it more precise: #106982. I think this should be revisited.

In theory the case you showed would be naturally handled now if range for BitScanReverse was proper.

Also in order to handle this byte subtract case you mentioned:

byte Byte127(byte x)
{
    return (byte)(127 - x);
}

I need to detect the outer byte cast so I know the range is in [0, 255] which I don't know how to properly do in fgOptimizeAddition. However 255 - x works.

@BoyBaykiller
Copy link
Copy Markdown
Contributor Author

@EgorBo PTAL

If this gets merged we can justify to add general RangeOps:Xor to rangecheck #126553 as this fixes some regressions from this PR.

I think it would also be nice to revisit making IntegralRange able to handle arbitrary ranges instead of just "symbolic constants". With a precise range it would automatically use XOR here and the code could be simplified to simple subtract:

// LZCNT returns index starting from MSB, whereas BSR gives the index from LSB.
// 31 ^ BSR here is equivalent to 31 - BSR since the BSR result is always between 0 and 31.
// This saves an instruction, as subtraction from constant requires either MOV/SUB or NEG/ADD.
return 31 ^ (int)X86Base.BitScanReverse(value);

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves CoreCLR JIT morph optimizations by recognizing when CONST - x (canonicalized as ADD(NEG(x), CONST)) can be safely rewritten as XOR(x, CONST) using IntegralRange::ForNode range information, and it enables additional VN-based folding in assertion propagation.

Changes:

  • Add an ADD(NEG(x), CONST) => XOR(x, CONST) transform in fgOptimizeAddition guarded by an IntegralRange-based validity check.
  • Update optVNBasedFoldCurStmt to include GT_NOT among the node kinds eligible for VN-based folding.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/coreclr/jit/morph.cpp Adds a range-checked transform to rewrite certain constant subtractions into XOR for better codegen.
src/coreclr/jit/assertionprop.cpp Enables VN-based folding for GT_NOT nodes by adding it to the foldable opcode set.

Comment on lines +10438 to +10439
uint64_t mask = UINT64_MAX >> BitOperations::LeadingZeroCount(lo ^ hi);
mask = lo | mask;
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mask computation can invoke undefined behavior when lo == hi (i.e., lo ^ hi == 0): BitOperations::LeadingZeroCount(0) returns 64, so UINT64_MAX >> 64 is an invalid shift in C++. This can miscompile the JIT itself and may incorrectly allow/deny the transform for singleton ranges. Handle the lo == hi case explicitly (e.g., treat the range mask as just lo, or otherwise ensure the shift count is < 64).

Suggested change
uint64_t mask = UINT64_MAX >> BitOperations::LeadingZeroCount(lo ^ hi);
mask = lo | mask;
uint64_t diff = lo ^ hi;
uint64_t mask = lo;
if (diff != 0)
{
mask |= UINT64_MAX >> BitOperations::LeadingZeroCount(diff);
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants