JIT: Transform 'CONST - x' to 'xor x, CONST' when possible (#126529)

BoyBaykiller wants to merge 5 commits into dotnet:main
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
…_no_tiered_compilation regression

There is one type of regression where …
As a generalization, the rule is that you can implement subtraction as xor any time you can be sure there won't be a borrow, because binary subtraction is defined as an xor followed by a borrow. This is always true when the left operand is a bit pattern of all ones and the right operand is less than or equal to the left. So naturally, it works for subtraction of any integral type if the left operand has all bits set. It also happens to work when the left operand has all but the MSB set, because in that case the only borrow could come from beyond the MSB. This is why the optimization works for …. This also means, for example, that with a byte subtraction, …. The reason I mention this is that if you have a range assertion that can prove the right operand is smaller for any reason, it can apply there too. We have examples where this optimization is done manually in the libraries code: …
Thanks for the explanation! I was just about to look for the more general case to use in #126553. Unfortunately I am not sure there is a standard way to get general range information (or an LE assertion) in morph. And it's only run if …

As for handling smaller types: I skipped that because I still need to see what's the proper & easy way to see through the IR that we create for small types and get the actual type as defined in the C# source:

```
CAST int <- ubyte <- int
  * LCL_VAR int V01 arg1 (last use)
```
In IL, you'll always have a cast back to the small type after an arithmetic operation, because IL doesn't define those operations for anything smaller than int32. Within the JIT, small types may be flagged as normalize-on-load or normalize-on-store, so the JIT will force zero/sign extension as appropriate there.
I'm not terribly familiar with that part of the code, but you can fetch assertions from morph. Here's an example that might lead you to the right parts: runtime/src/coreclr/jit/morph.cpp, lines 3226 to 3239 at 3f14c31. And you can see some example assertions that could be interesting for this optimization, if only they were more specific: runtime/src/coreclr/jit/assertionprop.cpp, lines 306 to 314 at 3f14c31. I'm not following exactly why the comment says there's no advantage to a more specific range, though.
* use 'IntegralRange::ForNode' for somewhat better range info (could be improved still)
* move to fgOptimizeAddition
I've generalized the transformation to catch all cases, given …. In theory the case you showed would be naturally handled now if the range for … Also, in order to handle this byte subtract case you mentioned:

```csharp
byte Byte127(byte x)
{
    return (byte)(127 - x);
}
```

I need to detect the outer byte cast so I know the range is in [0, 255], which I don't know how to properly do in …
@EgorBo PTAL. If this gets merged we can justify adding general …. I think it would also be nice to revisit making ….
Pull request overview
This PR improves CoreCLR JIT morph optimizations by recognizing when `CONST - x` (canonicalized as `ADD(NEG(x), CONST)`) can be safely rewritten as `XOR(x, CONST)` using `IntegralRange::ForNode` range information, and it enables additional VN-based folding in assertion propagation.
Changes:
- Add an `ADD(NEG(x), CONST) => XOR(x, CONST)` transform in `fgOptimizeAddition`, guarded by an `IntegralRange`-based validity check.
- Update `optVNBasedFoldCurStmt` to include `GT_NOT` among the node kinds eligible for VN-based folding.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/coreclr/jit/morph.cpp` | Adds a range-checked transform to rewrite certain constant subtractions into XOR for better codegen. |
| `src/coreclr/jit/assertionprop.cpp` | Enables VN-based folding for `GT_NOT` nodes by adding it to the foldable opcode set. |
```cpp
uint64_t mask = UINT64_MAX >> BitOperations::LeadingZeroCount(lo ^ hi);
mask = lo | mask;
```
The mask computation can invoke undefined behavior when `lo == hi` (i.e., `lo ^ hi == 0`): `BitOperations::LeadingZeroCount(0)` returns 64, so `UINT64_MAX >> 64` is an invalid shift in C++. This can miscompile the JIT itself and may incorrectly allow/deny the transform for singleton ranges. Handle the `lo == hi` case explicitly (e.g., treat the range mask as just `lo`, or otherwise ensure the shift count is < 64).
```diff
- uint64_t mask = UINT64_MAX >> BitOperations::LeadingZeroCount(lo ^ hi);
- mask = lo | mask;
+ uint64_t diff = lo ^ hi;
+ uint64_t mask = lo;
+ if (diff != 0)
+ {
+     mask |= UINT64_MAX >> BitOperations::LeadingZeroCount(diff);
+ }
```
Given any `CONST - x` and range info for `x` obtained from `IntegralRange::ForNode`, see if we can transform it to `x ^ CONST`. These changes motivate improvements in `IntegralRange::ForNode` and `RangeOps` (#126553). It also made me find a missing `GT_NOT` in assertion prop, enabling VN-based folding.

Found by looking at diffs from #126442.
Example: