Is your feature request related to a problem or challenge?
Currently, DataFusion handles conjunctive join conditions well. For example, conditions like:
can already be optimized by extracting equijoin keys while preserving the remaining predicates as join filters.
However, for queries such as:
SELECT *
FROM t1
INNER JOIN t2
ON t1.id = t2.id OR t1.age = t2.age
the presence of an OR condition prevents join key extraction. As a result, the optimizer typically falls back to a single inner join with a complex filter condition, making it difficult for later optimization stages and the execution engine to take advantage of equijoin-based optimizations. Therefore, this type of join will be executed using nested loop join.
This proposal introduces a new logical optimizer rule that rewrites:
t1 INNER JOIN t2 ON cond1 OR cond2 OR ... OR condN
into:
SELECT * FROM (
(t1 INNER JOIN t2 ON cond1)
UNION ALL
(t1 INNER JOIN t2 ON cond2 AND NOT cond1)
UNION ALL
(t1 INNER JOIN t2 ON cond3 AND NOT cond1 AND NOT cond2)
...
)
Each branch becomes mutually exclusive, preserving the original semantics while exposing simpler join conditions that can be optimized independently.
Initial Scope
The first implementation only targets the following cases:
- Only INNER JOIN
- Only when the join condition contains a top-level splittable OR
- Logical plan rewrite only; no execution-layer changes
- No support for outer joins, semi/anti joins, or full joins
- No join reordering across multi-level join trees; only rewrite the current join node
Rewrite Example
Input:
SELECT *
FROM t1
INNER JOIN t2
ON t1.id = t2.id OR t1.age = t2.age
Rewritten as:
SELECT *
FROM (
SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id
UNION ALL
SELECT * FROM t1 INNER JOIN t2
ON t1.age = t2.age AND NOT (t1.id = t2.id)
)
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
Is your feature request related to a problem or challenge?
Currently, DataFusion handles conjunctive join conditions well. For example, conditions like:
can already be optimized by extracting equijoin keys while preserving the remaining predicates as join filters.
However, for queries such as:
the presence of an OR condition prevents join key extraction. As a result, the optimizer typically falls back to a single inner join with a complex filter condition, making it difficult for later optimization stages and the execution engine to take advantage of equijoin-based optimizations. Therefore, this type of join will be executed using nested loop join.
This proposal introduces a new logical optimizer rule that rewrites:
into:
Each branch becomes mutually exclusive, preserving the original semantics while exposing simpler join conditions that can be optimized independently.
Initial Scope
The first implementation only targets the following cases:
Rewrite Example
Input:
Rewritten as:
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response