[feature](function) Add ST_NumGeometries, ST_NumPoints, ST_Geometries functions#63049
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
0a9a3ff to
e870bd2
Compare
|
run buildall |
|
/review |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
facf26f to
0d5d800
Compare
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
0fdb594 to
8feb287
Compare
|
run buildall |
1 similar comment
|
run buildall |
TPC-H: Total hot run time: 30924 ms |
TPC-DS: Total hot run time: 170352 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
303b19f to
fb632cd
Compare
|
run buildall |
fb632cd to
4eef041
Compare
|
run buildall |
4eef041 to
fb632cd
Compare
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 31061 ms |
TPC-DS: Total hot run time: 169633 ms |
|
run cloud_p0 |
FE UT Coverage ReportIncrement line coverage |
|
run external |
|
run nonConcurrent |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 31738 ms |
TPC-DS: Total hot run time: 173015 ms |
…T_Geometries GIS functions
2489a34 to
b1e980e
Compare
|
run buildall |
TPC-H: Total hot run time: 31483 ms |
TPC-DS: Total hot run time: 173026 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Automated review for head b1e980eb3dd3b7c542542852d8cbd051bf878664.
No additional blocking issues found beyond the already-known review context. I did not submit duplicate inline comments for the existing threads.
Critical checkpoint conclusions:
- Goal/test coverage: The PR adds
ST_NumGeometries,ST_NumPoints/ST_NPoints, andST_Geometries; BE/FE registrations and regression/FE/BE tests cover the main point, linestring, polygon, multipolygon, circle, NULL, invalid input, alias, table, predicate, and arithmetic paths. - Scope/focus: The current head is focused on the geo component functions and associated type/count helpers.
- Concurrency/lifecycle: The changed function execution paths are per-block scalar evaluation and do not introduce shared mutable state, new threads, locks, or special lifecycle management.
- Configuration/compatibility: No new config items, storage formats, edit logs, or FE-BE protocol fields are introduced.
- Parallel code paths: BE function registration and Nereids builtin registration both include the new functions;
ST_NPointsis now registered as an alias rather than a separate implementation. - Error/null handling: Invalid encoded geometries and unsupported point counts return NULL; existing default nullable handling is used for NULL input.
- Performance: Current implementation uses
ColumnViewfor geo string access and avoids the earlier materialization concern raised in existing threads. - Observability/transactions/data writes: Not applicable; no transaction, persistence, or data-write path changes.
- User focus:
review_focus.txthas no additional user-provided focus points.
Testing was not run by this review job; conclusions are based on static review of the PR diff and included tests.
|
Hi @linrrzqqq , thanks for the review. I have addressed the previous review comments in the latest head |
|
PR approved by at least one committer and no changes requested. |
… functions (#63049) … functions ### What problem does this PR solve? Issue Number: ref #48203 Related PR: apache/doris-website#3623 Problem Summary: Add three new spatial functions for geometry collection operations: - `ST_NumGeometries`: Returns the number of sub-geometries in a geometry object. - `ST_NumPoints`: Returns the total number of vertices (points) in a geometry object. - `ST_Geometries`: Decomposes a geometry object into an array of its sub-geometries.
### What problem does this PR solve? Issue Number: close #N/A Problem Summary: BE-UT has been failing master-wide since both PR apache#63491 (which strongly typed `ColumnNullable::get_null_map_column[_ptr]()` to `ColumnUInt8`) and PR apache#63049 (which added `functions_geo_test.cpp`) landed today. The new test calls ```cpp assert_cast<ColumnUInt8*>(nullable_input->get_null_map_column_ptr().get()) ->insert_value(0); ``` but the inner expression is already `ColumnUInt8*`, so the cast triggers the same-type static_assert added by PR apache#63133 to `src/core/assert_cast.h`: ``` static assertion failed due to requirement '!std::is_same_v<doris::ColumnVector<doris::TYPE_BOOLEAN> *, doris::ColumnVector<doris::TYPE_BOOLEAN> *>': assert_cast is redundant for the same type after removing cv/ref qualifiers ``` That kills `doris_be_test` compilation on every PR that runs BE-UT. Use the strongly typed `get_null_map_column()` (which returns `ColumnUInt8&`) directly so the cast is no longer needed. ### Release note None (test-only change, restores BE-UT compilation on master). ### Check List (For Author) - Test: - Compile-check on local ASAN tree: the affected translation unit now builds clean (`ninja test/CMakeFiles/doris_be_test.dir/exprs/function/geo/functions_geo_test.cpp.o`). - Behavior changed: No - Does this need documentation: No
### What problem does this PR solve? #63491 #63049 ``` ../src/core/assert_cast.h:54:19: error: static assertion failed due to requirement '!std::is_same_v<doris::ColumnVector<doris::TYPE_BOOLEAN> *, doris::ColumnVector<doris::TYPE_BOOLEAN> *>': assert_cast is redundant for the same type after removing cv/ref qualifiers 54 | static_assert(!std::is_same_v<AssertCastNormalizedType_t<To>, AssertCastNormalizedType_t<From>>, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../test/exprs/function/geo/functions_geo_test.cpp:375:5: note: in instantiation of function template specialization 'assert_cast<doris::ColumnVector<doris::TYPE_BOOLEAN> *, TypeCheckOnRelease::ENABLE, doris::ColumnVector<doris::TYPE_BOOLEAN> *>' requested here 375 | assert_cast<ColumnUInt8*>(nullable_input->get_null_map_column_ptr().get())->insert_value(0); | ^ 1 error generated. ```
The same-type assert_cast static_assert added by apache#63059 rejects the ColumnUInt8* -> ColumnUInt8* cast at functions_geo_test.cpp:375, breaking BE UT compilation since the geo test was added (apache#63049). Unblocks BE UT on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ine_similarity (#62840) ### What problem does this PR solve? Two defensive hardening fixes in CosineDistance::distance and CosineSimilarity::distance to guarantee correct results across the full range of valid float inputs. Fix 1: Use double-precision intermediate when computing the norm Before: return 1 - dot_prod / sqrt(squared_x * squared_y); After: const double norm = std::sqrt(static_cast<double>(squared_x) * static_cast<double>(squared_y)); Why: squared_x * squared_y is a float multiplication. When squared_x and squared_y are both large (e.g. input elements around 1e19), the product exceeds FLT_MAX (~3.4e38) and overflows to +inf. Then sqrt(+inf) = +inf and dot_prod / +inf = 0, so two parallel vectors silently get cosine_distance = 1.0 (should be 0.0) — a wrong result with no warning. For typical L2-normalized embedding vectors this never triggers. But cosine_distance accepts arbitrary float* arrays, not just normalized embeddings, so the function should be safe for any finite float input. The cost is two static_cast<double> ops; double's range (~1.8e308) cannot overflow on any finite float input. For non-overflow inputs the result is bit-for-bit equivalent (verified by existing tests, which match the same static_cast<float>(34.0 / std::sqrt(14.0 * 83.0)) formula). Fix 2: Clamp cosine to [-1, 1] After: return std::clamp(static_cast<float>(dot_prod / norm), -1.0f, 1.0f); Why: Float rounding can make the computed cosine slightly exceed 1.0 for identical (or near-identical) vectors. For example with x = y = (0.1f, 0.2f, 0.3f), accumulation rounding can yield cosine = 1.0000001, then 1.0f - cosine = -1e-7 — a negative cosine_distance that violates the metric contract d >= 0 and may break downstream code (DCHECK(distance >= 0), threshold filters, distance aggregation). std::clamp is a one-op guarantee that costs nothing for in-range values. BE UT 编译失败修复(独立于本 PR 余弦修复) 问题:merge 最新 master 后,BE UT 在 functions_geo_test.cpp:375 编译失败: error: static assertion failed: assert_cast is redundant for the same type '!std::is_same_v<ColumnVector<TYPE_BOOLEAN>*, ColumnVector<TYPE_BOOLEAN>*>' 根因:master 上三个 PR 叠加副作用: #63491 把 _null_map 改为强类型 ColumnUInt8::WrappedPtr #63059 加 static_assert 拒绝 same-type assert_cast #63049(5/26 刚合入)新写的 geo 测试还按老接口加了冗余 cast 修复:去掉冗余包装 - assert_cast<ColumnUInt8*>(nullable_input->get_null_map_column_ptr().get())->insert_value(0); + nullable_input->get_null_map_column_ptr()->insert_value(0); get_null_map_column_ptr() 现在直接返回 ColumnUInt8::MutablePtr,->insert_value() 语义不变。 影响:一行改动,仅修复编译报错,不涉及测试语义和余弦相关代码。 ### Release note None ### Check List (For Author) - Test <!-- At least one of them must be included. --> - [ ] Regression test - [ ] Unit Test - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason <!-- Add your reason? --> - Behavior changed: - [ ] No. - [ ] Yes. <!-- Explain the behavior change --> - Does this need documentation? - [ ] No. - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 --> ### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into --> --------- Co-authored-by: yaoxiao <yaoxiao@fosun.com>
… functions
What problem does this PR solve?
Issue Number: ref #48203
Related PR: apache/doris-website#3623
Problem Summary:
Add three new spatial functions for geometry collection operations:
ST_NumGeometries: Returns the number of sub-geometries in a geometry object.ST_NumPoints: Returns the total number of vertices (points) in a geometry object.ST_Geometries: Decomposes a geometry object into an array of its sub-geometries.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)