refactor!: unify ResultSet implementations on Arrow-backed path #175
mkaufmann wants to merge 18 commits into
Conversation
Codecov Report
❌ Patch coverage is 74.38%. Your patch check has failed because the patch coverage (74.38%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## main #175 +/- ##
============================================
- Coverage 82.37% 80.49% -1.88%
+ Complexity 1871 1707 -164
============================================
Files 125 123 -2
Lines 5020 4953 -67
Branches 540 521 -19
============================================
- Hits 4135 3987 -148
- Misses 642 737 +95
+ Partials 243 229 -14
Per the two review threads, split out the cherry-pickable fixes as their own PRs: #185 and #186.
This PR (#175) keeps the same fixes as the first two commits — when #185 / #186 land, those commits will collapse to no-ops at rebase time. For the remaining "should QueryResultArrowStream allocator-ownership move pre-unify too?" thread (#175 review): waiting on your call before I do that split. As I noted there, it's non-trivial surgery on the unify commit and I'd rather get your sign-off before rewriting ~800 lines of refactor.
mkaufmann left a comment
Review summary
12 findings: 1 Blocking, 8 Suggested, 3 Optional. The PR collapses the metadata path onto the streaming Arrow pipeline cleanly, and the close-order story for the new Result holder is right where it lands. The findings cluster around three themes:
- API surface. The diff promotes DataCloudResultSet from an interface to a concrete class. That's a binary break for anyone implementing or proxying the type from the shaded JAR — flagged at DataCloudResultSet.java:49 (Blocking). StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor all leave the public surface without a deprecation window. With release-please configured for Conventional Commits, the squash-merge subject refactor: will not surface a CHANGELOG entry — finding S7 covers this.
- JDBC-spec compliance regressions on metadata. MetadataSchemas.TYPE_INFO declares CASE_SENSITIVE / UNSIGNED_ATTRIBUTE / FIXED_PREC_SCALE / AUTO_INCREMENT as text(...) columns while HyperTypes.typeInfoRows() populates them with Boolean values; the new VarCharVectorSetter toString()s them, and BaseIntVectorAccessor.getBoolean is not overridden, so getBoolean(...) on these columns now throws where the old SimpleResultSet returned the raw Boolean (S1, S2). JDBC 4.2 Table B-6 lists INTEGER → boolean and VARCHAR → boolean (for "true"/"false"/"1"/"0") as recommended conversions; tools that read NULLABLE from getColumns() or boolean flags from getTypeInfo() will break.
- Allocator-hygiene gaps that survive the cleanup. cursor.close() and DataCloudResultSet.of's failure path use plain try/finally without addSuppressed, so if the leak detector trips on allocator.close() it masks the original exception (S3). DataCloudStatement.executeQuery / getResultSet call getQueryId() between allocator construction and ownership transfer to DataCloudResultSet.of — if that gRPC call throws, the allocator escapes (S4).
On the PR body itself: The body describes a design that isn't in the diff. The "source-agnostic cursor" / BatchLoader / AutoCloseable ownership and the StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) factory don't exist in the cursor or in DataCloudResultSet. The metadata path actually round-trips through ArrowStreamWriter → IPC bytes → ArrowStreamReader (verified at MetadataResultSets.java:88), and the typeName-preservation mechanism rides on Arrow field metadata under a datacloud-jdbc:type_name key (HyperTypeToArrow.toField / ArrowToHyperTypeMapper.toColumnMetadata), not on a cols parameter. The body also references a non-existent MetadataArrowBuilder class (the factory is MetadataResultSets), names StreamingResultSet as the unified type when in fact StreamingResultSet.java is deleted and DataCloudResultSet.java is the new home, and the "Built on top of #moritz/centralize-types-via-hypertype" line is malformed (won't render as a PR ref) and stale (the dependency landed as #174). The first-line "I'm not happy with the PR and the approach" disclaimer would land in git log permanently if squashed as-is.
Range-check tightening (IntVectorSetter / SmallIntVectorSetter / TinyIntVectorSetter now throw IllegalArgumentException on out-of-range Numbers — verified at VectorPopulator.java:240,261,433) is a real behavior change that the PR body's "Behavior changes worth calling out" section omits. Worth a bullet so future debugging of IllegalArgumentException from prepared-statement parameter binding has a paper trail.
Tests. Behavior change "metadata int columns no longer accept getBoolean" is no longer pinned by a test — the old assertThat(getBoolean("NULLABLE")).isFalse() was replaced with getInt("NULLABLE"), and the test comment ("long→boolean coerces to false") is misleading because the code does not actually coerce — BaseIntVectorAccessor inherits getBoolean from QueryJDBCAccessor, which throws. A direct assertThrows(SQLException.class, () -> getBoolean("ORDINAL_POSITION")) would pin it.
Inline findings follow.
Generated by the review-pr-tavern skill — a human did not write this comment.
assertThat(columnResultSet.getInt("DATA_TYPE")).isEqualTo(12);
assertThat(columnResultSet.getBoolean("NULLABLE")).isFalse();
// NULLABLE is an INTEGER column. Arrow-backed getInt reports the nullability enum;
// 0 (columnNoNulls) for NOT NULL rows, which coerces to false via long→boolean.
Suggested — The test comment says "0 (columnNoNulls) for NOT NULL rows, which coerces to false via long→boolean", and the previous getBoolean("NULLABLE") assertion was replaced with getInt("NULLABLE"). But BaseIntVectorAccessor does not override getBoolean() — the call falls through to QueryJDBCAccessor.getBoolean() (line 52-54), which throws SQLFeatureNotSupportedException. The comment describes a coercion that does not happen in this codebase, and the behavior-change claim from the PR body ("getBoolean ... on an integer column throws SQLException") is no longer pinned by any direct assertion. Add:
assertThrows(SQLException.class, () -> columnResultSet.getBoolean("ORDINAL_POSITION"));
next to the existing date/time assertThrows block — it pins the new contract and removes the misleading comment in one go. (See the separate finding on MetadataSchemas for the boolean-coercion gap that arguably should still work for spec compliance.)
Generated by the review-pr-tavern skill — a human did not write this comment.
* @param queryId The query identifier (may be {@code null} for synthesized result sets).
* @param sessionZone The session timezone used for timestamp conversions.
*/
public static DataCloudResultSet of(QueryResultArrowStream.Result arrowStream, String queryId, ZoneId sessionZone)
Suggested — Four removals worth a CHANGELOG entry: StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor were all public on main. Even if no in-tree code outside :jdbc-core referenced them, they were reachable from the shaded fat JAR. StreamingResultSet.of(ArrowStreamReader, queryId) and of(ArrowStreamReader, queryId, ZoneId) were the only public entry points for wrapping a foreign Arrow stream into a JDBC ResultSet. The new DataCloudResultSet.of(QueryResultArrowStream.Result, ...) requires constructing an internal Result holder, which is tighter coupling than the old factories.
If there's no migration alias, please at least call this out in a BREAKING CHANGE: footer so release-please picks it up (the refactor: subject otherwise produces no CHANGELOG entry — see the release-notes finding). A thin public static DataCloudResultSet of(ArrowStreamReader reader, BufferAllocator allocator, String queryId, ZoneId zone) overload that internally builds a Result would also let external Arrow-stream wrappers continue to compile.
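A hedged sketch of that overload — the overload itself is a review suggestion, not code in the diff, and the Result constructor shape is an assumption from the call sites:

// Sketch only: assumes QueryResultArrowStream.Result exposes a (reader, allocator) constructor.
public static DataCloudResultSet of(
        ArrowStreamReader reader, BufferAllocator allocator, String queryId, ZoneId zone)
        throws SQLException {
    // Wrap the caller's pair in the internal holder so external Arrow-stream
    // wrappers keep compiling against a (reader, allocator) call shape.
    return of(new QueryResultArrowStream.Result(reader, allocator), queryId, zone);
}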
Release-notes coverage. The repo uses release-please with release-type: simple and Conventional Commits. A squash-merge subject of refactor: unify ResultSet implementations on Arrow-backed path produces no CHANGELOG entry: refactor isn't in release-please's default changelog sections, and at 0.x.y even a minor bump would need a feat:. Combined with the API-surface deltas in this PR (interface→class on DataCloudResultSet, four removed public types, stricter accessor coercion on metadata int columns), the change will land silently in the next release, and users discovering the break via CHANGELOG.md will have nothing to read.
Use refactor!: for the squash subject and add a BREAKING CHANGE: footer enumerating the specific deletions and behavior changes. release-please reads the ! and the footer and produces a Breaking Changes section.
Generated by the review-pr-tavern skill — a human did not write this comment.
@Override
public void close() throws SQLException {
    if (!closed) {
Optional — close() reads if (!closed) { cursor.close(); closed = true; }. If cursor.close() throws — e.g. the allocator's leak detector fires IllegalStateException — closed stays false. A defensive caller that catches and retries close() (try-with-resources won't, but JDBC connection pools and driver wrappers sometimes do) re-enters cursor.close(), which calls allocator.close() on an already-closed RootAllocator. Arrow throws on the second close.
Flip the order — set closed = true before delegating, or wrap in try/finally that sets it unconditionally. The caller still gets the cleanup exception; subsequent calls become no-ops, matching the standard JDK AutoCloseable pattern.
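A minimal sketch of the flipped ordering the finding suggests, using the field and method names from the diff context above:

@Override
public void close() throws SQLException {
    if (closed) {
        return;          // repeat calls are no-ops, matching the standard AutoCloseable pattern
    }
    closed = true;       // flip the flag before delegating so a throw from
    cursor.close();      // cursor.close() cannot trigger a double-close on retry
}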
Generated by the review-pr-tavern skill — a human did not write this comment.
@Override
protected void setValueInternal(DecimalVector vector, BigDecimal value) {
    vector.setSafe(0, value.unscaledValue().longValue());
protected void setValueInternal(DecimalVector vector, int index, BigDecimal value) {
Optional — DecimalVectorSetter.setValueInternal does vector.setSafe(index, value.unscaledValue().longValue()). A BigDecimal whose unscaled value exceeds Long.MAX_VALUE silently truncates. The PR added IllegalArgumentException range checks on IntVectorSetter (line 240), SmallIntVectorSetter (line 261), and TinyIntVectorSetter (line 433); decimal got the signature update but not the guard. The asymmetry is jarring now that this is the single populator path for both binding and metadata.
Not required (today's call sites probably stay in range), but worth a bitLength() > 63 → IllegalArgumentException check, sketched below, or a one-line javadoc explaining why decimals are exempt.
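A hedged sketch of the guard, mirroring the Int/SmallInt/TinyInt checks; the setter shape follows the diff context above:

@Override
protected void setValueInternal(DecimalVector vector, int index, BigDecimal value) {
    // A long holds 63 value bits; anything wider would silently truncate below.
    if (value.unscaledValue().bitLength() > 63) {
        throw new IllegalArgumentException(
                "Unscaled value out of long range for DECIMAL parameter: " + value);
    }
    vector.setSafe(index, value.unscaledValue().longValue());
}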
Generated by the review-pr-tavern skill — a human did not write this comment.
@SneakyThrows
void closesTheReader() {
    val sut = new ArrowStreamReaderCursor(reader, ZoneId.systemDefault());
void closesReaderAndAllocator() {
Optional — The new closesReaderAndAllocator test verifies both reader.close() and allocator.close() ran, but pins neither the order nor the throw-during-close contract. The cursor's javadoc explicitly says reader-first ordering exists "because reversing trips a leak detector", and the body uses try/finally so allocator.close runs even when reader.close throws. Both are exactly the properties most likely to regress.
A two-line tightening covers it:
InOrder inOrder = Mockito.inOrder(reader, allocator);
inOrder.verify(reader).close();
inOrder.verify(allocator).close();
plus a second test where reader.close() is doThrow(...)-ed and the assertion is that allocator.close() still ran — sketched below.
Generated by the review-pr-tavern skill — a human did not write this comment.
Collapse the two ResultSet families (streaming Arrow + row-based metadata) into a single Arrow-backed implementation so there is one accessor pipeline, one set of type semantics, and one place to fix bugs. Changes:
- ArrowStreamReaderCursor becomes source-agnostic: a BatchLoader drives a VectorSchemaRoot, whether sourced from an ArrowStreamReader or a pre-populated in-memory batch. The cursor also owns an AutoCloseable so it is responsible for releasing the allocator + reader on close — the old ArrowStreamReader.close() would only tear down vectors and leak the 100 MB RootAllocator.
- QueryResultArrowStream.toArrowStreamReader returns a Result holder that pairs the reader with the allocator and closes both in the right order so Arrow's accounting invariants hold.
- StreamingResultSet gains ofInMemory(root, owned, queryId, zone, cols) so metadata results funnel through the same result set. A columns override preserves the JDBC-spec typeName labels (e.g. TEXT) that would otherwise be lost when deriving from the Arrow schema.
- MetadataArrowBuilder materialises List<List<Object>> metadata rows into a populated VectorSchemaRoot using the existing HyperTypeToArrow mapping; MetadataResultSets is the factory callers use.
- QueryMetadataUtil and DataCloudDatabaseMetadata route getTables, getColumns, getSchemas, getTypeInfo and empty metadata results through the Arrow-backed StreamingResultSet.
- DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are removed now that no caller depends on them.
- StreamingResultSet.getObject(int, Class) gains an isInstance-based fallback so callers can retrieve String-typed VARCHAR columns without each accessor having to implement typed getObject.
- Tests moved to the unified path; integer-accessor-only assertions in DataCloudDatabaseMetadataTest updated to reflect stricter Arrow accessor semantics.
…ArrowStreamReader
Rework the ResultSet unification to address two reviewer requests on #175:
1. Share the vector-building code with the parameter-encoding path instead of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a row-indexed primitive (setCell) used by both callers. The existing single-row parameter-binding overload and a new many-row metadata overload both funnel through it, and all the individual vector setters are parameterised by row index.
2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only interface. The metadata path now serialises a populated VSR to Arrow IPC bytes and wraps the result in a ByteArrayInputStream-backed ArrowStreamReader, so both streaming and metadata result sets travel through exactly the same reader/cursor plumbing.
Supporting changes:
- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now round-trip through Arrow via a jdbc:type_name field-metadata key rather than a columns-override parameter on StreamingResultSet. HyperTypeToArrow stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns override; callers construct an ArrowStreamReader + BufferAllocator pair and hand them to of(reader, allocator, queryId, zone). The cursor owns both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its in-memory ResultSet the same way as the metadata path (IPC round-trip).
StreamingResultSet.of catches IOException and IllegalArgumentException from the Arrow schema decode and rewraps as SQLException. At all four query-path call sites (DataCloudConnection.getRowBasedResultSet, getChunkBasedResultSet, DataCloudStatement.executeQuery, getResultSet) the surrounding try-catch only catches StatusRuntimeException, so a SQLException thrown from of() bypasses it and leaks the 100 MB RootAllocator returned by QueryResultArrowStream.toArrowStreamReader. Introduce StreamingResultSet.ofClosingOnFailure(Result, queryId, sessionZone) that takes the reader+allocator pair and closes both on construction failure (reader first so its buffers release before the allocator's budget check). Switch all four call sites to it. The metadata path in MetadataResultSets.of already had this shape; this fixes the matching gap on the query side. Add a regression test that builds an Arrow IPC stream with an unsupported field type (LargeUtf8) and asserts the helper closes both the reader and the allocator on the resulting SQLException.
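A hedged sketch of the helper's close-on-failure shape; the factory name follows the commit message, and the getReader/getAllocator accessors on Result are assumptions:

public static StreamingResultSet ofClosingOnFailure(
        QueryResultArrowStream.Result result, String queryId, ZoneId sessionZone)
        throws SQLException {
    try {
        return of(result.getReader(), result.getAllocator(), queryId, sessionZone);
    } catch (SQLException e) {
        // Reader first, so its buffers release before the allocator's budget check.
        try {
            result.getReader().close();
        } catch (Exception suppressed) {
            e.addSuppressed(suppressed);
        }
        try {
            result.getAllocator().close();
        } catch (Exception suppressed) {
            e.addSuppressed(suppressed);
        }
        throw e;
    }
}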
The Int/SmallInt/TinyInt setters widened from concrete boxed types (Integer/Short/Byte) to Number so metadata rows could pass long values, but lost the implicit "right boxed type" check at the call sites that went through DataCloudPreparedStatement.setObject for parameter binding. A user binding Long.MAX_VALUE to an INT32 parameter would silently get (int) Long.MAX_VALUE = -1 written to the vector. Add an explicit range check on Int/SmallInt/TinyInt setters before narrowing. Both the metadata path and the parameter-binding path go through these setters, so strict checks here mean strict on both paths. BigInt accepts the full long range and is unchanged. Pin the behavior with a focused unit test (IntegerVectorSetterRangeCheckTest).
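A sketch of the check on the INT32 setter — the two smaller setters would follow the same pattern with Short/Byte bounds; the setter signature is assumed from the row-indexed shape described above:

@Override
protected void setValueInternal(IntVector vector, int index, Number value) {
    long raw = value.longValue();
    // Fail fast instead of narrowing: (int) Long.MAX_VALUE would silently become -1.
    if (raw < Integer.MIN_VALUE || raw > Integer.MAX_VALUE) {
        throw new IllegalArgumentException("Value out of int range for INTEGER parameter: " + raw);
    }
    vector.setSafe(index, (int) raw);
}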
The driver round-trips JDBC-spec type-name overrides (e.g. "TEXT" for metadata columns) through Arrow field metadata under a custom key. The previous key "jdbc:type_name" used an unprefixed namespace not reserved by the Arrow spec — Hyper, query-federator, or another Arrow producer could emit a same-named key in a future protocol version, in which case ArrowToHyperTypeMapper would silently override its own derived type name with whatever upstream stamped. Rename to "datacloud-jdbc:type_name" so the namespace is unambiguous, and expand the field's javadoc to document the namespace rationale.
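A sketch of the write/read pair around the renamed key — the constant name matches the commit; the surrounding statements are illustrative, not the actual method bodies:

public static final String JDBC_TYPE_NAME_METADATA_KEY = "datacloud-jdbc:type_name";

// Write side (HyperTypeToArrow): stamp the override into the field's custom metadata.
Map<String, String> meta = Collections.singletonMap(JDBC_TYPE_NAME_METADATA_KEY, "TEXT");
Field field = new Field(
        "TABLE_NAME", new FieldType(true, ArrowType.Utf8.INSTANCE, null, meta), null);

// Read side (ArrowToHyperTypeMapper.toColumnMetadata): prefer the override when present,
// leaving typeName null otherwise so the JDBC layer derives it from the HyperType.
String typeName = field.getMetadata().get(JDBC_TYPE_NAME_METADATA_KEY);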
The fallback in ArrowToHyperTypeMapper.toColumnMetadata — when a field
has no datacloud-jdbc:type_name override, ColumnMetadata.typeName is
null and the JDBC layer derives the column type-name from the
HyperType — was load-bearing but unasserted. Real Hyper Arrow streams
never stamp the override, so every functional query test exercised the
fallback implicitly; if a future refactor broke it, the regression
would not surface in the existing suite.
Two new pin tests:
- ArrowToHyperTypeMapperTest at the unit boundary: field with override
-> typeName matches; field without override (null metadata, empty
metadata) -> typeName is null (sketched below).
- StreamingResultSetTest.getColumnTypeNameFallsBackToDerivedNameOnRealHyperStream
end-to-end against local Hyper: executeQuery on a select with INT,
VARCHAR, DECIMAL columns asserts ResultSetMetaData.getColumnTypeName
returns the derived names ("INTEGER", "VARCHAR", "DECIMAL").
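A minimal sketch of the unit-boundary pin; the getTypeName() accessor on ColumnMetadata is an assumption:

@Test
void typeNameIsNullWithoutOverride() {
    // No datacloud-jdbc:type_name key on the field: the mapper must leave
    // ColumnMetadata.typeName null so the JDBC layer derives it from the HyperType.
    Field plain = Field.nullable("a", ArrowType.Utf8.INSTANCE);
    assertThat(ArrowToHyperTypeMapper.toColumnMetadata(plain).getTypeName()).isNull();
}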
Drive-by pin test: StreamingResultSet.getObject(int, Map<String,Class<?>>) with a null or empty type map should behave like plain getObject(int) per the JDBC spec. Previously not asserted anywhere. The companion getObject(Class) fallback test landed earlier on this branch, bundled into the QueryJDBCAccessor base-class fix commit so the fix and its end-to-end coverage ship as a single cherry-pickable unit.
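A sketch of what that pin could assert — result-set construction is elided, and resultSet stands in for the in-memory ResultSet under test:

@Test
void getObjectWithEmptyTypeMapBehavesLikePlainGetObject() throws SQLException {
    // Per JDBC, an empty (or defaulted) type map must not change the returned object.
    assertThat(resultSet.getObject(1, Collections.<String, Class<?>>emptyMap()))
            .isEqualTo(resultSet.getObject(1));
}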
Previously a row with the wrong number of elements would silently leave the trailing columns as Arrow null (interpreted as missing values). Today every caller routes through MetadataSchemas so the sizes match by construction, but a future caller bug would surface only inside vector population, far from the boundary. Add an explicit arity check at the of(...) entrypoint: each non-null row must have exactly columns.size() elements. Null rows are accepted as the all-nulls row (matching the legacy coerceRows convention of turning null into emptyList). Empty rows are accepted only when the schema is also empty. Pin behavior with MetadataResultSetsTest covering short, long, correct-arity, null-row, and empty-rows cases.
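A hedged sketch of the entrypoint guard described above; the helper name is illustrative:

private static void checkRowArity(List<ColumnMetadata> columns, List<List<Object>> rows) {
    for (List<Object> row : rows) {
        if (row == null) {
            continue;   // null row = the legacy all-nulls row, accepted as-is
        }
        if (row.size() != columns.size()) {
            throw new IllegalArgumentException(
                    "Metadata row has " + row.size() + " cells, expected " + columns.size());
        }
    }
}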
Now that ArrowStreamReaderCursor.loadNextNonEmptyBatch (introduced earlier on this branch as a pre-unify cursor fix) consumes empty batches at the cursor seam, MetadataResultSets.writeArrowStream no longer needs its own "skip writeBatch when rowCount==0" workaround: the cursor handles the empty-only case correctly. Remove the special case and always emit a batch. Tightens the zeroRowOnlyBatchYieldsNoRows test docstring to match.
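For reference, the cursor seam the commit leans on reduces to a loop of this shape — a sketch, not the actual method body:

// Batch-level Arrow signals in, row-level JDBC advances out: empty IPC batches are
// valid and producers emit them, so the cursor consumes them here.
private boolean loadNextNonEmptyBatch() throws IOException {
    while (reader.loadNextBatch()) {
        if (reader.getVectorSchemaRoot().getRowCount() > 0) {
            return true;
        }
        // zero-row batch: keep reading rather than reporting a phantom row
    }
    return false;
}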
DataCloudMetadataResultSet was deleted in this PR, but the test file retained the old name and lived in the wrong package. Merge its empty-result-set JDBC-shape smoke tests into the new MetadataResultSetsTest under the .core.metadata package and delete the legacy file. No behavior change.
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance fallback as its base-class default, StreamingResultSet no longer needs the catch-and-retry path that worked around accessors which threw "Operation not supported." Collapse getObject(int, Class) to direct dispatch and update the regression test's WHY comment to point at the accessor base class as the load-bearing layer. Addresses: review comment on PR #175 line 388.
Three small follow-ups from PR #175 review:
- StreamingResultSet.of: drop the paragraph that pointed at the HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The docstring spilled implementation detail of the metadata-stamping path into a generic "create a result set from a reader" entry-point; the type-name override is documented at HyperTypeToArrow / ColumnMetadata where it's relevant.
- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale to answer "why does the cursor consume empty batches instead of the caller?" directly. Empty IPC batches are valid Arrow and producers emit them; JDBC's next() only knows rows, so this cursor is the seam that translates batch-level signals into row-level advances.
- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next / isClosed / getStatement / unwrap / isWrapperFor / getHoldability / getFetchSize / setFetchSize / getWarnings / getConcurrency / getType / getFetchDirection). Those test the StreamingResultSet plumbing shared by every result set on this branch and are already covered by StreamingResultSetMethodTest. Keep the arity-contract slice (short/long/right/null/empty rows) — that is the metadata-result-set-specific behavior.
Addresses: review comments on PR #175.
StreamingResultSet had two public factories — of(reader, allocator, queryId[, zone]) (4 callers) and ofClosingOnFailure(Result, queryId, zone) (5 callers). Every production caller wanted the close-on-failure behavior; only tests and the metadata helper used the bare of(). Two factories with overlapping responsibilities is one too many — a caller hitting the bare of() and not knowing about ofClosingOnFailure would silently leak the 100 MB RootAllocator on construction failure. Collapse to one public factory:
- of(QueryResultArrowStream.Result, queryId, sessionZone) — the only factory callers see; always closes both reader and allocator on failure. The name is the unambiguous "of" because there is no other.
- create(reader, allocator, queryId, sessionZone) — private; just the construction body the factory wraps.
Production call sites (DataCloudConnection, DataCloudStatement) and MetadataResultSets were already passing a (reader, allocator) pair, so the call shape collapses to passing the Result holder. Tests that were building the pair locally now wrap it in a Result the same way.
…r interface
Pre-unify there were three result-set implementations: StreamingResultSet (streaming Arrow query results), DataCloudMetadataResultSet (metadata), SimpleResultSet (in-memory rows). The DataCloudResultSet interface — a one-method (getQueryId) extension over java.sql.ResultSet — was the common "implements" the public API surfaced; StreamingResultSet was the only one that ever implemented it as a non-trivial impl. The unify refactor collapsed all three implementations into StreamingResultSet, but kept the interface and the "Streaming" name. Two problems fall out:
- The "Streaming" name now lies. Metadata results flow through the same class but they're a one-shot in-memory IPC blob — nothing streaming about them. MetadataResultSets.of even passes /*queryId=*/ null because there is no query.
- The DataCloudResultSet interface has one implementer and one method. Layering an interface for one impl is just a reader trap: callers instinctively look for "what other implementations exist" and find none.
Collapse the two:
- Rename the class StreamingResultSet -> DataCloudResultSet.
- Delete the old DataCloudResultSet interface (the public method getQueryId() now lives directly on the class via @Getter).
- Update all production and test references; rename the affected test files to match (StreamingResultSet*Test -> DataCloudResultSet*Test).
The public API surface is unchanged in source for the common cases: DataCloudConnection.getRowBasedResultSet / getChunkBasedResultSet still return DataCloudResultSet, just as a class instead of an interface. This is binary-incompatible for any caller that ever cast to or implemented the old interface; in practice only StreamingResultSet implemented it on the read side, and no code outside the driver implemented it on the write side.
Rebased on …. Walked the 10 inline findings against the rebased tree — all still apply. PRs #185 (zero-row batch skip) and #186 (…).
GitHub now flags the inline comments as outdated because of the SHA rewrite. The findings themselves are unchanged — text and line anchors are still correct.
Generated by the review-pr-tavern skill — a human did not write this comment.
…torSetter
DatabaseMetaData.getTypeInfo declared CASE_SENSITIVE, UNSIGNED_ATTRIBUTE, FIXED_PREC_SCALE, and AUTO_INCREMENT as VARCHAR while the row producer in HyperTypes.buildTypeInfoRow wrote Boolean values into them. The mismatch worked only because VarCharVectorSetter accepted Object and silently called value.toString(), so the four columns surfaced as "true"/"false" strings instead of the boolean payload JDBC 4.2 (DatabaseMetaData.getTypeInfo) and pgjdbc both define for these positions.
Declare the four columns with a new bool(...) helper in MetadataSchemas that produces a HyperType.bool(true) / Constants.BOOL ColumnMetadata. The existing BitVectorSetter already accepts Boolean, so the row producer is unchanged.
Tighten VarCharVectorSetter from BaseVectorSetter<VarCharVector, Object> to <VarCharVector, String> so non-String payloads fail fast at the BaseVectorSetter type guard instead of being toString-coerced — the byte[] arm was dead (setBytes / setBinaryStream / setUnicodeStream / setAsciiStream all throw FEATURE_NOT_SUPPORTED in DataCloudPreparedStatement). Both fixes land together because tightening the setter without the schema fix would make getTypeInfo throw IllegalArgumentException on the Boolean payload.
Behavior change: getObject on the four columns now returns Boolean (per JDBC spec), not String. Callers that previously cast (String) rs.getObject(...) will get a ClassCastException; rs.getBoolean(...) starts working where it previously threw on the VARCHAR path, and rs.getString(...) keeps returning the same lowercase "true"/"false" via BooleanVectorAccessor.getString.
Pin the schema with three new MetadataSchemasTest methods mirroring the existing COLUMNS coverage (names / typeNames / JdbcTypeIds), add a strict-type regression test for VarCharVectorSetter modeled on IntegerVectorSetterRangeCheckTest so a future re-widening trips CI, and exercise rs.getBoolean on the four boolean columns end-to-end in DataCloudDatabaseMetadataTest.testGetTypeInfo.
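A sketch of the two halves — the bool(...) helper shape and the ColumnMetadata construction are assumptions; the setter signature mirrors the row-indexed setters described earlier:

// MetadataSchemas: declare the four flags as BOOL instead of VARCHAR.
static ColumnMetadata bool(String name) {
    return new ColumnMetadata(name, HyperType.bool(true));   // Constants.BOOL column
}

// VarCharVectorSetter: Object -> String, so a Boolean payload now fails fast at the
// BaseVectorSetter type guard instead of being toString-coerced into "true"/"false".
class VarCharVectorSetter extends BaseVectorSetter<VarCharVector, String> {
    @Override
    protected void setValueInternal(VarCharVector vector, int index, String value) {
        vector.setSafe(index, value.getBytes(StandardCharsets.UTF_8));
    }
}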
ArrowStreamReaderCursor.close used a plain try/finally that closed reader first, allocator second. When both threw — the most likely failure mode because the allocator's leak detector fires on close when buffers are still outstanding, which is exactly what an exception during reader.close produces — Java's finally semantics replaced the reader's exception with the allocator's. The reader exception is the diagnostically interesting one (the leak detector firing on allocator.close is usually a symptom); silently dropping it left only the symptom in the stack trace.
Switch the cursor to try-with-resources over the (allocator, reader) pair so reader closes first and the allocator's exception attaches as suppressed onto the reader's instead of replacing it.
Same fix on the construction-failure cleanup in DataCloudResultSet.of: the reader.close was already wrapped with addSuppressed but the immediately-following allocator.close was bare and could replace the original construction SQLException; wrap it the same way.
Pin the new behavior with a Mockito-based test that throws from both reader.close and allocator.close and asserts the reader's exception is primary with the allocator's attached as suppressed.
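The ordering trick, sketched: try-with-resources closes in reverse declaration order and attaches later failures as suppressed, which is exactly the reader-first, allocator-suppressed contract the commit pins. Field names are assumptions:

@Override
public void close() throws Exception {
    // Declared allocator-first, so the reader closes first; if reader.close() throws,
    // the allocator's leak-detector exception attaches as suppressed instead of
    // replacing the diagnostically interesting reader failure.
    try (BufferAllocator a = allocator; ArrowStreamReader r = reader) {
        // no body: we only want the ordered, suppression-aware close
    }
}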
DataCloudStatement.executeQuery and getResultSet both fetched iterator.getQueryStatus().getQueryId() twice — once when constructing arrowStream and once when calling DataCloudResultSet.of with it. The second call sat between arrowStream creation (which puts a 100 MB RootAllocator on the field) and the of(...) call that takes ownership of that allocator. If the second getQueryId() throws — e.g. a future refactor makes getQueryStatus async, or a transient gRPC failure surfaces through the cached proto — the allocator escapes both DataCloudResultSet.of's own try/catch (never entered) and the surrounding catch (StatusRuntimeException) (which doesn't close arrowStream). The PR explicitly claims every code path closes its allocator; this hoist closes the window without changing any observable behavior.
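The hoist, sketched — method names follow the commit message, the exact toArrowStreamReader arguments are elided:

// Fetch the query id BEFORE the 100 MB RootAllocator exists, so a gRPC failure
// here cannot strand an allocator that nobody owns yet.
final String queryId = iterator.getQueryStatus().getQueryId();
val arrowStream = QueryResultArrowStream.toArrowStreamReader(iterator);
return DataCloudResultSet.of(arrowStream, queryId, sessionZone);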
…fy cap
Both Result-holder construction sites — QueryResultArrowStream.toArrowStreamReader and MetadataResultSets.of — built the RootAllocator first and then handed it to a new ArrowStreamReader. If the reader's constructor throws (today benign, but a future Arrow upgrade could add constructor-side validation), the allocator escapes both DataCloudResultSet.of's own try/catch (never entered) and the caller's catch. Wrap the construction in try/catch that closes the allocator on the way out and attaches any close failure as suppressed.
Unify the per-allocator cap: MetadataResultSets used Long.MAX_VALUE while the gRPC path was capped at 100 MB. The cap exists because Arrow allocators are accounted memory, so hitting the cap throws a clean OutOfMemoryException instead of letting the JVM OOM. A getColumns(...) against a tenant with thousands of tables silently bypassed the cap on the metadata path. Promote the constant to public ROOT_ALLOCATOR_BUDGET_BYTES on QueryResultArrowStream and reuse it from MetadataResultSets.
No new test: the failure mode requires ArrowStreamReader's constructor to throw, which doesn't happen with ByteArrayInputStream or the gRPC channel today. Pure code-shape fix; existing suite stays green.
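A sketch of the shared construction guard — the constant and holder names come from the commit, the exception handling is illustrative:

BufferAllocator allocator = new RootAllocator(QueryResultArrowStream.ROOT_ALLOCATOR_BUDGET_BYTES);
try {
    ArrowStreamReader reader = new ArrowStreamReader(input, allocator);
    return new QueryResultArrowStream.Result(reader, allocator);
} catch (RuntimeException e) {
    try {
        allocator.close();                  // don't let the allocator escape
    } catch (Exception suppressed) {
        e.addSuppressed(suppressed);        // keep the close failure visible
    }
    throw e;
}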
NOTE: I'm not happy with the PR and the approach; this is for me to run reviews and steer the agent.
Summary
Collapse the two ResultSet families (streaming Arrow + row-based metadata) into a single Arrow-backed implementation so there is one accessor pipeline, one set of type semantics, and one place to fix bugs. Also tightens root-allocator hygiene.
Built on top of #moritz/centralize-types-via-hypertype.
What changed
- Unified result set. DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are removed. JDBC metadata calls (getTables, getColumns, getSchemas, getTypeInfo, and all empty-metadata helpers) now funnel through StreamingResultSet via a new MetadataArrowBuilder that materialises List<List<Object>> metadata rows into a populated Arrow VectorSchemaRoot. MetadataResultSets is the factory callers use.
- Source-agnostic cursor. ArrowStreamReaderCursor now accepts either a streaming ArrowStreamReader or a pre-populated in-memory VectorSchemaRoot, driven by a pluggable BatchLoader. The cursor owns an AutoCloseable holding the backing resources and closes it on cursor close.
- Root allocator hygiene. QueryResultArrowStream.toArrowStreamReader previously leaked a 100 MB RootAllocator — ArrowStreamReader.close() only tears down vectors, not the allocator. It now returns a Result holder that pairs the reader with the allocator and closes both in the correct order (reader first, so ArrowBuf accounting clears before the allocator's budget check). StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) similarly takes ownership of the allocator + VSR through an AutoCloseable, so every code path closes its allocator.
- typeName preservation. ofInMemory accepts an optional columns override so JDBC-spec labels like "TEXT" / "INTEGER" / "SHORT" survive a round-trip through Arrow (the derived HyperType names would otherwise be "VARCHAR" etc.).
- StreamingResultSet.getObject(int, Class) gains an isInstance fallback so getObject(col, String.class) on a VARCHAR works without each accessor implementing typed getObject.
Behavior changes worth calling out
Accessor semantics on metadata rows are now the same as on query results, which is stricter than the old row-based SimpleResultSet: getBoolean / getDate / getTime / getTimestamp on an integer column throw SQLException instead of loose-coercing. getByte on an integer column is now supported (previously threw in the metadata path). DataCloudDatabaseMetadataTest assertions were updated accordingly.
Test plan
- ./gradlew clean build passes (includes spotlessCheck, all tests, JaCoCo coverage, verification).
- ./gradlew :jdbc-core:test — 1222 tests, 0 failed.
Breaking changes
- com.salesforce.datacloud.jdbc.core.DataCloudResultSet flips from public interface to public class. External code that wrote class MyRs implements DataCloudResultSet (decorators, wrappers, hand-rolled doubles) no longer compiles. Code that only consumes the standard java.sql.ResultSet / DataCloudResultSet API as an opaque type recompiles unchanged.
- The following previously-public types are removed: StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, ColumnAccessor. External callers of StreamingResultSet.of(ArrowStreamReader, ...) should switch to DataCloudResultSet.of(QueryResultArrowStream.Result, ...).
- JDBC metadata accessor semantics on integer columns are stricter: getDate / getTime / getTimestamp now throw SQLException (previously UnsupportedOperationException). getObject(col, Boolean.class) on integer columns now throws.
BREAKING CHANGE: DataCloudResultSet is now a class instead of an interface; StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, ColumnAccessor are removed; metadata int-column accessors throw on date/time/Boolean conversions.