refactor!: unify ResultSet implementations on Arrow-backed path#175

Draft
mkaufmann wants to merge 18 commits into main from moritz/unify-result-sets

Conversation

@mkaufmann mkaufmann commented Apr 24, 2026

NOTE: I'm not happy with this PR or the approach yet; it exists so I can run reviews and steer the agent.

Summary

Collapse the two ResultSet families (streaming Arrow + row-based metadata) into a single Arrow-backed implementation so there is one accessor pipeline, one set of type semantics, and one place to fix bugs. Also tightens root-allocator hygiene.

Built on top of moritz/centralize-types-via-hypertype.

What changed

Unified result set. DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are removed. JDBC metadata calls (getTables, getColumns, getSchemas, getTypeInfo, and all empty-metadata helpers) now funnel through StreamingResultSet via a new MetadataArrowBuilder that materialises List<List<Object>> metadata rows into a populated Arrow VectorSchemaRoot. MetadataResultSets is the factory callers use.
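The materialisation step described above can be sketched in plain Java. The class and method names here are hypothetical stand-ins, and plain lists play the role of Arrow vectors and the VectorSchemaRoot:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of the materialisation step: pivot row-oriented
// List<List<Object>> metadata rows into column-oriented storage.
// Plain lists stand in for Arrow vectors / VectorSchemaRoot.
public class RowsToColumnsDemo {

    public static List<List<Object>> toColumns(List<List<Object>> rows, int columnCount) {
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(new ArrayList<>());
        }
        for (List<Object> row : rows) {
            for (int c = 0; c < columnCount; c++) {
                // Pad short rows with null so every column ends up the same length.
                columns.get(c).add(c < row.size() ? row.get(c) : null);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(List.of("t1", "TABLE"), List.of("t2"));
        List<List<Object>> cols = toColumns(rows, 2);
        if (!cols.get(0).equals(Arrays.asList("t1", "t2"))) throw new AssertionError();
        if (!cols.get(1).equals(Arrays.asList("TABLE", null))) throw new AssertionError();
    }
}
```

The real builder additionally has to write each column into a typed Arrow vector; the pivot shape is the part this sketch shows.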

Source-agnostic cursor. ArrowStreamReaderCursor now accepts either a streaming ArrowStreamReader or a pre-populated in-memory VectorSchemaRoot, driven by a pluggable BatchLoader. The cursor owns an AutoCloseable holding the backing resources and closes it on cursor close.

Root allocator hygiene.

  • QueryResultArrowStream.toArrowStreamReader previously leaked a 100 MB RootAllocator: ArrowStreamReader.close() only tears down vectors, not the allocator. It now returns a Result holder that pairs the reader with the allocator and closes both in the correct order (reader first, so ArrowBuf accounting clears before the allocator's budget check).
  • StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) similarly takes ownership of the allocator + VSR through an AutoCloseable, so every code path closes its allocator.
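As a rough illustration of the ownership pattern in these bullets, the Result holder boils down to an AutoCloseable that fixes the close order. The classes below are stand-ins, not the real Arrow reader/allocator:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a holder that pairs a reader with its allocator and
// closes them reader-first, so buffer accounting is cleared before the
// allocator runs its leak/budget check on close.
public class CloseOrderDemo {
    public static final List<String> closed = new ArrayList<>();

    static final class Allocator implements AutoCloseable {
        @Override public void close() { closed.add("allocator"); }
    }

    static final class Reader implements AutoCloseable {
        @Override public void close() { closed.add("reader"); }
    }

    /** Pairs reader + allocator; close() tears both down in the safe order. */
    static final class Result implements AutoCloseable {
        final Reader reader;
        final Allocator allocator;
        Result(Reader reader, Allocator allocator) {
            this.reader = reader;
            this.allocator = allocator;
        }
        @Override public void close() throws Exception {
            reader.close();     // first: releases the reader's buffers
            allocator.close();  // then: the allocator sees zero outstanding bytes
        }
    }

    public static void main(String[] args) throws Exception {
        try (Result r = new Result(new Reader(), new Allocator())) {
            // ... consume batches ...
        }
        if (!closed.equals(List.of("reader", "allocator"))) throw new AssertionError(closed);
    }
}
```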

typeName preservation. ofInMemory accepts an optional columns override so JDBC-spec labels like "TEXT" / "INTEGER" / "SHORT" survive a round-trip through Arrow (the derived HyperType names would otherwise be "VARCHAR" etc.).

StreamingResultSet.getObject(int, Class) gains an isInstance fallback so getObject(col, String.class) on a VARCHAR works without each accessor implementing typed getObject.
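A minimal sketch of what such an isInstance fallback can look like, with hypothetical names and plain Exception in place of SQLException; the real accessor plumbing is more involved:

```java
// Rough sketch of the fallback: if no typed accessor handles the requested
// class, return the raw value when it already is an instance of that class.
public class GetObjectFallbackDemo {

    // Stand-in for an accessor that only knows the raw column value.
    static Object rawValue() { return "hello"; }

    @SuppressWarnings("unchecked")
    public static <T> T getObject(Class<T> type) throws Exception {
        Object raw = rawValue();
        if (raw == null || type.isInstance(raw)) {
            return (T) raw; // covers getObject(col, String.class) on a VARCHAR
        }
        // Strict semantics: no loose coercion to unrelated types.
        throw new Exception("Conversion to " + type.getName() + " is not supported");
    }

    public static void main(String[] args) throws Exception {
        if (!"hello".equals(getObject(String.class))) throw new AssertionError();
        boolean threw = false;
        try {
            getObject(Integer.class); // VARCHAR -> Integer has no fallback here
        } catch (Exception expected) {
            threw = true;
        }
        if (!threw) throw new AssertionError();
    }
}
```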

Behavior changes worth calling out

Accessor semantics on metadata rows are now the same as on query results, which is stricter than the old row-based SimpleResultSet:

  • getBoolean / getDate / getTime / getTimestamp on an integer column throw SQLException instead of loosely coercing.
  • getByte on an integer column is now supported (previously threw in the metadata path).

DataCloudDatabaseMetadataTest assertions were updated accordingly.

Test plan

  • ./gradlew clean build passes (includes spotlessCheck, all tests, JaCoCo coverage, verification).
  • ./gradlew :jdbc-core:test — 1222 tests, 0 failed.
  • Spot-check downstream: Spark datasource still compiles (covered by full build).

Breaking changes

com.salesforce.datacloud.jdbc.core.DataCloudResultSet flips from public interface to public class. External code that wrote class MyRs implements DataCloudResultSet (decorators, wrappers, hand-rolled doubles) no longer compiles. Code that only consumes the standard java.sql.ResultSet / DataCloudResultSet API as an opaque type recompiles unchanged.

The following previously-public types are removed: StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, ColumnAccessor. External callers of StreamingResultSet.of(ArrowStreamReader, ...) should switch to DataCloudResultSet.of(QueryResultArrowStream.Result, ...).

JDBC metadata accessor semantics on integer columns are stricter: getDate / getTime / getTimestamp now throw SQLException (previously UnsupportedOperationException). getObject(col, Boolean.class) on integer columns now throws.

BREAKING CHANGE: DataCloudResultSet is now a class instead of an interface; StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, ColumnAccessor are removed; metadata int-column accessors throw on date/time/Boolean conversions.

mkaufmann added a commit that referenced this pull request Apr 24, 2026
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).
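The shared row-indexed primitive from point 1 can be sketched like this (hypothetical class; an Object[] stands in for an Arrow vector):

```java
import java.util.List;

// Hedged sketch of the setCell primitive: both the single-row
// parameter-binding overload and the many-row metadata overload
// funnel through one row-indexed setter.
public class SetCellDemo {
    public final Object[] vector;

    public SetCellDemo(int capacity) { vector = new Object[capacity]; }

    // The one primitive: every setter is parameterised by row index.
    public void setCell(int row, Object value) { vector[row] = value; }

    // Single-row overload (parameter binding): always writes row 0.
    public void populate(Object value) { setCell(0, value); }

    // Many-row overload (metadata rows): one cell per row index.
    public void populate(List<?> values) {
        for (int row = 0; row < values.size(); row++) {
            setCell(row, values.get(row));
        }
    }

    public static void main(String[] args) {
        SetCellDemo single = new SetCellDemo(1);
        single.populate("param");                        // parameter-binding path
        SetCellDemo many = new SetCellDemo(3);
        many.populate(List.of("a", "b", "c"));           // metadata path
        if (!"param".equals(single.vector[0])) throw new AssertionError();
        if (!"c".equals(many.vector[2])) throw new AssertionError();
    }
}
```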
@mkaufmann mkaufmann force-pushed the moritz/centralize-types-via-hypertype branch 2 times, most recently from 9ecba8a to 08d62bc on May 7, 2026 19:10
Base automatically changed from moritz/centralize-types-via-hypertype to main May 7, 2026 19:25
mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from b648940 to 07125b1 on May 11, 2026 13:38
@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 74.38692% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.49%. Comparing base (f47714f) to head (dd484f2).

Files with missing lines Patch % Lines
...sforce/datacloud/jdbc/core/DataCloudResultSet.java 75.63% 47 Missing and 1 partial ⚠️
...tacloud/jdbc/core/metadata/MetadataResultSets.java 65.45% 15 Missing and 4 partials ⚠️
.../datacloud/jdbc/protocol/data/VectorPopulator.java 73.84% 12 Missing and 5 partials ⚠️
...atacloud/jdbc/protocol/QueryResultArrowStream.java 25.00% 6 Missing ⚠️
...datacloud/jdbc/protocol/data/HyperTypeToArrow.java 81.81% 1 Missing and 1 partial ⚠️
...e/datacloud/jdbc/core/ArrowStreamReaderCursor.java 80.00% 0 Missing and 1 partial ⚠️
...oud/jdbc/protocol/data/ArrowToHyperTypeMapper.java 75.00% 0 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (74.38%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #175      +/-   ##
============================================
- Coverage     82.37%   80.49%   -1.88%     
+ Complexity     1871     1707     -164     
============================================
  Files           125      123       -2     
  Lines          5020     4953      -67     
  Branches        540      521      -19     
============================================
- Hits           4135     3987     -148     
- Misses          642      737      +95     
+ Partials        243      229      -14     
Components Coverage Δ
JDBC Core 80.84% <74.38%> (-2.31%) ⬇️
JDBC Main 40.69% <ø> (ø)
JDBC HTTP 90.30% <ø> (ø)
JDBC Utilities 65.25% <ø> (ø)
Spark Datasource ∅ <ø> (∅)
Files with missing lines Coverage Δ
...force/datacloud/jdbc/core/DataCloudConnection.java 57.04% <100.00%> (ø)
...datacloud/jdbc/core/DataCloudDatabaseMetadata.java 98.34% <100.00%> (-0.01%) ⬇️
...sforce/datacloud/jdbc/core/DataCloudStatement.java 80.90% <100.00%> (-0.20%) ⬇️
...alesforce/datacloud/jdbc/core/MetadataSchemas.java 96.87% <100.00%> (+0.04%) ⬆️
...esforce/datacloud/jdbc/core/QueryMetadataUtil.java 95.40% <100.00%> (ø)
...oud/jdbc/core/SQLExceptionQueryResultIterator.java 75.00% <ø> (ø)
.../com/salesforce/datacloud/jdbc/util/Constants.java 0.00% <ø> (ø)
...e/datacloud/jdbc/core/ArrowStreamReaderCursor.java 88.57% <80.00%> (-2.06%) ⬇️
...oud/jdbc/protocol/data/ArrowToHyperTypeMapper.java 67.79% <75.00%> (+5.97%) ⬆️
...datacloud/jdbc/protocol/data/HyperTypeToArrow.java 76.92% <81.81%> (-1.34%) ⬇️
... and 4 more

... and 5 files with indirect coverage changes


mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.
mkaufmann added a commit that referenced this pull request May 11, 2026
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from 07125b1 to e329860 on May 11, 2026 15:02
mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
mkaufmann added a commit that referenced this pull request May 11, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from e329860 to d17f1b0 on May 11, 2026 16:20
@mkaufmann
Member Author

Per the two review threads, I've split out the cherry-pickable fixes as their own PRs against main:

This PR (#175) keeps the same fixes as its first two commits; once #185 / #186 land, those commits will collapse to no-ops at rebase time.

For the remaining "should the QueryResultArrowStream allocator-ownership move happen pre-unify too?" thread (#175 review): I'm waiting on your call before doing that split. As I noted there, it's non-trivial surgery on the unify commit and I'd rather get your sign-off before rewriting ~800 lines of refactor.

mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
mkaufmann added a commit that referenced this pull request May 11, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from d17f1b0 to f4cad29 on May 11, 2026 17:16
mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
mkaufmann added a commit that referenced this pull request May 11, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch 2 times, most recently from 67ddd24 to b97abe2 on May 11, 2026 18:37
mkaufmann added a commit that referenced this pull request May 12, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 12, 2026
mkaufmann added a commit that referenced this pull request May 12, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from 84391cd to d90bb9a on May 12, 2026 at 14:57
mkaufmann added a commit that referenced this pull request May 12, 2026
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).
mkaufmann added a commit that referenced this pull request May 12, 2026
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.
mkaufmann added a commit that referenced this pull request May 12, 2026
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from d90bb9a to b83e8b7 on May 12, 2026 at 19:13

@mkaufmann mkaufmann left a comment


Review summary

12 findings: 1 Blocking, 8 Suggested, 3 Optional. The PR collapses the metadata path onto the streaming Arrow pipeline cleanly, and the close-order story for the new Result holder is right where it lands. The findings cluster around three themes:

  1. API surface. The diff promotes DataCloudResultSet from an interface to a concrete class. That's a binary break for anyone implementing or proxying the type from the shaded JAR — flagged at DataCloudResultSet.java:49 (Blocking). StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor all leave the public surface without a deprecation window. With release-please configured for Conventional Commits, the squash-merge subject refactor: will not surface a CHANGELOG entry — finding S7 covers this.

  2. JDBC-spec compliance regressions on metadata. MetadataSchemas.TYPE_INFO declares CASE_SENSITIVE / UNSIGNED_ATTRIBUTE / FIXED_PREC_SCALE / AUTO_INCREMENT as text(...) columns while HyperTypes.typeInfoRows() populates them with Boolean values; the new VarCharVectorSetter toString()s them and BaseIntVectorAccessor.getBoolean is not overridden, so getBoolean(...) on these columns now throws where the old SimpleResultSet returned the raw Boolean (S1, S2). JDBC 4.2 Table B-6 lists INTEGER → boolean and VARCHAR → boolean (for "true"/"false"/"1"/"0") as recommended conversions; tools that read NULLABLE from getColumns() or boolean flags from getTypeInfo() will break.

  3. Allocator-hygiene gaps that survive the cleanup. cursor.close() and DataCloudResultSet.of's failure path use plain try/finally without addSuppressed, so if the leak detector trips on allocator.close() it masks the original exception (S3). DataCloudStatement.executeQuery / getResultSet calls getQueryId() between allocator construction and ownership transfer to DataCloudResultSet.of — if that gRPC call throws, the allocator escapes (S4).

On the PR body itself: The body describes a design that isn't in the diff. The "source-agnostic cursor" / BatchLoader / AutoCloseable ownership and the StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) factory don't exist in the cursor or in DataCloudResultSet. The metadata path actually round-trips through ArrowStreamWriter → IPC bytes → ArrowStreamReader (verified at MetadataResultSets.java:88), and the typeName-preservation mechanism rides on Arrow field metadata under a datacloud-jdbc:type_name key (HyperTypeToArrow.toField / ArrowToHyperTypeMapper.toColumnMetadata), not on a cols parameter. The body also references a non-existent MetadataArrowBuilder class (the factory is MetadataResultSets), names StreamingResultSet as the unified type when in fact StreamingResultSet.java is deleted and DataCloudResultSet.java is the new home, and the "Built on top of #moritz/centralize-types-via-hypertype" line is malformed (won't render as a PR ref) and stale (the dependency landed as #174). The first-line "I'm not happy with the PR and the approach" disclaimer would land in git log permanently if squashed as-is.

Range-check tightening (IntVectorSetter / SmallIntVectorSetter / TinyIntVectorSetter now throw IllegalArgumentException on out-of-range Numbers — verified at VectorPopulator.java:240,261,433) is a real behavior change that the PR body's "Behavior changes worth calling out" section omits. Worth a bullet so future debugging of IllegalArgumentException from prepared-statement parameter binding has a paper trail.

Tests. Behavior change "metadata int columns no longer accept getBoolean" is no longer pinned by a test — the old assertThat(getBoolean("NULLABLE")).isFalse() was replaced with getInt("NULLABLE"), and the test comment ("long→boolean coerces to false") is misleading because the code does not actually coerce — BaseIntVectorAccessor inherits getBoolean from QueryJDBCAccessor, which throws. A direct assertThrows(SQLException.class, () -> getBoolean("ORDINAL_POSITION")) would pin it.

Inline findings follow.


Generated by the review-pr-tavern skill — a human did not write this comment.

assertThat(columnResultSet.getInt("DATA_TYPE")).isEqualTo(12);
assertThat(columnResultSet.getBoolean("NULLABLE")).isFalse();
// NULLABLE is an INTEGER column. Arrow-backed getInt reports the nullability enum;
// 0 (columnNoNulls) for NOT NULL rows, which coerces to false via long→boolean.

Suggested — The test comment says "0 (columnNoNulls) for NOT NULL rows, which coerces to false via long→boolean", and the previous getBoolean("NULLABLE") assertion was replaced with getInt("NULLABLE"). But BaseIntVectorAccessor does not override getBoolean() — the call falls through to QueryJDBCAccessor.getBoolean() (line 52-54) which throws SQLFeatureNotSupportedException. The comment describes a coercion that does not happen in this codebase, and the behavior-change claim from the PR body ("getBoolean ... on an integer column throws SQLException") is no longer pinned by any direct assertion. Add:

assertThrows(SQLException.class, () -> columnResultSet.getBoolean("ORDINAL_POSITION"));

next to the existing date/time assertThrows block — it pins the new contract and removes the misleading comment in one go. (See the separate finding on MetadataSchemas for the boolean-coercion gap that arguably should still work for spec compliance.)


Generated by the review-pr-tavern skill — a human did not write this comment.

* @param queryId The query identifier (may be {@code null} for synthesized result sets).
* @param sessionZone The session timezone used for timestamp conversions.
*/
public static DataCloudResultSet of(QueryResultArrowStream.Result arrowStream, String queryId, ZoneId sessionZone)

Suggested — Four deletions worth a CHANGELOG entry: StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor were all public on main. Even if no in-tree code outside :jdbc-core referenced them, they were reachable from the shaded fat JAR. StreamingResultSet.of(ArrowStreamReader, queryId) and of(ArrowStreamReader, queryId, ZoneId) were the only public entry points for wrapping a foreign Arrow stream into a JDBC ResultSet. The new DataCloudResultSet.of(QueryResultArrowStream.Result, ...) requires constructing an internal Result holder, which is a tighter coupling than the old factories.

If there's no migration alias, please at least call this out in a BREAKING CHANGE: footer so release-please picks it up (the refactor: subject otherwise produces no CHANGELOG entry — see the release-notes finding). A thin public static DataCloudResultSet of(ArrowStreamReader reader, BufferAllocator allocator, String queryId, ZoneId zone) overload that internally builds a Result would also let external Arrow-stream wrappers continue to compile.

Release-notes coverage. The repo uses release-please with release-type: simple and Conventional Commits. A squash-merge subject of refactor: unify ResultSet implementations on Arrow-backed path produces no CHANGELOG entry: refactor isn't among release-please's default changelog sections, and at 0.x.y even a minor bump would otherwise require a feat: subject. Combined with the API surface deltas in this PR (interface→class on DataCloudResultSet, four removed public types, stricter accessor coercion on metadata int columns), the change will silently land in the next release and users discovering the break via CHANGELOG.md will have nothing to read.

Use refactor!: for the squash subject and add a BREAKING CHANGE: footer enumerating the specific deletions and behavior changes. release-please reads the ! and the footer and produces a Breaking Changes section.
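A squash subject and footer of the shape release-please recognises might look like this (the exact wording is illustrative; the listed breaks are the ones flagged in this review):

```text
refactor!: unify ResultSet implementations on Arrow-backed path

BREAKING CHANGE: DataCloudResultSet is now a concrete class (was an
interface). StreamingResultSet, DataCloudMetadataResultSet,
SimpleResultSet, and ColumnAccessor are removed from the public surface.
getBoolean on integer metadata columns now throws SQLException.
```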


Generated by the review-pr-tavern skill — a human did not write this comment.


@Override
public void close() throws SQLException {
if (!closed) {

Optional — close() reads if (!closed) { cursor.close(); closed = true; }. If cursor.close() throws — e.g. the allocator's leak detector fires IllegalStateException — closed stays false. A defensive caller that catches and retries close() (try-with-resources won't, but JDBC connection pools and driver wrappers sometimes do) re-enters cursor.close(), which calls allocator.close() on an already-closed RootAllocator. Arrow throws on the second close.

Flip the order — set closed = true before delegating, or wrap in try/finally that sets it unconditionally. The caller still gets the cleanup exception; subsequent calls become no-ops, matching the standard JDK AutoCloseable pattern.


Generated by the review-pr-tavern skill — a human did not write this comment.
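The flipped-order fix suggested here can be sketched with a standalone wrapper — the class and field names below are illustrative, not the driver's actual cursor API:

```java
// Sketch: mark the object closed BEFORE delegating, so a caller that
// catches the cleanup exception and retries close() gets a no-op instead
// of a second allocator.close() on an already-closed RootAllocator.
final class IdempotentClose implements AutoCloseable {
    private final AutoCloseable cursor; // stand-in for the real cursor
    private boolean closed = false;

    IdempotentClose(AutoCloseable cursor) {
        this.cursor = cursor;
    }

    @Override
    public void close() throws Exception {
        if (closed) {
            return; // subsequent calls are no-ops, per the standard AutoCloseable pattern
        }
        closed = true; // set before delegating, so a throwing close cannot be re-entered
        cursor.close(); // caller still sees the cleanup exception on the first call
    }

    boolean isClosed() {
        return closed;
    }
}
```

The caller still observes the first close's exception; only the retry becomes harmless.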

@Override
protected void setValueInternal(DecimalVector vector, BigDecimal value) {
vector.setSafe(0, value.unscaledValue().longValue());
protected void setValueInternal(DecimalVector vector, int index, BigDecimal value) {

Optional — DecimalVectorSetter.setValueInternal does vector.setSafe(index, value.unscaledValue().longValue()). A BigDecimal whose unscaled value exceeds Long.MAX_VALUE silently truncates. The PR added IllegalArgumentException range checks on IntVectorSetter (line 240), SmallIntVectorSetter (line 261), and TinyIntVectorSetter (line 433); decimal got the signature update but not the guard. The asymmetry is jarring now that this is the single populator path for both binding and metadata.

Not required (today's call sites probably stay in range), but worth a bitLength() > 63 → IllegalArgumentException check or a one-line javadoc explaining why decimals are exempt.


Generated by the review-pr-tavern skill — a human did not write this comment.
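A minimal sketch of the suggested bitLength() guard, as a standalone helper (the class and method names are illustrative, not driver code):

```java
import java.math.BigDecimal;

// Sketch: reject BigDecimals whose unscaled value cannot be represented
// in a signed 64-bit long before narrowing, mirroring the range checks
// the PR added to the integer setters.
final class DecimalRangeCheck {
    static long checkedUnscaledLong(BigDecimal value) {
        // bitLength() excludes the sign bit, so > 63 means the value does
        // not fit in a signed long and longValue() would silently truncate
        if (value.unscaledValue().bitLength() > 63) {
            throw new IllegalArgumentException("unscaled value out of long range: " + value);
        }
        return value.unscaledValue().longValue();
    }
}
```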

@SneakyThrows
void closesTheReader() {
val sut = new ArrowStreamReaderCursor(reader, ZoneId.systemDefault());
void closesReaderAndAllocator() {

Optional — The new closesReaderAndAllocator test verifies both reader.close() and allocator.close() ran, but pins neither the order nor the throw-during-close contract. The cursor's javadoc explicitly says reader-first ordering exists "because reversing trips a leak detector", and the body uses try/finally so allocator.close runs even when reader.close throws. Both are exactly the properties most likely to regress.

A two-line tightening covers it:

InOrder inOrder = Mockito.inOrder(reader, allocator);
inOrder.verify(reader).close();
inOrder.verify(allocator).close();

plus a second test where reader.close() is doThrow(...)-ed and the assertion is that allocator.close() still ran.


Generated by the review-pr-tavern skill — a human did not write this comment.

mkaufmann added 14 commits May 15, 2026 12:14
Collapse the two ResultSet families (streaming Arrow + row-based
metadata) into a single Arrow-backed implementation so there is one
accessor pipeline, one set of type semantics, and one place to fix bugs.

Changes:

- ArrowStreamReaderCursor becomes source-agnostic: a BatchLoader drives
  a VectorSchemaRoot, whether sourced from an ArrowStreamReader or a
  pre-populated in-memory batch. The cursor also owns an AutoCloseable
  so it is responsible for releasing the allocator + reader on close —
  the old ArrowStreamReader.close() would only tear down vectors and
  leak the 100 MB RootAllocator.
- QueryResultArrowStream.toArrowStreamReader returns a Result holder
  that pairs the reader with the allocator and closes both in the
  right order so Arrow's accounting invariants hold.
- StreamingResultSet gains ofInMemory(root, owned, queryId, zone, cols)
  so metadata results funnel through the same result set. A columns
  override preserves the JDBC-spec typeName labels (e.g. TEXT) that
  would otherwise be lost when deriving from the Arrow schema.
- MetadataArrowBuilder materialises List<List<Object>> metadata rows
  into a populated VectorSchemaRoot using the existing HyperTypeToArrow
  mapping; MetadataResultSets is the factory callers use.
- QueryMetadataUtil and DataCloudDatabaseMetadata route getTables,
  getColumns, getSchemas, getTypeInfo and empty metadata results
  through the Arrow-backed StreamingResultSet.
- DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are
  removed now that no caller depends on them.
- StreamingResultSet.getObject(int, Class) gains an isInstance-based
  fallback so callers can retrieve String-typed VARCHAR columns without
  each accessor having to implement typed getObject.
- Tests moved to the unified path; integer-accessor-only assertions in
  DataCloudDatabaseMetadataTest updated to reflect stricter Arrow
  accessor semantics.
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).
StreamingResultSet.of catches IOException and IllegalArgumentException
from the Arrow schema decode and rewraps as SQLException. At all four
query-path call sites (DataCloudConnection.getRowBasedResultSet,
getChunkBasedResultSet, DataCloudStatement.executeQuery, getResultSet)
the surrounding try-catch only catches StatusRuntimeException, so a
SQLException thrown from of() bypasses it and leaks the 100 MB
RootAllocator returned by QueryResultArrowStream.toArrowStreamReader.

Introduce StreamingResultSet.ofClosingOnFailure(Result, queryId,
sessionZone) that takes the reader+allocator pair and closes both on
construction failure (reader first so its buffers release before the
allocator's budget check). Switch all four call sites to it.

The metadata path in MetadataResultSets.of already had this shape; this
fixes the matching gap on the query side.

Add a regression test that builds an Arrow IPC stream with an
unsupported field type (LargeUtf8) and asserts the helper closes both
the reader and the allocator on the resulting SQLException.
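The close-on-failure shape this commit describes can be sketched generically — the types and names below are illustrative stand-ins, not the driver's API:

```java
import java.util.concurrent.Callable;

// Sketch: if construction fails, close the reader first (releasing its
// buffers) and then the allocator (whose budget check would otherwise
// fire), attaching any cleanup exceptions as suppressed so the original
// failure survives intact.
final class ClosingOnFailure {
    static <T> T construct(Callable<T> ctor, AutoCloseable reader, AutoCloseable allocator) throws Exception {
        try {
            return ctor.call();
        } catch (Exception original) {
            closeSuppressing(original, reader);    // reader first: release ArrowBufs...
            closeSuppressing(original, allocator); // ...before the allocator's accounting check
            throw original;
        }
    }

    private static void closeSuppressing(Exception original, AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception cleanup) {
            original.addSuppressed(cleanup); // never mask the construction failure
        }
    }
}
```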
The Int/SmallInt/TinyInt setters widened from concrete boxed types
(Integer/Short/Byte) to Number so metadata rows could pass long values,
but lost the implicit "right boxed type" check at the call sites that
went through DataCloudPreparedStatement.setObject for parameter
binding. A user binding Long.MAX_VALUE to an INT32 parameter would
silently get (int) Long.MAX_VALUE = -1 written to the vector.

Add an explicit range check on Int/SmallInt/TinyInt setters before
narrowing. Both the metadata path and the parameter-binding path go
through these setters, so strict checks here mean strict on both
paths. BigInt accepts the full long range and is unchanged.

Pin the behavior with a focused unit test (IntegerVectorSetterRangeCheckTest).
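The narrowing guard described above can be sketched as a standalone helper (method and class names are illustrative, not the driver's setters):

```java
// Sketch: accept any Number but check the long value fits the target
// width before casting, so Long.MAX_VALUE bound to an INT32 parameter
// fails loudly instead of silently wrapping to -1.
final class NarrowingCheck {
    static int checkedInt(Number value) {
        long v = value.longValue();
        if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("value out of INT32 range: " + v);
        }
        return (int) v;
    }

    static short checkedShort(Number value) {
        long v = value.longValue();
        if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
            throw new IllegalArgumentException("value out of INT16 range: " + v);
        }
        return (short) v;
    }
}
```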
The driver round-trips JDBC-spec type-name overrides (e.g. "TEXT" for
metadata columns) through Arrow field metadata under a custom key. The
previous key "jdbc:type_name" used an unprefixed namespace not reserved
by the Arrow spec — Hyper, query-federator, or another Arrow producer
could emit a same-named key in a future protocol version, in which case
ArrowToHyperTypeMapper would silently override its own derived type
name with whatever upstream stamped.

Rename to "datacloud-jdbc:type_name" so the namespace is unambiguous,
and expand the field's javadoc to document the namespace rationale.
The fallback in ArrowToHyperTypeMapper.toColumnMetadata — when a field
has no datacloud-jdbc:type_name override, ColumnMetadata.typeName is
null and the JDBC layer derives the column type-name from the
HyperType — was load-bearing but unasserted. Real Hyper Arrow streams
never stamp the override, so every functional query test exercised the
fallback implicitly; if a future refactor broke it, the regression
would not surface in the existing suite.

Two new pin tests:
- ArrowToHyperTypeMapperTest at the unit boundary: field with override
  -> typeName matches; field without override (null metadata, empty
  metadata) -> typeName is null.
- StreamingResultSetTest.getColumnTypeNameFallsBackToDerivedNameOnRealHyperStream
  end-to-end against local Hyper: executeQuery on a select with INT,
  VARCHAR, DECIMAL columns asserts ResultSetMetaData.getColumnTypeName
  returns the derived names ("INTEGER", "VARCHAR", "DECIMAL").
Drive-by pin test: StreamingResultSet.getObject(int, Map<String,Class<?>>)
with a null or empty type map should behave like plain getObject(int)
per the JDBC spec. Previously not asserted anywhere.

The companion getObject(Class) fallback test landed earlier on this
branch, bundled into the QueryJDBCAccessor base-class fix commit so
the fix and its end-to-end coverage ship as a single cherry-pickable
unit.
Previously a row with the wrong number of elements would silently leave
the trailing columns as Arrow null (interpreted as missing values).
Today every caller routes through MetadataSchemas so the sizes match by
construction, but a future caller bug would surface only inside vector
population, far from the boundary.

Add an explicit arity check at the of(...) entrypoint: each non-null
row must have exactly columns.size() elements. Null rows are accepted
as the all-nulls row (matching the legacy coerceRows convention of
turning null into emptyList). Empty rows are accepted only when the
schema is also empty.

Pin behavior with MetadataResultSetsTest covering short, long,
correct-arity, null-row, and empty-rows cases.
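The arity contract can be sketched as follows (names illustrative; a null row stands for the all-nulls row, and the equality check already covers the empty-row/empty-schema case):

```java
import java.util.List;

// Sketch: validate row arity at the entrypoint, far from vector
// population, so a caller bug surfaces at the boundary.
final class ArityCheck {
    static void checkRowArity(List<List<Object>> rows, int columnCount) {
        for (int i = 0; i < rows.size(); i++) {
            List<Object> row = rows.get(i);
            if (row == null) {
                continue; // null row == every column null, accepted
            }
            if (row.size() != columnCount) {
                throw new IllegalArgumentException(
                        "row " + i + " has " + row.size() + " values, expected " + columnCount);
            }
        }
    }
}
```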
Now that ArrowStreamReaderCursor.loadNextNonEmptyBatch (introduced
earlier on this branch as a pre-unify cursor fix) consumes empty
batches at the cursor seam, MetadataResultSets.writeArrowStream no
longer needs its own "skip writeBatch when rowCount==0" workaround:
the cursor handles the empty-only case correctly. Remove the special
case and always emit a batch.

Tightens the zeroRowOnlyBatchYieldsNoRows test docstring to match.
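The "cursor consumes empty batches" seam can be modeled generically over an iterator of batch row counts rather than a real ArrowStreamReader (purely illustrative):

```java
import java.util.Iterator;

// Sketch: empty IPC batches are valid Arrow and producers emit them, but
// JDBC's next() only understands rows — so the cursor drains zero-row
// batches until it finds rows or the stream ends, and callers never see them.
final class BatchSkipper {
    static boolean loadNextNonEmptyBatch(Iterator<Integer> batchRowCounts) {
        while (batchRowCounts.hasNext()) {
            if (batchRowCounts.next() > 0) {
                return true; // a batch with rows: next() can advance
            }
            // zero-row batch: consume it and keep looking
        }
        return false; // end of stream: next() reports no more rows
    }
}
```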
DataCloudMetadataResultSet was deleted in this PR, but the test file
retained the old name and lived in the wrong package. Merge its empty-
result-set JDBC-shape smoke tests into the new MetadataResultSetsTest
under the .core.metadata package and delete the legacy file. No
behavior change.
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
StreamingResultSet had two public factories — of(reader, allocator,
queryId[, zone]) (4 callers) and ofClosingOnFailure(Result, queryId,
zone) (5 callers). Every production caller wanted the close-on-failure
behavior; only tests and the metadata helper used the bare of(). Two
factories with overlapping responsibilities is one too many — a caller
hitting the bare of() and not knowing about ofClosingOnFailure would
silently leak the 100 MB RootAllocator on construction failure.

Collapse to one public factory:
- of(QueryResultArrowStream.Result, queryId, sessionZone) — the only
  factory callers see, always closes both reader and allocator on
  failure. Name is the unambiguous "of" because there is no other.
- create(reader, allocator, queryId, sessionZone) — private; just the
  construction body the factory wraps.

Production call sites (DataCloudConnection, DataCloudStatement) and
MetadataResultSets were already passing a (reader, allocator) pair,
so the call shape collapses to passing the Result holder. Tests that
were building the pair locally now wrap it in a Result the same way.
…r interface

Pre-unify there were three result-set implementations: StreamingResultSet
(streaming Arrow query results), DataCloudMetadataResultSet (metadata),
SimpleResultSet (in-memory rows). The DataCloudResultSet interface — a
one-method (getQueryId) extension over java.sql.ResultSet — was the
common "implements" the public API surfaced; StreamingResultSet was the
only one that ever implemented it as a non-trivial impl.

The unify refactor collapsed all three implementations into
StreamingResultSet, but kept the interface and the "Streaming" name. Two
problems fall out:

- The "Streaming" name now lies. Metadata results flow through the same
  class but they're a one-shot in-memory IPC blob — nothing streaming
  about them. MetadataResultSets.of even passes /*queryId=*/ null
  because there is no query.
- The DataCloudResultSet interface has one implementer and one method.
  Layering an interface for one impl is just a reader trap: callers
  instinctively look for "what other implementations exist" and find
  none.

Collapse the two:
- Rename the class StreamingResultSet -> DataCloudResultSet.
- Delete the old DataCloudResultSet interface (the public method
  getQueryId() now lives directly on the class via @Getter).
- Update all production and test references; rename the affected test
  files to match (StreamingResultSet*Test -> DataCloudResultSet*Test).

The public API surface is unchanged in source for the common cases:
DataCloudConnection.getRowBasedResultSet / getChunkBasedResultSet still
return DataCloudResultSet, just as a class instead of an interface.
This is binary-incompatible for any caller that ever cast to or
implemented the old interface; in practice only StreamingResultSet
implemented it on the read side, and no code outside the driver
implemented it on the write side.
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from b83e8b7 to 7baf67d on May 15, 2026 at 10:19
@mkaufmann

Rebased on origin/main (now 7baf67d), force-pushed. :jdbc-core:test passes.

Walked the 10 inline findings against the rebased tree — all still apply. PRs #185 (zero-row batch skip) and #186 (getObject(Class) identity fallback) landed on main since the review and dropped out of this PR's diff cleanly, but neither touches any of the lines a finding anchors to. The IntegerVectorSetter range-check tightening called out in the summary is still on this branch (3bfd821).

GitHub now flags the inline comments as outdated because of the SHA rewrite. The findings themselves are unchanged — text and line anchors are still correct.


Generated by the review-pr-tavern skill — a human did not write this comment.

@mkaufmann mkaufmann changed the title from refactor: unify ResultSet implementations on Arrow-backed path to refactor!: unify ResultSet implementations on Arrow-backed path on May 15, 2026
mkaufmann added 4 commits May 15, 2026 16:34
…torSetter

DatabaseMetaData.getTypeInfo declared CASE_SENSITIVE, UNSIGNED_ATTRIBUTE,
FIXED_PREC_SCALE, and AUTO_INCREMENT as VARCHAR while the row producer in
HyperTypes.buildTypeInfoRow wrote Boolean values into them. The mismatch
worked only because VarCharVectorSetter accepted Object and silently called
value.toString(), so the four columns surfaced as "true"/"false" strings
instead of the boolean payload JDBC 4.2 (DatabaseMetaData.getTypeInfo) and
pgjdbc both define for these positions.

Declare the four columns with a new bool(...) helper in MetadataSchemas
that produces a HyperType.bool(true) / Constants.BOOL ColumnMetadata. The
existing BitVectorSetter already accepts Boolean, so the row producer is
unchanged. Tighten VarCharVectorSetter from BaseVectorSetter<VarCharVector,
Object> to <VarCharVector, String> so non-String payloads fail fast at the
BaseVectorSetter type guard instead of being toString-coerced — the byte[]
arm was dead (setBytes / setBinaryStream / setUnicodeStream / setAsciiStream
all throw FEATURE_NOT_SUPPORTED in DataCloudPreparedStatement). Both fixes
land together because tightening the setter without the schema fix would
make getTypeInfo throw IllegalArgumentException on the Boolean payload.

Behavior change: getObject on the four columns now returns Boolean (per JDBC
spec), not String. Callers that previously cast (String) rs.getObject(...)
will get a ClassCastException; rs.getBoolean(...) starts working where it
previously threw on the VARCHAR path, and rs.getString(...) keeps returning
the same lowercase "true"/"false" via BooleanVectorAccessor.getString.

Pin the schema with three new MetadataSchemasTest methods mirroring the
existing COLUMNS coverage (names / typeNames / JdbcTypeIds), add a strict-
type regression test for VarCharVectorSetter modeled on
IntegerVectorSetterRangeCheckTest so a future re-widening trips CI, and
exercise rs.getBoolean on the four boolean columns end-to-end in
DataCloudDatabaseMetadataTest.testGetTypeInfo.
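The tightening can be sketched as follows. This is a simplified illustration of the fail-fast type guard, not the driver's actual BaseVectorSetter/VarCharVectorSetter classes; the class names here are stand-ins.

```java
// Sketch: a setter parameterized on its accepted payload type rejects
// anything else at the base-class guard instead of toString-coercing it.
abstract class BaseVectorSetter<T> {
    private final Class<T> accepted;

    BaseVectorSetter(Class<T> accepted) { this.accepted = accepted; }

    // Shared type guard: non-matching payloads fail fast here.
    final void set(Object value) {
        if (!accepted.isInstance(value)) {
            throw new IllegalArgumentException(
                "expected " + accepted.getSimpleName() + ", got "
                    + value.getClass().getSimpleName());
        }
        doSet(accepted.cast(value));
    }

    abstract void doSet(T value);
}

// Before: parameterized on Object with value.toString(); after: String,
// so a Boolean payload fails fast instead of surfacing as "true"/"false".
final class VarCharSetter extends BaseVectorSetter<String> {
    final StringBuilder sink = new StringBuilder();
    VarCharSetter() { super(String.class); }
    @Override void doSet(String value) { sink.append(value); }
}

public class VarCharSetterDemo {
    public static void main(String[] args) {
        VarCharSetter s = new VarCharSetter();
        s.set("true");                 // String payload: accepted
        try {
            s.set(Boolean.TRUE);       // Boolean payload: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected String, got Boolean
        }
    }
}
```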
ArrowStreamReaderCursor.close used a plain try/finally that closed reader
first, allocator second. When both threw — the most likely failure mode
because the allocator's leak detector fires on close when buffers are still
outstanding, which is exactly what an exception during reader.close
produces — Java's finally semantics replaced the reader's exception with
the allocator's. The reader exception is the diagnostically interesting
one (the leak detector firing on allocator.close is usually a symptom);
silently dropping it left only the symptom in the stack trace.

Switch the cursor to try-with-resources over the (allocator, reader) pair
so reader closes first and the allocator's exception attaches as
suppressed onto the reader's instead of replacing it. Same fix on the
construction-failure cleanup in DataCloudResultSet.of: the reader.close
was already wrapped with addSuppressed but the immediately-following
allocator.close was bare and could replace the original construction
SQLException; wrap it the same way.

Pin the new behavior with a Mockito-based test that throws from both
reader.close and allocator.close and asserts the reader's exception is
primary with the allocator's attached as suppressed.
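The close-ordering semantics the commit relies on can be shown in isolation. In this minimal sketch, plain AutoCloseables stand in for the reader and allocator; try-with-resources closes in reverse declaration order, so declaring (allocator, reader) closes the reader first and attaches the allocator's exception as suppressed rather than replacing the reader's.

```java
// Demonstrates why try-with-resources over (allocator, reader) keeps the
// reader's exception primary when both closes throw.
public class CloseOrderDemo {
    static final class Failing implements AutoCloseable {
        final String name;
        Failing(String name) { this.name = name; }
        @Override public void close() { throw new IllegalStateException(name); }
    }

    public static Throwable closeBoth() {
        // Reverse declaration order on close: reader first, allocator second.
        try (Failing allocator = new Failing("allocator");
             Failing reader = new Failing("reader")) {
            return null; // body empty; both closes throw on exit
        } catch (IllegalStateException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable t = closeBoth();
        System.out.println("primary=" + t.getMessage()
            + " suppressed=" + t.getSuppressed()[0].getMessage());
        // prints: primary=reader suppressed=allocator
    }
}
```

Contrast with a plain try/finally that closes the reader in the try and the allocator in the finally: an exception thrown from the finally block replaces the one in flight, which is exactly the swallowed-diagnosis problem the commit describes.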
DataCloudStatement.executeQuery and getResultSet both fetched
iterator.getQueryStatus().getQueryId() twice — once when constructing
arrowStream and once when calling DataCloudResultSet.of with it. The
second call sat between arrowStream creation (which puts a 100 MB
RootAllocator on the field) and the of(...) call that takes ownership of
that allocator. If the second getQueryId() throws — e.g. a future
refactor makes getQueryStatus async, or a transient gRPC failure surfaces
through the cached proto — the allocator escapes both DataCloudResultSet.of's
own try/catch (never entered) and the surrounding catch
(StatusRuntimeException) (which doesn't close arrowStream). The PR
explicitly claims every code path closes its allocator; this hoist closes
the window without changing any observable behavior.
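The hoist pattern can be sketched as below. OneShotId and describe are hypothetical stand-ins for the statement's query-status fetch and result-set construction, not driver code; the point is that fetching the fallible value once, before any owned resource exists, removes the leak window.

```java
// Sketch of hoisting a fallible lookup ahead of resource acquisition.
public class HoistDemo {
    // Models a getter that succeeds once, then fails on a re-fetch
    // (the transient-failure mode the commit describes).
    static final class OneShotId implements java.util.function.Supplier<String> {
        private boolean used;
        @Override public String get() {
            if (used) throw new IllegalStateException("transient failure on re-fetch");
            used = true;
            return "q-123";
        }
    }

    // Hoisted: a single fetch before anything resource-like is created,
    // then the cached value is reused at both consumption sites.
    public static String describe(java.util.function.Supplier<String> queryId) {
        String id = queryId.get(); // only call, before "allocator" creation
        return "stream(" + id + ") -> resultSet(" + id + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(new OneShotId()));
        // prints: stream(q-123) -> resultSet(q-123)
    }
}
```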
…fy cap

Both Result-holder construction sites — QueryResultArrowStream.toArrowStreamReader
and MetadataResultSets.of — built the RootAllocator first and then handed it
to a new ArrowStreamReader. If the reader's constructor throws (today
benign, but a future Arrow upgrade could add constructor-side validation),
the allocator escapes both DataCloudResultSet.of's own try/catch (never
entered) and the caller's catch. Wrap the construction in try/catch that
closes the allocator on the way out and attaches any close failure as
suppressed.

Unify the per-allocator cap: MetadataResultSets used Long.MAX_VALUE while
the gRPC path was capped at 100 MB. The cap exists because Arrow allocators
are accounted memory, so hitting the cap throws a clean OutOfMemoryException
instead of letting the JVM OOM. A getColumns(...) against a tenant with
thousands of tables silently bypassed the cap on the metadata path. Promote
the constant to public ROOT_ALLOCATOR_BUDGET_BYTES on QueryResultArrowStream
and reuse it from MetadataResultSets.

No new test: the failure mode requires ArrowStreamReader's constructor to
throw, which doesn't happen with ByteArrayInputStream or the gRPC channel
today. Pure code-shape fix; existing suite stays green.
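The guarded-construction shape described above, sketched with generic placeholders (construct and the lambdas here are illustrative, not the driver's API): if the constructor throws, the allocator is closed on the way out and any close failure is attached as suppressed so the root cause stays primary.

```java
// Sketch: hand an allocator to a constructor; on failure, close the
// allocator and keep the construction exception as the primary one.
public class GuardedConstruct {
    public static <A extends AutoCloseable, R> R construct(
            A allocator, java.util.function.Function<A, R> ctor) {
        try {
            return ctor.apply(allocator);
        } catch (RuntimeException e) {
            try {
                allocator.close();              // don't leak the allocator
            } catch (Exception closeFailure) {
                e.addSuppressed(closeFailure);  // root cause stays primary
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        AutoCloseable allocator =
            () -> { throw new IllegalStateException("close failed"); };
        try {
            construct(allocator,
                a -> { throw new IllegalStateException("ctor failed"); });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " / suppressed: "
                + e.getSuppressed()[0].getMessage());
            // prints: ctor failed / suppressed: close failed
        }
    }
}
```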