refactor!: unify ResultSet implementations on Arrow-backed path#175

Draft
mkaufmann wants to merge 18 commits into main from moritz/unify-result-sets

Conversation

@mkaufmann mkaufmann commented Apr 24, 2026

NOTE: I'm not happy with this PR or the approach yet; it exists so I can run reviews and steer the agent.

Summary

Collapse the two ResultSet families (streaming Arrow + row-based metadata) into a single Arrow-backed implementation so there is one accessor pipeline, one set of type semantics, and one place to fix bugs. Also tightens root-allocator hygiene.

Built on top of moritz/centralize-types-via-hypertype.

What changed

Unified result set. DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are removed. JDBC metadata calls (getTables, getColumns, getSchemas, getTypeInfo, and all empty-metadata helpers) now funnel through StreamingResultSet via a new MetadataArrowBuilder that materialises List<List<Object>> metadata rows into a populated Arrow VectorSchemaRoot. MetadataResultSets is the factory callers use.
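The materialisation step described above can be sketched in plain Java. The class and method names here are hypothetical stand-ins, and plain lists play the role of Arrow vectors and the VectorSchemaRoot:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hedged sketch of the materialisation step: pivot row-oriented
// List<List<Object>> metadata rows into column-oriented storage.
// Plain lists stand in for Arrow vectors / VectorSchemaRoot.
public class RowsToColumnsDemo {

    public static List<List<Object>> toColumns(List<List<Object>> rows, int columnCount) {
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(new ArrayList<>());
        }
        for (List<Object> row : rows) {
            for (int c = 0; c < columnCount; c++) {
                // Pad short rows with null so every column ends up the same length.
                columns.get(c).add(c < row.size() ? row.get(c) : null);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(List.of("t1", "TABLE"), List.of("t2"));
        List<List<Object>> cols = toColumns(rows, 2);
        if (!cols.get(0).equals(Arrays.asList("t1", "t2"))) throw new AssertionError();
        if (!cols.get(1).equals(Arrays.asList("TABLE", null))) throw new AssertionError();
    }
}
```

The real builder additionally has to write each column into a typed Arrow vector; the pivot shape is the part this sketch shows.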

Source-agnostic cursor. ArrowStreamReaderCursor now accepts either a streaming ArrowStreamReader or a pre-populated in-memory VectorSchemaRoot, driven by a pluggable BatchLoader. The cursor owns an AutoCloseable holding the backing resources and closes it on cursor close.

Root allocator hygiene.

  • QueryResultArrowStream.toArrowStreamReader previously leaked a 100 MB RootAllocator: ArrowStreamReader.close() only tears down vectors, not the allocator. It now returns a Result holder that pairs the reader with the allocator and closes both in the correct order (reader first, so ArrowBuf accounting clears before the allocator's budget check).
  • StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) similarly takes ownership of the allocator + VSR through an AutoCloseable, so every code path closes its allocator.
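As a rough illustration of the ownership pattern in these bullets, the Result holder boils down to an AutoCloseable that fixes the close order. The classes below are stand-ins, not the real Arrow reader/allocator:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a holder that pairs a reader with its allocator and
// closes them reader-first, so buffer accounting is cleared before the
// allocator runs its leak/budget check on close.
public class CloseOrderDemo {
    public static final List<String> closed = new ArrayList<>();

    static final class Allocator implements AutoCloseable {
        @Override public void close() { closed.add("allocator"); }
    }

    static final class Reader implements AutoCloseable {
        @Override public void close() { closed.add("reader"); }
    }

    /** Pairs reader + allocator; close() tears both down in the safe order. */
    static final class Result implements AutoCloseable {
        final Reader reader;
        final Allocator allocator;
        Result(Reader reader, Allocator allocator) {
            this.reader = reader;
            this.allocator = allocator;
        }
        @Override public void close() throws Exception {
            reader.close();     // first: releases the reader's buffers
            allocator.close();  // then: the allocator sees zero outstanding bytes
        }
    }

    public static void main(String[] args) throws Exception {
        try (Result r = new Result(new Reader(), new Allocator())) {
            // ... consume batches ...
        }
        if (!closed.equals(List.of("reader", "allocator"))) throw new AssertionError(closed);
    }
}
```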

typeName preservation. ofInMemory accepts an optional columns override so JDBC-spec labels like "TEXT" / "INTEGER" / "SHORT" survive a round-trip through Arrow (the derived HyperType names would otherwise be "VARCHAR" etc.).

StreamingResultSet.getObject(int, Class) gains an isInstance fallback so getObject(col, String.class) on a VARCHAR works without each accessor implementing typed getObject.
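A minimal sketch of what such an isInstance fallback can look like, with hypothetical names and plain Exception in place of SQLException; the real accessor plumbing is more involved:

```java
// Rough sketch of the fallback: if no typed accessor handles the requested
// class, return the raw value when it already is an instance of that class.
public class GetObjectFallbackDemo {

    // Stand-in for an accessor that only knows the raw column value.
    static Object rawValue() { return "hello"; }

    @SuppressWarnings("unchecked")
    public static <T> T getObject(Class<T> type) throws Exception {
        Object raw = rawValue();
        if (raw == null || type.isInstance(raw)) {
            return (T) raw; // covers getObject(col, String.class) on a VARCHAR
        }
        // Strict semantics: no loose coercion to unrelated types.
        throw new Exception("Conversion to " + type.getName() + " is not supported");
    }

    public static void main(String[] args) throws Exception {
        if (!"hello".equals(getObject(String.class))) throw new AssertionError();
        boolean threw = false;
        try {
            getObject(Integer.class); // VARCHAR -> Integer has no fallback here
        } catch (Exception expected) {
            threw = true;
        }
        if (!threw) throw new AssertionError();
    }
}
```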

Behavior changes worth calling out

Accessor semantics on metadata rows are now the same as on query results, which is stricter than the old row-based SimpleResultSet:

  • getBoolean / getDate / getTime / getTimestamp on an integer column throw SQLException instead of loosely coercing.
  • getByte on an integer column is now supported (previously threw in the metadata path).

DataCloudDatabaseMetadataTest assertions were updated accordingly.

Test plan

  • ./gradlew clean build passes (includes spotlessCheck, all tests, JaCoCo coverage, verification).
  • ./gradlew :jdbc-core:test — 1222 tests, 0 failed.
  • Spot-check downstream: Spark datasource still compiles (covered by full build).

Breaking changes

com.salesforce.datacloud.jdbc.core.DataCloudResultSet flips from public interface to public class. External code that wrote class MyRs implements DataCloudResultSet (decorators, wrappers, hand-rolled doubles) no longer compiles. Code that only consumes the standard java.sql.ResultSet / DataCloudResultSet API as an opaque type recompiles unchanged.

The following previously-public types are removed: StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, ColumnAccessor. External callers of StreamingResultSet.of(ArrowStreamReader, ...) should switch to DataCloudResultSet.of(QueryResultArrowStream.Result, ...).

JDBC metadata accessor semantics on integer columns are stricter: getDate / getTime / getTimestamp now throw SQLException (previously UnsupportedOperationException). getObject(col, Boolean.class) on integer columns now throws.

BREAKING CHANGE: DataCloudResultSet is now a class instead of an interface; StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, ColumnAccessor are removed; metadata int-column accessors throw on date/time/Boolean conversions.

mkaufmann added a commit that referenced this pull request Apr 24, 2026
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).
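The shared row-indexed primitive from point 1 can be sketched like this (hypothetical class; an Object[] stands in for an Arrow vector):

```java
import java.util.List;

// Hedged sketch of the setCell primitive: both the single-row
// parameter-binding overload and the many-row metadata overload
// funnel through one row-indexed setter.
public class SetCellDemo {
    public final Object[] vector;

    public SetCellDemo(int capacity) { vector = new Object[capacity]; }

    // The one primitive: every setter is parameterised by row index.
    public void setCell(int row, Object value) { vector[row] = value; }

    // Single-row overload (parameter binding): always writes row 0.
    public void populate(Object value) { setCell(0, value); }

    // Many-row overload (metadata rows): one cell per row index.
    public void populate(List<?> values) {
        for (int row = 0; row < values.size(); row++) {
            setCell(row, values.get(row));
        }
    }

    public static void main(String[] args) {
        SetCellDemo single = new SetCellDemo(1);
        single.populate("param");                        // parameter-binding path
        SetCellDemo many = new SetCellDemo(3);
        many.populate(List.of("a", "b", "c"));           // metadata path
        if (!"param".equals(single.vector[0])) throw new AssertionError();
        if (!"c".equals(many.vector[2])) throw new AssertionError();
    }
}
```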
@mkaufmann mkaufmann force-pushed the moritz/centralize-types-via-hypertype branch 2 times, most recently from 9ecba8a to 08d62bc on May 7, 2026 19:10
Base automatically changed from moritz/centralize-types-via-hypertype to main May 7, 2026 19:25
mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from b648940 to 07125b1 on May 11, 2026 13:38
@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 74.38692% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.49%. Comparing base (f47714f) to head (dd484f2).

Files with missing lines Patch % Lines
...sforce/datacloud/jdbc/core/DataCloudResultSet.java 75.63% 47 Missing and 1 partial ⚠️
...tacloud/jdbc/core/metadata/MetadataResultSets.java 65.45% 15 Missing and 4 partials ⚠️
.../datacloud/jdbc/protocol/data/VectorPopulator.java 73.84% 12 Missing and 5 partials ⚠️
...atacloud/jdbc/protocol/QueryResultArrowStream.java 25.00% 6 Missing ⚠️
...datacloud/jdbc/protocol/data/HyperTypeToArrow.java 81.81% 1 Missing and 1 partial ⚠️
...e/datacloud/jdbc/core/ArrowStreamReaderCursor.java 80.00% 0 Missing and 1 partial ⚠️
...oud/jdbc/protocol/data/ArrowToHyperTypeMapper.java 75.00% 0 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (74.38%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #175      +/-   ##
============================================
- Coverage     82.37%   80.49%   -1.88%     
+ Complexity     1871     1707     -164     
============================================
  Files           125      123       -2     
  Lines          5020     4953      -67     
  Branches        540      521      -19     
============================================
- Hits           4135     3987     -148     
- Misses          642      737      +95     
+ Partials        243      229      -14     
Components Coverage Δ
JDBC Core 80.84% <74.38%> (-2.31%) ⬇️
JDBC Main 40.69% <ø> (ø)
JDBC HTTP 90.30% <ø> (ø)
JDBC Utilities 65.25% <ø> (ø)
Spark Datasource ∅ <ø> (∅)
Files with missing lines Coverage Δ
...force/datacloud/jdbc/core/DataCloudConnection.java 57.04% <100.00%> (ø)
...datacloud/jdbc/core/DataCloudDatabaseMetadata.java 98.34% <100.00%> (-0.01%) ⬇️
...sforce/datacloud/jdbc/core/DataCloudStatement.java 80.90% <100.00%> (-0.20%) ⬇️
...alesforce/datacloud/jdbc/core/MetadataSchemas.java 96.87% <100.00%> (+0.04%) ⬆️
...esforce/datacloud/jdbc/core/QueryMetadataUtil.java 95.40% <100.00%> (ø)
...oud/jdbc/core/SQLExceptionQueryResultIterator.java 75.00% <ø> (ø)
.../com/salesforce/datacloud/jdbc/util/Constants.java 0.00% <ø> (ø)
...e/datacloud/jdbc/core/ArrowStreamReaderCursor.java 88.57% <80.00%> (-2.06%) ⬇️
...oud/jdbc/protocol/data/ArrowToHyperTypeMapper.java 67.79% <75.00%> (+5.97%) ⬆️
...datacloud/jdbc/protocol/data/HyperTypeToArrow.java 76.92% <81.81%> (-1.34%) ⬇️
... and 4 more

... and 5 files with indirect coverage changes


mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.
mkaufmann added a commit that referenced this pull request May 11, 2026
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from 07125b1 to e329860 on May 11, 2026 15:02
mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
mkaufmann added a commit that referenced this pull request May 11, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from e329860 to d17f1b0 on May 11, 2026 16:20
@mkaufmann
Member Author

Per the two review threads, I've split out the cherry-pickable fixes as their own PRs against main:

This PR (#175) keeps the same fixes as its first two commits; once #185 / #186 land, those commits will collapse to no-ops at rebase time.

For the remaining "should the QueryResultArrowStream allocator-ownership move happen pre-unify too?" thread (#175 review): I'm waiting on your call before doing that split. As I noted there, it's non-trivial surgery on the unify commit and I'd rather get your sign-off before rewriting ~800 lines of refactor.

mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
mkaufmann added a commit that referenced this pull request May 11, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from d17f1b0 to f4cad29 on May 11, 2026 17:16
mkaufmann added a commit that referenced this pull request May 11, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 11, 2026
mkaufmann added a commit that referenced this pull request May 11, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch 2 times, most recently from 67ddd24 to b97abe2 on May 11, 2026 18:37
mkaufmann added a commit that referenced this pull request May 12, 2026
…ArrowStreamReader

mkaufmann added a commit that referenced this pull request May 12, 2026
mkaufmann added a commit that referenced this pull request May 12, 2026
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from 84391cd to d90bb9a on May 12, 2026 at 14:57
mkaufmann added a commit that referenced this pull request May 12, 2026
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).
mkaufmann added a commit that referenced this pull request May 12, 2026
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.
mkaufmann added a commit that referenced this pull request May 12, 2026
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from d90bb9a to b83e8b7 on May 12, 2026 at 19:13

@mkaufmann mkaufmann left a comment


Review summary

12 findings: 1 Blocking, 8 Suggested, 3 Optional. The PR collapses the metadata path onto the streaming Arrow pipeline cleanly, and the close-order story for the new Result holder is right where it lands. The findings cluster around three themes:

  1. API surface. The diff promotes DataCloudResultSet from an interface to a concrete class. That's a binary break for anyone implementing or proxying the type from the shaded JAR — flagged at DataCloudResultSet.java:49 (Blocking). StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor all leave the public surface without a deprecation window. With release-please configured for Conventional Commits, the squash-merge subject refactor: will not surface a CHANGELOG entry — finding S7 covers this.

  2. JDBC-spec compliance regressions on metadata. MetadataSchemas.TYPE_INFO declares CASE_SENSITIVE / UNSIGNED_ATTRIBUTE / FIXED_PREC_SCALE / AUTO_INCREMENT as text(...) columns while HyperTypes.typeInfoRows() populates them with Boolean values; the new VarCharVectorSetter toString()s them and BaseIntVectorAccessor.getBoolean is not overridden, so getBoolean(...) on these columns now throws where the old SimpleResultSet returned the raw Boolean (S1, S2). JDBC 4.2 Table B-6 lists INTEGER → boolean and VARCHAR → boolean (for "true"/"false"/"1"/"0") as recommended conversions; tools that read NULLABLE from getColumns() or boolean flags from getTypeInfo() will break.

  3. Allocator-hygiene gaps that survive the cleanup. cursor.close() and DataCloudResultSet.of's failure path use plain try/finally without addSuppressed, so if the leak detector trips on allocator.close() it masks the original exception (S3). DataCloudStatement.executeQuery / getResultSet calls getQueryId() between allocator construction and ownership transfer to DataCloudResultSet.of — if that gRPC call throws, the allocator escapes (S4).

On the PR body itself: The body describes a design that isn't in the diff. The "source-agnostic cursor" / BatchLoader / AutoCloseable ownership and the StreamingResultSet.ofInMemory(root, owned, queryId, zone, cols) factory don't exist in the cursor or in DataCloudResultSet. The metadata path actually round-trips through ArrowStreamWriter → IPC bytes → ArrowStreamReader (verified at MetadataResultSets.java:88), and the typeName-preservation mechanism rides on Arrow field metadata under a datacloud-jdbc:type_name key (HyperTypeToArrow.toField / ArrowToHyperTypeMapper.toColumnMetadata), not on a cols parameter. The body also references a non-existent MetadataArrowBuilder class (the factory is MetadataResultSets), names StreamingResultSet as the unified type when in fact StreamingResultSet.java is deleted and DataCloudResultSet.java is the new home, and the "Built on top of #moritz/centralize-types-via-hypertype" line is malformed (won't render as a PR ref) and stale (the dependency landed as #174). The first-line "I'm not happy with the PR and the approach" disclaimer would land in git log permanently if squashed as-is.

Range-check tightening (IntVectorSetter / SmallIntVectorSetter / TinyIntVectorSetter now throw IllegalArgumentException on out-of-range Numbers — verified at VectorPopulator.java:240,261,433) is a real behavior change that the PR body's "Behavior changes worth calling out" section omits. Worth a bullet so future debugging of IllegalArgumentException from prepared-statement parameter binding has a paper trail.

Tests. Behavior change "metadata int columns no longer accept getBoolean" is no longer pinned by a test — the old assertThat(getBoolean("NULLABLE")).isFalse() was replaced with getInt("NULLABLE"), and the test comment ("long→boolean coerces to false") is misleading because the code does not actually coerce — BaseIntVectorAccessor inherits getBoolean from QueryJDBCAccessor, which throws. A direct assertThrows(SQLException.class, () -> getBoolean("ORDINAL_POSITION")) would pin it.

Inline findings follow.


Generated by the review-pr-tavern skill — a human did not write this comment.

assertThat(columnResultSet.getInt("DATA_TYPE")).isEqualTo(12);
assertThat(columnResultSet.getBoolean("NULLABLE")).isFalse();
// NULLABLE is an INTEGER column. Arrow-backed getInt reports the nullability enum;
// 0 (columnNoNulls) for NOT NULL rows, which coerces to false via long→boolean.

Suggested — The test comment says "0 (columnNoNulls) for NOT NULL rows, which coerces to false via long→boolean", and the previous getBoolean("NULLABLE") assertion was replaced with getInt("NULLABLE"). But BaseIntVectorAccessor does not override getBoolean() — the call falls through to QueryJDBCAccessor.getBoolean() (line 52-54) which throws SQLFeatureNotSupportedException. The comment describes a coercion that does not happen in this codebase, and the behavior-change claim from the PR body ("getBoolean ... on an integer column throws SQLException") is no longer pinned by any direct assertion. Add:

assertThrows(SQLException.class, () -> columnResultSet.getBoolean("ORDINAL_POSITION"));

next to the existing date/time assertThrows block — it pins the new contract and removes the misleading comment in one go. (See the separate finding on MetadataSchemas for the boolean-coercion gap that arguably should still work for spec compliance.)


Generated by the review-pr-tavern skill — a human did not write this comment.

* @param queryId The query identifier (may be {@code null} for synthesized result sets).
* @param sessionZone The session timezone used for timestamp conversions.
*/
public static DataCloudResultSet of(QueryResultArrowStream.Result arrowStream, String queryId, ZoneId sessionZone)

Suggested — Four deletions worth a CHANGELOG entry: StreamingResultSet, DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor were all public on main. Even if no in-tree code outside :jdbc-core referenced them, they were reachable from the shaded fat JAR. StreamingResultSet.of(ArrowStreamReader, queryId) and of(ArrowStreamReader, queryId, ZoneId) were the only public entry points for wrapping a foreign Arrow stream into a JDBC ResultSet. The new DataCloudResultSet.of(QueryResultArrowStream.Result, ...) requires constructing an internal Result holder, which is a tighter coupling than the old factories.

If there's no migration alias, please at least call this out in a BREAKING CHANGE: footer so release-please picks it up (the refactor: subject otherwise produces no CHANGELOG entry — see the release-notes finding). A thin public static DataCloudResultSet of(ArrowStreamReader reader, BufferAllocator allocator, String queryId, ZoneId zone) overload that internally builds a Result would also let external Arrow-stream wrappers continue to compile.

Release-notes coverage. The repo uses release-please with release-type: simple and Conventional Commits. A squash-merge subject of refactor: unify ResultSet implementations on Arrow-backed path produces no CHANGELOG entry: refactor isn't among release-please's default changelog sections, and at 0.x.y even a minor bump would otherwise require a feat: subject. Combined with the API surface deltas in this PR (interface→class on DataCloudResultSet, four removed public types, stricter accessor coercion on metadata int columns), the change will silently land in the next release and users discovering the break via CHANGELOG.md will have nothing to read.

Use refactor!: for the squash subject and add a BREAKING CHANGE: footer enumerating the specific deletions and behavior changes. release-please reads the ! and the footer and produces a Breaking Changes section.
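A squash subject and footer of the shape release-please recognises might look like this (the exact wording is illustrative; the listed breaks are the ones flagged in this review):

```text
refactor!: unify ResultSet implementations on Arrow-backed path

BREAKING CHANGE: DataCloudResultSet is now a concrete class (was an
interface). StreamingResultSet, DataCloudMetadataResultSet,
SimpleResultSet, and ColumnAccessor are removed from the public surface.
getBoolean on integer metadata columns now throws SQLException.
```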


Generated by the review-pr-tavern skill — a human did not write this comment.


@Override
public void close() throws SQLException {
if (!closed) {

Optional — close() reads if (!closed) { cursor.close(); closed = true; }. If cursor.close() throws — e.g. the allocator's leak detector fires IllegalStateException — closed stays false. A defensive caller that catches and retries close() (try-with-resources won't, but JDBC connection pools and driver wrappers sometimes do) re-enters cursor.close(), which calls allocator.close() on an already-closed RootAllocator. Arrow throws on the second close.

Flip the order — set closed = true before delegating, or wrap in try/finally that sets it unconditionally. The caller still gets the cleanup exception; subsequent calls become no-ops, matching the standard JDK AutoCloseable pattern.


Generated by the review-pr-tavern skill — a human did not write this comment.
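The flipped-order fix suggested here can be sketched with a standalone wrapper — the class and field names below are illustrative, not the driver's actual cursor API:

```java
// Sketch: mark the object closed BEFORE delegating, so a caller that
// catches the cleanup exception and retries close() gets a no-op instead
// of a second allocator.close() on an already-closed RootAllocator.
final class IdempotentClose implements AutoCloseable {
    private final AutoCloseable cursor; // stand-in for the real cursor
    private boolean closed = false;

    IdempotentClose(AutoCloseable cursor) {
        this.cursor = cursor;
    }

    @Override
    public void close() throws Exception {
        if (closed) {
            return; // subsequent calls are no-ops, per the standard AutoCloseable pattern
        }
        closed = true; // set before delegating, so a throwing close cannot be re-entered
        cursor.close(); // caller still sees the cleanup exception on the first call
    }

    boolean isClosed() {
        return closed;
    }
}
```

The caller still observes the first close's exception; only the retry becomes harmless.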

@Override
protected void setValueInternal(DecimalVector vector, BigDecimal value) {
vector.setSafe(0, value.unscaledValue().longValue());
protected void setValueInternal(DecimalVector vector, int index, BigDecimal value) {

Optional — DecimalVectorSetter.setValueInternal does vector.setSafe(index, value.unscaledValue().longValue()). A BigDecimal whose unscaled value exceeds Long.MAX_VALUE silently truncates. The PR added IllegalArgumentException range checks on IntVectorSetter (line 240), SmallIntVectorSetter (line 261), and TinyIntVectorSetter (line 433); decimal got the signature update but not the guard. The asymmetry is jarring now that this is the single populator path for both binding and metadata.

Not required (today's call sites probably stay in range), but worth a bitLength() > 63 → IllegalArgumentException check or a one-line javadoc explaining why decimals are exempt.


Generated by the review-pr-tavern skill — a human did not write this comment.
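A minimal sketch of the suggested bitLength() guard, as a standalone helper (the class and method names are illustrative, not driver code):

```java
import java.math.BigDecimal;

// Sketch: reject BigDecimals whose unscaled value cannot be represented
// in a signed 64-bit long before narrowing, mirroring the range checks
// the PR added to the integer setters.
final class DecimalRangeCheck {
    static long checkedUnscaledLong(BigDecimal value) {
        // bitLength() excludes the sign bit, so > 63 means the value does
        // not fit in a signed long and longValue() would silently truncate
        if (value.unscaledValue().bitLength() > 63) {
            throw new IllegalArgumentException("unscaled value out of long range: " + value);
        }
        return value.unscaledValue().longValue();
    }
}
```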

@SneakyThrows
void closesTheReader() {
val sut = new ArrowStreamReaderCursor(reader, ZoneId.systemDefault());
void closesReaderAndAllocator() {

Optional — The new closesReaderAndAllocator test verifies both reader.close() and allocator.close() ran, but pins neither the order nor the throw-during-close contract. The cursor's javadoc explicitly says reader-first ordering exists "because reversing trips a leak detector", and the body uses try/finally so allocator.close runs even when reader.close throws. Both are exactly the properties most likely to regress.

A two-line tightening covers it:

InOrder inOrder = Mockito.inOrder(reader, allocator);
inOrder.verify(reader).close();
inOrder.verify(allocator).close();

plus a second test where reader.close() is doThrow(...)-ed and the assertion is that allocator.close() still ran.


Generated by the review-pr-tavern skill — a human did not write this comment.

mkaufmann added 14 commits May 15, 2026 12:14
Collapse the two ResultSet families (streaming Arrow + row-based
metadata) into a single Arrow-backed implementation so there is one
accessor pipeline, one set of type semantics, and one place to fix bugs.

Changes:

- ArrowStreamReaderCursor becomes source-agnostic: a BatchLoader drives
  a VectorSchemaRoot, whether sourced from an ArrowStreamReader or a
  pre-populated in-memory batch. The cursor also owns an AutoCloseable
  so it is responsible for releasing the allocator + reader on close —
  the old ArrowStreamReader.close() would only tear down vectors and
  leak the 100 MB RootAllocator.
- QueryResultArrowStream.toArrowStreamReader returns a Result holder
  that pairs the reader with the allocator and closes both in the
  right order so Arrow's accounting invariants hold.
- StreamingResultSet gains ofInMemory(root, owned, queryId, zone, cols)
  so metadata results funnel through the same result set. A columns
  override preserves the JDBC-spec typeName labels (e.g. TEXT) that
  would otherwise be lost when deriving from the Arrow schema.
- MetadataArrowBuilder materialises List<List<Object>> metadata rows
  into a populated VectorSchemaRoot using the existing HyperTypeToArrow
  mapping; MetadataResultSets is the factory callers use.
- QueryMetadataUtil and DataCloudDatabaseMetadata route getTables,
  getColumns, getSchemas, getTypeInfo and empty metadata results
  through the Arrow-backed StreamingResultSet.
- DataCloudMetadataResultSet, SimpleResultSet, and ColumnAccessor are
  removed now that no caller depends on them.
- StreamingResultSet.getObject(int, Class) gains an isInstance-based
  fallback so callers can retrieve String-typed VARCHAR columns without
  each accessor having to implement typed getObject.
- Tests moved to the unified path; integer-accessor-only assertions in
  DataCloudDatabaseMetadataTest updated to reflect stricter Arrow
  accessor semantics.
…ArrowStreamReader

Rework the ResultSet unification to address two reviewer requests on #175:

1. Share the vector-building code with the parameter-encoding path instead
   of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a
   row-indexed primitive (setCell) used by both callers. The existing
   single-row parameter-binding overload and a new many-row metadata
   overload both funnel through it, and all the individual vector setters
   are parameterised by row index.

2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only
   interface. The metadata path now serialises a populated VSR to Arrow
   IPC bytes and wraps the result in a ByteArrayInputStream-backed
   ArrowStreamReader, so both streaming and metadata result sets travel
   through exactly the same reader/cursor plumbing.

Supporting changes:

- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now
  round-trip through Arrow via a jdbc:type_name field-metadata key rather
  than a columns-override parameter on StreamingResultSet. HyperTypeToArrow
  stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it
  back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns
  override; callers construct an ArrowStreamReader + BufferAllocator pair
  and hand them to of(reader, allocator, queryId, zone). The cursor owns
  both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result
  holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata
  result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor
  doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its
  in-memory ResultSet the same way as the metadata path (IPC round-trip).
StreamingResultSet.of catches IOException and IllegalArgumentException
from the Arrow schema decode and rewraps as SQLException. At all four
query-path call sites (DataCloudConnection.getRowBasedResultSet,
getChunkBasedResultSet, DataCloudStatement.executeQuery, getResultSet)
the surrounding try-catch only catches StatusRuntimeException, so a
SQLException thrown from of() bypasses it and leaks the 100 MB
RootAllocator returned by QueryResultArrowStream.toArrowStreamReader.

Introduce StreamingResultSet.ofClosingOnFailure(Result, queryId,
sessionZone) that takes the reader+allocator pair and closes both on
construction failure (reader first so its buffers release before the
allocator's budget check). Switch all four call sites to it.

The metadata path in MetadataResultSets.of already had this shape; this
fixes the matching gap on the query side.

Add a regression test that builds an Arrow IPC stream with an
unsupported field type (LargeUtf8) and asserts the helper closes both
the reader and the allocator on the resulting SQLException.
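The close-on-failure shape this commit describes can be sketched generically — the types and names below are illustrative stand-ins, not the driver's API:

```java
import java.util.concurrent.Callable;

// Sketch: if construction fails, close the reader first (releasing its
// buffers) and then the allocator (whose budget check would otherwise
// fire), attaching any cleanup exceptions as suppressed so the original
// failure survives intact.
final class ClosingOnFailure {
    static <T> T construct(Callable<T> ctor, AutoCloseable reader, AutoCloseable allocator) throws Exception {
        try {
            return ctor.call();
        } catch (Exception original) {
            closeSuppressing(original, reader);    // reader first: release ArrowBufs...
            closeSuppressing(original, allocator); // ...before the allocator's accounting check
            throw original;
        }
    }

    private static void closeSuppressing(Exception original, AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception cleanup) {
            original.addSuppressed(cleanup); // never mask the construction failure
        }
    }
}
```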
The Int/SmallInt/TinyInt setters widened from concrete boxed types
(Integer/Short/Byte) to Number so metadata rows could pass long values,
but lost the implicit "right boxed type" check at the call sites that
went through DataCloudPreparedStatement.setObject for parameter
binding. A user binding Long.MAX_VALUE to an INT32 parameter would
silently get (int) Long.MAX_VALUE = -1 written to the vector.

Add an explicit range check on Int/SmallInt/TinyInt setters before
narrowing. Both the metadata path and the parameter-binding path go
through these setters, so strict checks here mean strict on both
paths. BigInt accepts the full long range and is unchanged.

Pin the behavior with a focused unit test (IntegerVectorSetterRangeCheckTest).
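The narrowing guard described above can be sketched as a standalone helper (method and class names are illustrative, not the driver's setters):

```java
// Sketch: accept any Number but check the long value fits the target
// width before casting, so Long.MAX_VALUE bound to an INT32 parameter
// fails loudly instead of silently wrapping to -1.
final class NarrowingCheck {
    static int checkedInt(Number value) {
        long v = value.longValue();
        if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("value out of INT32 range: " + v);
        }
        return (int) v;
    }

    static short checkedShort(Number value) {
        long v = value.longValue();
        if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
            throw new IllegalArgumentException("value out of INT16 range: " + v);
        }
        return (short) v;
    }
}
```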
The driver round-trips JDBC-spec type-name overrides (e.g. "TEXT" for
metadata columns) through Arrow field metadata under a custom key. The
previous key "jdbc:type_name" used an unprefixed namespace not reserved
by the Arrow spec — Hyper, query-federator, or another Arrow producer
could emit a same-named key in a future protocol version, in which case
ArrowToHyperTypeMapper would silently override its own derived type
name with whatever upstream stamped.

Rename to "datacloud-jdbc:type_name" so the namespace is unambiguous,
and expand the field's javadoc to document the namespace rationale.
The fallback in ArrowToHyperTypeMapper.toColumnMetadata — when a field
has no datacloud-jdbc:type_name override, ColumnMetadata.typeName is
null and the JDBC layer derives the column type-name from the
HyperType — was load-bearing but unasserted. Real Hyper Arrow streams
never stamp the override, so every functional query test exercised the
fallback implicitly; if a future refactor broke it, the regression
would not surface in the existing suite.

Two new pin tests:
- ArrowToHyperTypeMapperTest at the unit boundary: field with override
  -> typeName matches; field without override (null metadata, empty
  metadata) -> typeName is null.
- StreamingResultSetTest.getColumnTypeNameFallsBackToDerivedNameOnRealHyperStream
  end-to-end against local Hyper: executeQuery on a select with INT,
  VARCHAR, DECIMAL columns asserts ResultSetMetaData.getColumnTypeName
  returns the derived names ("INTEGER", "VARCHAR", "DECIMAL").
Drive-by pin test: StreamingResultSet.getObject(int, Map<String,Class<?>>)
with a null or empty type map should behave like plain getObject(int)
per the JDBC spec. Previously not asserted anywhere.

The companion getObject(Class) fallback test landed earlier on this
branch, bundled into the QueryJDBCAccessor base-class fix commit so
the fix and its end-to-end coverage ship as a single cherry-pickable
unit.
Previously a row with the wrong number of elements would silently leave
the trailing columns as Arrow null (interpreted as missing values).
Today every caller routes through MetadataSchemas so the sizes match by
construction, but a future caller bug would surface only inside vector
population, far from the boundary.

Add an explicit arity check at the of(...) entrypoint: each non-null
row must have exactly columns.size() elements. Null rows are accepted
as the all-nulls row (matching the legacy coerceRows convention of
turning null into emptyList). Empty rows are accepted only when the
schema is also empty.

Pin behavior with MetadataResultSetsTest covering short, long,
correct-arity, null-row, and empty-rows cases.
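The arity contract can be sketched as follows (names illustrative; a null row stands for the all-nulls row, and the equality check already covers the empty-row/empty-schema case):

```java
import java.util.List;

// Sketch: validate row arity at the entrypoint, far from vector
// population, so a caller bug surfaces at the boundary.
final class ArityCheck {
    static void checkRowArity(List<List<Object>> rows, int columnCount) {
        for (int i = 0; i < rows.size(); i++) {
            List<Object> row = rows.get(i);
            if (row == null) {
                continue; // null row == every column null, accepted
            }
            if (row.size() != columnCount) {
                throw new IllegalArgumentException(
                        "row " + i + " has " + row.size() + " values, expected " + columnCount);
            }
        }
    }
}
```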
Now that ArrowStreamReaderCursor.loadNextNonEmptyBatch (introduced
earlier on this branch as a pre-unify cursor fix) consumes empty
batches at the cursor seam, MetadataResultSets.writeArrowStream no
longer needs its own "skip writeBatch when rowCount==0" workaround:
the cursor handles the empty-only case correctly. Remove the special
case and always emit a batch.

Tightens the zeroRowOnlyBatchYieldsNoRows test docstring to match.
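The "cursor consumes empty batches" seam can be modeled generically over an iterator of batch row counts rather than a real ArrowStreamReader (purely illustrative):

```java
import java.util.Iterator;

// Sketch: empty IPC batches are valid Arrow and producers emit them, but
// JDBC's next() only understands rows — so the cursor drains zero-row
// batches until it finds rows or the stream ends, and callers never see them.
final class BatchSkipper {
    static boolean loadNextNonEmptyBatch(Iterator<Integer> batchRowCounts) {
        while (batchRowCounts.hasNext()) {
            if (batchRowCounts.next() > 0) {
                return true; // a batch with rows: next() can advance
            }
            // zero-row batch: consume it and keep looking
        }
        return false; // end of stream: next() reports no more rows
    }
}
```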
DataCloudMetadataResultSet was deleted in this PR, but the test file
retained the old name and lived in the wrong package. Merge its empty-
result-set JDBC-shape smoke tests into the new MetadataResultSetsTest
under the .core.metadata package and delete the legacy file. No
behavior change.
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance
fallback as its base-class default, StreamingResultSet no longer needs
the catch-and-retry path that worked around accessors which threw
"Operation not supported." Collapse getObject(int, Class) to direct
dispatch and update the regression test's WHY comment to point at the
accessor base class as the load-bearing layer.

Addresses: review comment on PR #175 line 388.
Three small follow-ups from PR #175 review:

- StreamingResultSet.of: drop the paragraph that pointed at the
  HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The
  docstring spilled implementation detail of the metadata-stamping path
  into a generic "create a result set from a reader" entry-point; the
  type-name override is documented at HyperTypeToArrow / ColumnMetadata
  where it's relevant.

- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale
  to answer "why does the cursor consume empty batches instead of the
  caller?" directly. Empty IPC batches are valid Arrow and producers
  emit them; JDBC's next() only knows rows, so this cursor is the seam
  that translates batch-level signals into row-level advances.

- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next /
  isClosed / getStatement / unwrap / isWrapperFor / getHoldability /
  getFetchSize / setFetchSize / getWarnings / getConcurrency / getType
  / getFetchDirection). Those test the StreamingResultSet plumbing
  shared by every result set on this branch and are already covered by
  StreamingResultSetMethodTest. Keep the arity-contract slice
  (short/long/right/null/empty rows) — that is the
  metadata-result-set-specific behavior.

Addresses: review comments on PR #175.
StreamingResultSet had two public factories — of(reader, allocator,
queryId[, zone]) (4 callers) and ofClosingOnFailure(Result, queryId,
zone) (5 callers). Every production caller wanted the close-on-failure
behavior; only tests and the metadata helper used the bare of(). Two
factories with overlapping responsibilities is one too many — a caller
hitting the bare of() and not knowing about ofClosingOnFailure would
silently leak the 100 MB RootAllocator on construction failure.

Collapse to one public factory:
- of(QueryResultArrowStream.Result, queryId, sessionZone) — the only
  factory callers see, always closes both reader and allocator on
  failure. Name is the unambiguous "of" because there is no other.
- create(reader, allocator, queryId, sessionZone) — private; just the
  construction body the factory wraps.

Production call sites (DataCloudConnection, DataCloudStatement) and
MetadataResultSets were already passing a (reader, allocator) pair,
so the call shape collapses to passing the Result holder. Tests that
were building the pair locally now wrap it in a Result the same way.
…r interface

Pre-unify there were three result-set implementations: StreamingResultSet
(streaming Arrow query results), DataCloudMetadataResultSet (metadata),
SimpleResultSet (in-memory rows). The DataCloudResultSet interface — a
one-method (getQueryId) extension over java.sql.ResultSet — was the
common "implements" the public API surfaced; StreamingResultSet was the
only one that ever implemented it as a non-trivial impl.

The unify refactor collapsed all three implementations into
StreamingResultSet, but kept the interface and the "Streaming" name. Two
problems fall out:

- The "Streaming" name now lies. Metadata results flow through the same
  class but they're a one-shot in-memory IPC blob — nothing streaming
  about them. MetadataResultSets.of even passes /*queryId=*/ null
  because there is no query.
- The DataCloudResultSet interface has one implementer and one method.
  Layering an interface for one impl is just a reader trap: callers
  instinctively look for "what other implementations exist" and find
  none.

Collapse the two:
- Rename the class StreamingResultSet -> DataCloudResultSet.
- Delete the old DataCloudResultSet interface (the public method
  getQueryId() now lives directly on the class via @Getter).
- Update all production and test references; rename the affected test
  files to match (StreamingResultSet*Test -> DataCloudResultSet*Test).

The public API surface is unchanged in source for the common cases:
DataCloudConnection.getRowBasedResultSet / getChunkBasedResultSet still
return DataCloudResultSet, just as a class instead of an interface.
This is binary-incompatible for any caller that ever cast to or
implemented the old interface; in practice only StreamingResultSet
implemented it on the read side, and no code outside the driver
implemented it on the write side.
@mkaufmann mkaufmann force-pushed the moritz/unify-result-sets branch from b83e8b7 to 7baf67d on May 15, 2026 at 10:19
@mkaufmann

Rebased on origin/main (now 7baf67d), force-pushed. :jdbc-core:test passes.

Walked the 10 inline findings against the rebased tree — all still apply. PRs #185 (zero-row batch skip) and #186 (getObject(Class) identity fallback) landed on main since the review and dropped out of this PR's diff cleanly, but neither touches any of the lines a finding anchors to. The IntegerVectorSetter range-check tightening called out in the summary is still on this branch (3bfd821).

GitHub now flags the inline comments as outdated because of the SHA rewrite. The findings themselves are unchanged — text and line anchors are still correct.


Generated by the review-pr-tavern skill — a human did not write this comment.

@mkaufmann mkaufmann changed the title from refactor: unify ResultSet implementations on Arrow-backed path to refactor!: unify ResultSet implementations on Arrow-backed path on May 15, 2026
mkaufmann added 4 commits May 15, 2026 16:34
…torSetter

DatabaseMetaData.getTypeInfo declared CASE_SENSITIVE, UNSIGNED_ATTRIBUTE,
FIXED_PREC_SCALE, and AUTO_INCREMENT as VARCHAR while the row producer in
HyperTypes.buildTypeInfoRow wrote Boolean values into them. The mismatch
worked only because VarCharVectorSetter accepted Object and silently called
value.toString(), so the four columns surfaced as "true"/"false" strings
instead of the boolean payload JDBC 4.2 (DatabaseMetaData.getTypeInfo) and
pgjdbc both define for these positions.

Declare the four columns with a new bool(...) helper in MetadataSchemas
that produces a HyperType.bool(true) / Constants.BOOL ColumnMetadata. The
existing BitVectorSetter already accepts Boolean, so the row producer is
unchanged. Tighten VarCharVectorSetter from BaseVectorSetter<VarCharVector,
Object> to <VarCharVector, String> so non-String payloads fail fast at the
BaseVectorSetter type guard instead of being toString-coerced — the byte[]
arm was dead (setBytes / setBinaryStream / setUnicodeStream / setAsciiStream
all throw FEATURE_NOT_SUPPORTED in DataCloudPreparedStatement). Both fixes
land together because tightening the setter without the schema fix would
make getTypeInfo throw IllegalArgumentException on the Boolean payload.

Behavior change: getObject on the four columns now returns Boolean (per JDBC
spec), not String. Callers that previously cast (String) rs.getObject(...)
will get a ClassCastException; rs.getBoolean(...) starts working where it
previously threw on the VARCHAR path, and rs.getString(...) keeps returning
the same lowercase "true"/"false" via BooleanVectorAccessor.getString.

Pin the schema with three new MetadataSchemasTest methods mirroring the
existing COLUMNS coverage (names / typeNames / JdbcTypeIds), add a strict-
type regression test for VarCharVectorSetter modeled on
IntegerVectorSetterRangeCheckTest so a future re-widening trips CI, and
exercise rs.getBoolean on the four boolean columns end-to-end in
DataCloudDatabaseMetadataTest.testGetTypeInfo.
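The tightening can be sketched as follows. This is a simplified illustration of the fail-fast type guard, not the driver's actual BaseVectorSetter/VarCharVectorSetter classes; the class names here are stand-ins.

```java
// Sketch: a setter parameterized on its accepted payload type rejects
// anything else at the base-class guard instead of toString-coercing it.
abstract class BaseVectorSetter<T> {
    private final Class<T> accepted;

    BaseVectorSetter(Class<T> accepted) { this.accepted = accepted; }

    // Shared type guard: non-matching payloads fail fast here.
    final void set(Object value) {
        if (!accepted.isInstance(value)) {
            throw new IllegalArgumentException(
                "expected " + accepted.getSimpleName() + ", got "
                    + value.getClass().getSimpleName());
        }
        doSet(accepted.cast(value));
    }

    abstract void doSet(T value);
}

// Before: parameterized on Object with value.toString(); after: String,
// so a Boolean payload fails fast instead of surfacing as "true"/"false".
final class VarCharSetter extends BaseVectorSetter<String> {
    final StringBuilder sink = new StringBuilder();
    VarCharSetter() { super(String.class); }
    @Override void doSet(String value) { sink.append(value); }
}

public class VarCharSetterDemo {
    public static void main(String[] args) {
        VarCharSetter s = new VarCharSetter();
        s.set("true");                 // String payload: accepted
        try {
            s.set(Boolean.TRUE);       // Boolean payload: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected String, got Boolean
        }
    }
}
```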
ArrowStreamReaderCursor.close used a plain try/finally that closed reader
first, allocator second. When both threw — the most likely failure mode
because the allocator's leak detector fires on close when buffers are still
outstanding, which is exactly what an exception during reader.close
produces — Java's finally semantics replaced the reader's exception with
the allocator's. The reader exception is the diagnostically interesting
one (the leak detector firing on allocator.close is usually a symptom);
silently dropping it left only the symptom in the stack trace.

Switch the cursor to try-with-resources over the (allocator, reader) pair
so reader closes first and the allocator's exception attaches as
suppressed onto the reader's instead of replacing it. Same fix on the
construction-failure cleanup in DataCloudResultSet.of: the reader.close
was already wrapped with addSuppressed but the immediately-following
allocator.close was bare and could replace the original construction
SQLException; wrap it the same way.

Pin the new behavior with a Mockito-based test that throws from both
reader.close and allocator.close and asserts the reader's exception is
primary with the allocator's attached as suppressed.
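The close-ordering semantics the commit relies on can be shown in isolation. In this minimal sketch, plain AutoCloseables stand in for the reader and allocator; try-with-resources closes in reverse declaration order, so declaring (allocator, reader) closes the reader first and attaches the allocator's exception as suppressed rather than replacing the reader's.

```java
// Demonstrates why try-with-resources over (allocator, reader) keeps the
// reader's exception primary when both closes throw.
public class CloseOrderDemo {
    static final class Failing implements AutoCloseable {
        final String name;
        Failing(String name) { this.name = name; }
        @Override public void close() { throw new IllegalStateException(name); }
    }

    public static Throwable closeBoth() {
        // Reverse declaration order on close: reader first, allocator second.
        try (Failing allocator = new Failing("allocator");
             Failing reader = new Failing("reader")) {
            return null; // body empty; both closes throw on exit
        } catch (IllegalStateException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable t = closeBoth();
        System.out.println("primary=" + t.getMessage()
            + " suppressed=" + t.getSuppressed()[0].getMessage());
        // prints: primary=reader suppressed=allocator
    }
}
```

Contrast with a plain try/finally that closes the reader in the try and the allocator in the finally: an exception thrown from the finally block replaces the one in flight, which is exactly the swallowed-diagnosis problem the commit describes.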
DataCloudStatement.executeQuery and getResultSet both fetched
iterator.getQueryStatus().getQueryId() twice — once when constructing
arrowStream and once when calling DataCloudResultSet.of with it. The
second call sat between arrowStream creation (which puts a 100 MB
RootAllocator on the field) and the of(...) call that takes ownership of
that allocator. If the second getQueryId() throws — e.g. a future
refactor makes getQueryStatus async, or a transient gRPC failure surfaces
through the cached proto — the allocator escapes both DataCloudResultSet.of's
own try/catch (never entered) and the surrounding catch
(StatusRuntimeException) (which doesn't close arrowStream). The PR
explicitly claims every code path closes its allocator; this hoist closes
the window without changing any observable behavior.
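The hoist pattern can be sketched as below. OneShotId and describe are hypothetical stand-ins for the statement's query-status fetch and result-set construction, not driver code; the point is that fetching the fallible value once, before any owned resource exists, removes the leak window.

```java
// Sketch of hoisting a fallible lookup ahead of resource acquisition.
public class HoistDemo {
    // Models a getter that succeeds once, then fails on a re-fetch
    // (the transient-failure mode the commit describes).
    static final class OneShotId implements java.util.function.Supplier<String> {
        private boolean used;
        @Override public String get() {
            if (used) throw new IllegalStateException("transient failure on re-fetch");
            used = true;
            return "q-123";
        }
    }

    // Hoisted: a single fetch before anything resource-like is created,
    // then the cached value is reused at both consumption sites.
    public static String describe(java.util.function.Supplier<String> queryId) {
        String id = queryId.get(); // only call, before "allocator" creation
        return "stream(" + id + ") -> resultSet(" + id + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(new OneShotId()));
        // prints: stream(q-123) -> resultSet(q-123)
    }
}
```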
…fy cap

Both Result-holder construction sites — QueryResultArrowStream.toArrowStreamReader
and MetadataResultSets.of — built the RootAllocator first and then handed it
to a new ArrowStreamReader. If the reader's constructor throws (today
benign, but a future Arrow upgrade could add constructor-side validation),
the allocator escapes both DataCloudResultSet.of's own try/catch (never
entered) and the caller's catch. Wrap the construction in try/catch that
closes the allocator on the way out and attaches any close failure as
suppressed.

Unify the per-allocator cap: MetadataResultSets used Long.MAX_VALUE while
the gRPC path was capped at 100 MB. The cap exists because Arrow allocators
are accounted memory, so hitting the cap throws a clean OutOfMemoryException
instead of letting the JVM OOM. A getColumns(...) against a tenant with
thousands of tables silently bypassed the cap on the metadata path. Promote
the constant to public ROOT_ALLOCATOR_BUDGET_BYTES on QueryResultArrowStream
and reuse it from MetadataResultSets.

No new test: the failure mode requires ArrowStreamReader's constructor to
throw, which doesn't happen with ByteArrayInputStream or the gRPC channel
today. Pure code-shape fix; existing suite stays green.
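The guarded-construction shape described above, sketched with generic placeholders (construct and the lambdas here are illustrative, not the driver's API): if the constructor throws, the allocator is closed on the way out and any close failure is attached as suppressed so the root cause stays primary.

```java
// Sketch: hand an allocator to a constructor; on failure, close the
// allocator and keep the construction exception as the primary one.
public class GuardedConstruct {
    public static <A extends AutoCloseable, R> R construct(
            A allocator, java.util.function.Function<A, R> ctor) {
        try {
            return ctor.apply(allocator);
        } catch (RuntimeException e) {
            try {
                allocator.close();              // don't leak the allocator
            } catch (Exception closeFailure) {
                e.addSuppressed(closeFailure);  // root cause stays primary
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        AutoCloseable allocator =
            () -> { throw new IllegalStateException("close failed"); };
        try {
            construct(allocator,
                a -> { throw new IllegalStateException("ctor failed"); });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " / suppressed: "
                + e.getSuppressed()[0].getMessage());
            // prints: ctor failed / suppressed: close failed
        }
    }
}
```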