Geolookup

A Harper plugin that provides fast, tier-based reverse geocoding for the United States. Give it a latitude and longitude, and it will tell you where you are, down to the city, township, or county level. It won't judge you for being in New Jersey.

How It Works

Geolookup converts a lat/lon coordinate into an Uber H3 hexagonal cell index, then searches a pre-indexed table of cells to find the geographic locations that contain that point. The lookup walks from fine-grained resolution (H3 resolution 9, roughly a city block) up to coarse resolution (resolution 2, roughly a large region), checking for matches at each level until it finds results for all requested tiers.

The underlying geographic data is sourced from the US Census TIGER/Line shapefiles, converted to H3 hexagonal cells using compact cell representation.

Tiers

Geographic locations in the US exist at different levels of administrative hierarchy. Geolookup organizes these into three tiers:

Tier	Name	Census Entity	Coverage	Examples
1	`place`	Incorporated places and Census Designated Places (CDPs)	Partial. Only areas with an incorporated municipality or CDP designation. Rural and unincorporated areas often have no Tier 1 match.	"San Francisco", "Boise", "Chapel Hill"
2	`county_subdivision`	Minor Civil Divisions (MCDs) and Census County Divisions (CCDs)	Full national coverage. The Census Bureau ensures every square foot of the US falls within a county subdivision, creating statistical "unorganized territories" where no legal MCD exists.	"Springfield Township", "Falls Church city", "Northwest Arctic Borough"
3	`county`	Counties and county equivalents	Full national coverage. Every point in the US belongs to a county (or equivalent like a parish, borough, or independent city).	"Cook County", "Los Angeles County", "Orleans Parish"

The practical takeaway: if you query for Tier 1 in the middle of a national forest, you may get nothing back. Tier 2 and Tier 3 will return a result for any valid US coordinate, provided data for that state or territory has been loaded (see Data Loading). Plan your tier selection accordingly.

Requesting Specific Tiers

The tiers query parameter controls which levels of geography are returned. You can request any combination:

Value	Behavior
`all` (default)	Returns all three tiers
`1`	Place only
`2`	County subdivision only
`3`	County only
`1,3`	Place and county
`1,2,3`	Same as `all`

If the tiers parameter is omitted, the service defaults to returning all tiers.
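The mapping from the tiers parameter to response keys can be written out as a small parser. This is a minimal sketch, not the plugin's code: the tier numbers and key names come from the tables above, while the helper name parseTiers and the choice to throw on unknown values are assumptions.

```typescript
// Hypothetical helper: converts the `tiers` query parameter into the
// response keys described above. Throwing on invalid input is an assumption.
type TierKey = "place" | "county_subdivision" | "county";

const TIER_KEYS: Record<number, TierKey> = {
  1: "place",
  2: "county_subdivision",
  3: "county",
};

function parseTiers(param: string | null | undefined): TierKey[] {
  // Omitted or `all` means every tier.
  if (param == null || param.trim().toLowerCase() === "all") {
    return ["place", "county_subdivision", "county"];
  }
  const keys = new Set<TierKey>();
  for (const part of param.split(",")) {
    const key = TIER_KEYS[Number(part.trim())];
    if (!key) throw new Error(`invalid tier: "${part}"`);
    keys.add(key);
  }
  // Return keys in tier order, regardless of the order they were listed in.
  return (Object.values(TIER_KEYS) as TierKey[]).filter((k) => keys.has(k));
}
```

With this shape, `1,2,3`, `3,2,1`, and `all` all resolve to the same three keys, matching the table's note that `1,2,3` is the same as `all`.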

Data Model

The schema defines three tables in the geolookup database: Location and Cell work together for efficient spatial lookups, and DataLoadJob tracks the progress of data loads.

erDiagram
    Location {
        ID id PK
        Int tier
        String name
        String name_full
        String state_name
        String state_abbrev
        String h3_index
        String county_name
    }
    Cell {
        ID h3_index PK
        String tier_1 FK "place Location ID"
        String tier_2 FK "county_subdivision Location ID"
        String tier_3 FK "county Location ID"
    }
    DataLoadJob {
        ID id PK
        String state
        String status
        String error_message
        Int location_count
        Int cell_count
        String started_at
        String completed_at
        Int duration_ms
    }
    Location ||--o{ Cell : "place_cells (tier_1)"
    Location ||--o{ Cell : "county_subdivision_cells (tier_2)"
    Location ||--o{ Cell : "county_cells (tier_3)"

Location

The Location table stores geographic entities across all three tiers. Each record represents a single place, county subdivision, or county.

Key attributes:

Field	Description
`id`	Primary key
`tier`	Integer (1, 2, or 3) indicating the geographic level
`tier_label`	Human-readable tier name
`name`	Short name (e.g. "Springfield")
`name_full`	Full qualified name
`feature_type`	Census feature classification
`state_name` / `state_abbrev`	State information
`lat` / `lon`	Representative point for the location
`h3_index`	H3 cell index for the location's representative point
`country_code`	Country code (US)
`lsad`	Legal/Statistical Area Description code from Census
`county_name` / `county_fips`	Parent county info
`place_cells`	Relationship to Cell records (via `tier_1`)
`county_subdivision_cells`	Relationship to Cell records (via `tier_2`)
`county_cells`	Relationship to Cell records (via `tier_3`)

Cell

The Cell table is the spatial index. Each record represents a single H3 hexagonal cell and links to the Location(s) it belongs to at each tier.

Field	Description
`h3_index`	Primary key. The H3 cell index string.
`tier_1`	Location ID of the place this cell belongs to (if any)
`tier_2`	Location ID of the county subdivision this cell belongs to
`tier_3`	Location ID of the county this cell belongs to
`place`	Relationship to Location (from `tier_1`)
`county_subdivision`	Relationship to Location (from `tier_2`)
`county`	Relationship to Location (from `tier_3`)

A single cell can belong to locations at multiple tiers simultaneously. For example, one H3 cell might be in the city of Denver (Tier 1), a county subdivision (Tier 2), and Denver County (Tier 3). The @indexed directive on tier_1, tier_2, and tier_3 enables fast lookups in both directions: cell-to-location and location-to-cells.

H3 Compact Cells and Why They Matter

The raw TIGER/Line shapefiles define geographic boundaries as polygons. To make these searchable via H3, each polygon is filled with H3 cells. A naive approach would store every cell at the finest resolution, but that would produce an enormous number of records.

Instead, Geolookup uses H3's compact cell representation. The compactCells operation replaces any group of 7 child cells that share the same parent with that single parent cell. This is applied recursively, producing a mixed-resolution set of cells that covers the exact same area with far fewer records. The H3 documentation shows the magnitude of this: a compact representation can be an order of magnitude smaller than its uncompacted equivalent.
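To make the merge rule concrete, here is a toy version of compaction. It does not use real H3 indexes: each cell is a string of digits 0-6 and its parent is that string minus its last digit, which mirrors H3's aperture-7 digit hierarchy. The function name compactToy is hypothetical; for real data, use compactCells from h3-js.

```typescript
// Toy model, not real H3 indexes: a cell is a string of digits 0-6 and its
// parent is the string with the last digit removed, so every parent has
// exactly 7 children. Assumes the input cells do not overlap.
function compactToy(cells: Iterable<string>): string[] {
  const current = new Set(cells);
  // Repeat passes until nothing merges, so merges cascade upward.
  for (;;) {
    const byParent = new Map<string, Set<string>>();
    for (const c of current) {
      if (c.length === 0) continue; // the root has no parent
      const parent = c.slice(0, -1);
      if (!byParent.has(parent)) byParent.set(parent, new Set());
      byParent.get(parent)!.add(c);
    }
    let changed = false;
    for (const [parent, kids] of byParent) {
      // All 7 children present: replace them with their single parent.
      if (kids.size === 7) {
        for (const k of kids) current.delete(k);
        current.add(parent);
        changed = true;
      }
    }
    if (!changed) return [...current].sort();
  }
}
```

For example, the 49 grandchildren of cell "4" compact to the single record "4", while an incomplete set of 6 siblings is left as-is.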

This is why the lookup algorithm searches across resolutions 9 down to 2, rather than at a single fixed resolution. A query point might match a fine-grained resolution-9 cell in a densely covered urban area, or it might only match a coarser resolution-4 cell in a rural area where compaction was more aggressive. The algorithm generates the H3 index for the query point at resolution 9, then computes its parent cell at each coarser resolution, and searches for any of those cells in the database. The first match found at each tier is the answer.
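The parent step is cheap because an H3 index stores its resolution and its per-resolution digits in fixed bit fields. The sketch below follows the bit layout from the H3 index specification to compute parents directly on index strings and build the fine-to-coarse candidate list. It does not implement latLngToCell (that requires H3's full projection), so it starts from an existing resolution-9 index. The name candidateCells is hypothetical; in real code you would call cellToParent from h3-js.

```typescript
// Minimal sketch on real H3 cell index strings. Assumes the input is a
// valid H3 cell index; no validation is done.
const MAX_RES = 15;
const RES_OFFSET = 52n; // resolution is stored in bits 52-55
const RES_MASK = 0xfn << RES_OFFSET;

function getResolution(cell: string): number {
  return Number((BigInt("0x" + cell) & RES_MASK) >> RES_OFFSET);
}

function cellToParent(cell: string, parentRes: number): string {
  let h = BigInt("0x" + cell);
  const res = getResolution(cell);
  if (!Number.isInteger(parentRes) || parentRes < 0 || parentRes > res) {
    throw new RangeError(`parent resolution ${parentRes} is not in 0..${res}`);
  }
  // Overwrite the resolution field.
  h = (h & ~RES_MASK) | (BigInt(parentRes) << RES_OFFSET);
  // Digit r occupies 3 bits starting at bit (15 - r) * 3. Digits finer than
  // the parent's resolution are set to 7 (binary 111), meaning "unused".
  for (let r = parentRes + 1; r <= res; r++) {
    h |= 7n << BigInt((MAX_RES - r) * 3);
  }
  return h.toString(16);
}

// The cell itself plus every ancestor down to minRes, ordered fine to coarse.
function candidateCells(cell: string, minRes = 2): string[] {
  const out: string[] = [];
  for (let r = getResolution(cell); r >= minRes; r--) {
    out.push(cellToParent(cell, r));
  }
  return out;
}
```

For the sample resolution-9 index "8928308280fffff", candidateCells returns 8 indexes; the second one (its resolution-8 parent) is "8828308281fffff".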

This design gives you the best of both worlds: precise coverage without a bloated cell table.

The Lookup Algorithm

flowchart TD
    A["GET /Geolookup?lat=...&lon=...&tiers=..."] --> B[Parse lat, lon, tiers]
    B --> C{Valid params?}
    C -- No --> D[Return error]
    C -- Yes --> E["Convert lat/lon to H3 cell at resolution 9"]
    E --> F["Compute parent cells at resolutions 8 down to 2"]
    F --> G["Search Cell table for any matching H3 index (OR)"]
    G --> H{Cell found?}
    H -- No more cells --> K[Return results collected so far]
    H -- Yes --> I["Extract Location for each requested tier"]
    I --> J{All requested tiers found?}
    J -- Yes --> K
    J -- No --> H

Here is what happens when a request hits the Geolookup endpoint:

Parse input. Extract lat, lon, and tiers from query parameters.
Generate H3 index. Convert the coordinate to an H3 cell at resolution 9 using latLngToCell().
Build candidate set. Compute the parent cell at each resolution from 8 down to 2 using cellToParent(). This produces 8 candidate cell indexes (resolutions 9, 8, 7, 6, 5, 4, 3, 2).
Search cells. Query the Cell table for any record matching one of the candidate H3 indexes. The query uses an OR condition across all candidates and includes relationship joins only for the requested tiers.
Collect results. As matching cells come back, extract the linked Location for each requested tier. Once all requested tiers have a result, stop early.
Return. Send back an object with keys for each requested tier (e.g. place, county_subdivision, county).
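The "collect results" and "stop early" steps can be sketched against in-memory maps. This is an illustration under assumptions: the plugin issues a single OR query against Harper, whereas this sketch walks hypothetical Map stand-ins for the Cell and Location tables in fine-to-coarse candidate order. The record interfaces below are simplified and are not the plugin's exported Location and Cell types.

```typescript
// Hypothetical, simplified stand-ins for the Location and Cell tables.
interface LocationRecord { id: string; tier: number; name: string; }
interface CellRecord { h3_index: string; tier_1?: string; tier_2?: string; tier_3?: string; }
type TierKey = "place" | "county_subdivision" | "county";

const TIER_FIELD: Record<TierKey, "tier_1" | "tier_2" | "tier_3"> = {
  place: "tier_1",
  county_subdivision: "tier_2",
  county: "tier_3",
};

function collectTiers(
  candidates: string[], // H3 indexes ordered fine to coarse
  cells: Map<string, CellRecord>,
  locations: Map<string, LocationRecord>,
  requested: TierKey[],
): Partial<Record<TierKey, LocationRecord>> {
  const result: Partial<Record<TierKey, LocationRecord>> = {};
  for (const h3 of candidates) {
    const cell = cells.get(h3);
    if (!cell) continue;
    for (const tier of requested) {
      const locId = cell[TIER_FIELD[tier]];
      // Keep the first match per tier; later (coarser) matches are ignored.
      if (result[tier] === undefined && locId !== undefined) {
        const loc = locations.get(locId);
        if (loc) result[tier] = loc;
      }
    }
    // Stop early once every requested tier has an answer.
    if (requested.every((t) => result[t] !== undefined)) break;
  }
  return result;
}
```

Only requested tiers appear in the result, mirroring how the endpoint joins relationships only for the tiers you ask for.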

Plugin Configuration

Geolookup is designed to be used as a Harper plugin. The entry point is src/index.ts, which exports a handleApplication() function that Harper calls during startup.

flowchart LR
    A[Harper Startup] --> B["handleApplication(scope)"]
    B --> Z["configureDataLoad({ dataVersion, dataBaseUrl })"]
    Z --> C{exposeGeoService?}
    C -- Yes --> D["Register Geolookup at /{geoServiceName}"]
    C -- No --> E[Skip]
    Z --> F{exposeDataLoadService?}
    F -- Yes --> G["Register DataLoad at /{dataLoadServiceName}"]
    F -- No --> H[Skip]
    D --> I[Ready]
    E --> I
    G --> I
    H --> I

Installing as a Plugin

Geolookup is published on npm as geolookup-plugin. To use it in your Harper application, first install the package:

npm install geolookup-plugin

Then, in the consuming application's config.yaml, reference the Geolookup component and provide configuration options:

'geolookup-plugin':
  package: 'geolookup-plugin'
  exposeGeoService: true
  geoServiceName: 'geo'
  exposeDataLoadService: true
  dataLoadServiceName: 'dataload'
  # Optional — override the data revision tag (default baked into the plugin):
  # dataVersion: 'data-2026.05'
  # Optional — override the archive base URL (defaults to GitHub Releases):
  # dataBaseUrl: 'https://github.com/kylebernhardy/geolookup/releases/download'

Configuration Options

Option	Type	Default	Description
`exposeGeoService`	`boolean`	`false`	When `true`, the Geolookup resource is registered and accessible via REST at the path specified by `geoServiceName`. When `false` or omitted, the plugin loads but does not expose a lookup endpoint. Useful if you want to import and use the `Geolookup` class programmatically without a public-facing REST route.
`geoServiceName`	`string`	-	The name under which the Geolookup resource is registered. This becomes the URL path segment for the endpoint (e.g. setting it to `"geo"` exposes the service at `/geo`). Required when `exposeGeoService` is `true`.
`exposeDataLoadService`	`boolean`	`false`	When `true`, the DataLoad resource is registered and accessible via REST at the path specified by `dataLoadServiceName`. Provides a bulk data loading endpoint that fetches state archives on demand from `dataBaseUrl` (GitHub Releases by default) and loads them into the `Location` and `Cell` tables.
`dataLoadServiceName`	`string`	-	The name under which the DataLoad resource is registered. This becomes the URL path segment for the endpoint (e.g. setting it to `"dataload"` exposes the service at `/dataload`). Required when `exposeDataLoadService` is `true`.
`dataVersion`	`string`	Plugin-baked default (e.g. `"data-2026.05"`)	Override the data revision tag the loader fetches at runtime. Set this in a consuming app's `config.yaml` to pin a specific dataset, or to roll forward to a newer one without upgrading the plugin's npm version.
`dataBaseUrl`	`string`	`"https://github.com/kylebernhardy/geolookup/releases/download"`	Override the base URL the loader fetches archives from. Useful for mirrors, internal proxies, air-gapped environments, or pointing tests at a local fake server.
`autoLoadStates`	`string[] | "all"`	(off)	Auto-load these states in the background on Harper startup (fire-and-forget). Pass an array of state names (case-insensitive) or `"all"` for every published state. Idempotent: already-loaded states are skipped. See Auto-load on startup.
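The cross-field rules in this table (a service name is required whenever its service is exposed) can be expressed as a type plus a check. This is a hypothetical sketch for a consuming app that wants to validate its own config; the interface and the checkOptions name are assumptions, and how the plugin itself reacts to an invalid combination is not documented here.

```typescript
// Hypothetical shape of the plugin options, mirroring the table above.
interface GeolookupOptions {
  exposeGeoService?: boolean;
  geoServiceName?: string;
  exposeDataLoadService?: boolean;
  dataLoadServiceName?: string;
  dataVersion?: string;
  dataBaseUrl?: string;
  autoLoadStates?: string[] | "all";
}

// Returns a list of problems; an empty list means the options are consistent.
function checkOptions(opts: GeolookupOptions): string[] {
  const problems: string[] = [];
  if (opts.exposeGeoService && !opts.geoServiceName) {
    problems.push("geoServiceName is required when exposeGeoService is true");
  }
  if (opts.exposeDataLoadService && !opts.dataLoadServiceName) {
    problems.push("dataLoadServiceName is required when exposeDataLoadService is true");
  }
  if (
    opts.autoLoadStates !== undefined &&
    opts.autoLoadStates !== "all" &&
    !Array.isArray(opts.autoLoadStates)
  ) {
    problems.push('autoLoadStates must be an array of state names or "all"');
  }
  return problems;
}
```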

Exports

The plugin module exports the following classes, types, and functions:

Export	Kind	Description
`Geolookup`	Class	Reverse geocoding resource. Can be used programmatically or registered as a REST endpoint.
`DataLoad`	Class	Bulk data loading resource for populating Location and Cell tables.
`Location`	Type	TypeScript interface for Location records (places, county subdivisions, counties).
`Cell`	Type	TypeScript interface for Cell records (H3 spatial index entries).
`RequestTarget`	Type	Re-export of Harper's `RequestTarget` type for typing resource method parameters.
`handleApplication`	Function	Plugin entry point called by Harper during startup. Typically not imported directly.

import { Geolookup, DataLoad } from 'geolookup-plugin';
import type { Location, Cell } from 'geolookup-plugin';

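API Usage

The examples below call the service at /Geolookup on the local dev server (http://localhost:9926). If you registered the resource under a different geoServiceName, use that path instead (for example, /geo).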

Basic Lookup (All Tiers)

curl "http://localhost:9926/Geolookup?lat=40.7128&lon=-74.0060"

Response:

{
  "place": {
    "id": "...",
    "tier": 1,
    "name": "New York",
    "name_full": "New York city",
    "state_name": "New York",
    "state_abbrev": "NY",
    "h3_index": "...",
    "country_code": "US",
    "county_name": "New York"
  },
  "county_subdivision": {
    "id": "...",
    "tier": 2,
    "name": "Manhattan",
    "name_full": "Manhattan borough",
    "state_name": "New York",
    "state_abbrev": "NY",
    "h3_index": "...",
    "country_code": "US",
    "county_name": "New York"
  },
  "county": {
    "id": "...",
    "tier": 3,
    "name": "New York County",
    "name_full": "New York County",
    "state_name": "New York",
    "state_abbrev": "NY",
    "h3_index": "...",
    "country_code": "US",
    "county_name": "New York"
  }
}

Single Tier

curl "http://localhost:9926/Geolookup?lat=40.7128&lon=-74.0060&tiers=3"

Response:

{
  "county": {
    "id": "...",
    "tier": 3,
    "name": "New York County",
    ...
  }
}

Multiple Tiers

curl "http://localhost:9926/Geolookup?lat=40.7128&lon=-74.0060&tiers=1,3"

Response includes only place and county (no county_subdivision).

Query Parameters

Parameter	Required	Description
`lat`	Yes	Latitude (decimal degrees)
`lon`	Yes	Longitude (decimal degrees)
`tiers`	No	Comma-separated tier numbers (`1`, `2`, `3`) or `all`. Defaults to `all`.
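A typed client for these parameters mostly comes down to building the query string correctly. This is a minimal sketch: buildLookupUrl and lookup are hypothetical names, the client-side range checks are an assumption (the service's own validation rules are not listed here), and it relies on the global fetch available in Node.js 18 and later.

```typescript
type Tier = 1 | 2 | 3;

// `baseUrl` is the full endpoint URL, e.g. "http://localhost:9926/Geolookup"
// (or ".../geo" if you set geoServiceName to "geo").
function buildLookupUrl(baseUrl: string, lat: number, lon: number, tiers?: Tier[] | "all"): string {
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) throw new RangeError("lat must be in [-90, 90]");
  if (!Number.isFinite(lon) || lon < -180 || lon > 180) throw new RangeError("lon must be in [-180, 180]");
  const url = new URL(baseUrl);
  url.searchParams.set("lat", String(lat));
  url.searchParams.set("lon", String(lon));
  // Omit `tiers` entirely to get the server default (`all`).
  if (tiers !== undefined) {
    url.searchParams.set("tiers", tiers === "all" ? "all" : tiers.join(","));
  }
  return url.toString();
}

async function lookup(baseUrl: string, lat: number, lon: number, tiers?: Tier[] | "all"): Promise<unknown> {
  const res = await fetch(buildLookupUrl(baseUrl, lat, lon, tiers));
  if (!res.ok) throw new Error(`lookup failed: HTTP ${res.status}`);
  return res.json();
}
```

Note that URLSearchParams percent-encodes the comma in `1,3` as `%2C`, which the server decodes back to `1,3`.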

Data Loading

Geographic data is not bundled with the plugin. When called, the DataLoad endpoint creates a tracking job and immediately returns the job ID. A background worker then downloads the requested state's .tar.gz (from the geolookup GitHub Releases by default) into an OS temp directory, extracts it with node-tar, and loads Location and Cell records into Harper. The temp directory is removed when the job finishes, whether it succeeds or fails. Progress is tracked in the DataLoadJob table, which is exported and can be queried directly at any time.

Configuration

The loader uses defaults baked into the plugin for both the data revision tag and the archive base URL. Override either via plugin config (see dataVersion / dataBaseUrl). For example, to pin a specific data release in a consuming app's config.yaml:

'geolookup-plugin':
  exposeDataLoadService: true
  dataLoadServiceName: 'dataload'
  dataVersion: 'data-2026.05'

DataLoad Endpoint

Important: The DataLoad endpoint is intended for initial data seeding only. Once the states you need are loaded, set exposeDataLoadService to false in your config.yaml to remove the route. Your geocoding lookups don't need it during normal operation, and disabling it reduces your app's exposed surface area.

GET /DataLoad?state={state}

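Initiates a data load for the given state. The state parameter is case-insensitive (it is lowercased internally). The examples use the path /DataLoad; if you registered the resource under a different dataLoadServiceName, use that path instead (for example, /dataload).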

curl "http://localhost:9926/DataLoad?state=Wyoming"

Response (returns immediately):

{
  "jobId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

The returned jobId can be used to check progress via the DataLoadJob endpoint.

If the state query param is missing, the endpoint returns { "error": "state query parameter is required" } synchronously. Any other failure (download 404, extraction error, load failure) surfaces asynchronously: poll DataLoadJob/<jobId> and read status (which becomes "error") and error_message.
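Because failures surface only through the job record, a caller typically polls until the job reaches completed or error. The sketch below is one way to do that; waitForJob and httpGetJob are hypothetical names, the interval and timeout defaults are arbitrary, and the job fetcher is injected so the polling logic can be exercised without a running server.

```typescript
// Statuses from the Job Statuses table.
type JobStatus =
  | "pending" | "downloading" | "extracting"
  | "loading_locations" | "loading_cells" | "completed" | "error";

interface DataLoadJob {
  id: string;
  state: string;
  status: JobStatus;
  error_message: string | null;
  location_count: number;
  cell_count: number;
}

// Fetches one job record over REST (assumes Node.js 18+ global fetch).
function httpGetJob(baseUrl: string, jobId: string): () => Promise<DataLoadJob> {
  return async () => {
    const res = await fetch(`${baseUrl}/DataLoadJob/${encodeURIComponent(jobId)}`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return (await res.json()) as DataLoadJob;
  };
}

// Polls until the job completes; rejects on `error` or on timeout.
async function waitForJob(
  getJob: () => Promise<DataLoadJob>,
  intervalMs = 2000,
  timeoutMs = 10 * 60_000,
): Promise<DataLoadJob> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const job = await getJob();
    if (job.status === "completed") return job;
    if (job.status === "error") throw new Error(`load of ${job.state} failed: ${job.error_message}`);
    if (Date.now() >= deadline) throw new Error(`timed out waiting for job ${job.id}`);
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage: `await waitForJob(httpGetJob("http://localhost:9926", jobId))` with the jobId returned by the DataLoad call.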

What happens in the background

Downloading — The .tar.gz for the requested state is fetched from ${dataBaseUrl}/${dataVersion}/${state}.tar.gz (defaults to GitHub Releases on the geolookup repo) and streamed to a per-job temp directory under os.tmpdir().
Extracting — The archive is extracted into the same temp directory using node-tar (pure JS — no system tar CLI required).
Loading locations — All JSON files under {state}/Location/ in the extracted tree are loaded into the Location table. The job's location_count is updated after each file.
Loading cells — All JSON files under {state}/Cell/ are loaded into the Cell table. The job's cell_count is updated after each file.
Cleanup — The temp directory is deleted (success or failure).

All database writes within each file are wrapped in a Harper transaction for performance.
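The download step's URL follows the `${dataBaseUrl}/${dataVersion}/${state}.tar.gz` pattern shown above. A sketch of building it, under stated assumptions: the function name archiveUrl is hypothetical, the state is lowercased as the endpoint does, and percent-encoding the name (so "new york" becomes "new%20york") is an assumption, since how multi-word states are spelled in the published file names is not specified here.

```typescript
const DEFAULT_BASE = "https://github.com/kylebernhardy/geolookup/releases/download";

// Hypothetical helper for the archive URL pattern described above.
function archiveUrl(state: string, dataVersion: string, dataBaseUrl = DEFAULT_BASE): string {
  const base = dataBaseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const name = state.trim().toLowerCase(); // the endpoint lowercases state names
  return `${base}/${encodeURIComponent(dataVersion)}/${encodeURIComponent(name)}.tar.gz`;
}
```

For example, `archiveUrl("Wyoming", "data-2026.05")` yields `https://github.com/kylebernhardy/geolookup/releases/download/data-2026.05/wyoming.tar.gz`.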

DataLoad Query Parameters

Parameter	Required	Description
`state`	Yes	Name of the state or territory to load (case-insensitive). Must match one of the published archives listed under Available States and Territories.

DataLoadJob Endpoint

GET /DataLoadJob/{jobId}

The DataLoadJob table is exported as a REST resource, so you can query it directly to check the status of a data load job using the jobId returned by the DataLoad endpoint.

curl "http://localhost:9926/DataLoadJob/a1b2c3d4-e5f6-7890-abcd-ef1234567890"

Response:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "state": "wyoming",
  "status": "completed",
  "error_message": null,
  "location_count": 152,
  "cell_count": 8432,
  "started_at": "2026-03-06T12:00:00.000Z",
  "completed_at": "2026-03-06T12:00:45.000Z",
  "duration_ms": 45000
}

You can also list all jobs:

curl "http://localhost:9926/DataLoadJob"

Filtering

The state and status columns are @indexed, so you can filter directly via query params — useful when auto-loading on startup (which doesn't surface jobIds to the caller) or when watching a batch in flight:

# Every job for a specific state (newest first)
curl "http://localhost:9926/DataLoadJob?state=dc"

# Currently in-flight loads
curl "http://localhost:9926/DataLoadJob?status=downloading"
curl "http://localhost:9926/DataLoadJob?status=extracting"
curl "http://localhost:9926/DataLoadJob?status=loading_locations"
curl "http://localhost:9926/DataLoadJob?status=loading_cells"

# Anything that errored
curl "http://localhost:9926/DataLoadJob?status=error"

Filters combine: ?state=dc&status=completed returns only the completed DC loads. The existence of such a record is exactly what the autoLoadStates idempotency check tests for.

DataLoadJob Fields

Field	Description
`id`	Job ID (UUID)
`state`	The state being loaded
`status`	Current job status (see table below)
`error_message`	Error details if status is `error`, otherwise `null`
`location_count`	Number of `Location` records loaded so far
`cell_count`	Number of `Cell` records loaded so far
`started_at`	ISO 8601 timestamp when the job was created
`completed_at`	ISO 8601 timestamp when the job finished (or errored)
`duration_ms`	Total elapsed time in milliseconds

Job Statuses

stateDiagram-v2
    [*] --> pending : Job created
    pending --> downloading : Background processing starts
    downloading --> extracting : Archive downloaded
    extracting --> loading_locations : Archive extracted
    loading_locations --> loading_cells : All Location files loaded
    loading_cells --> completed : All Cell files loaded
    downloading --> error : Download fails
    extracting --> error : Extraction fails
    loading_locations --> error : Loading fails
    loading_cells --> error : Loading fails
    completed --> [*]
    error --> [*]

Status	Description
`pending`	Job created, processing has not started
`downloading`	Fetching the `.tar.gz` archive from the configured base URL
`extracting`	Extracting the `.tar.gz` archive
`loading_locations`	Loading records into the `Location` table
`loading_cells`	Loading records into the `Cell` table
`completed`	All data loaded successfully
`error`	An error occurred (see `error_message`)
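The state diagram above translates directly into a transition table, which a client can use to tell terminal statuses from in-flight ones. This is an illustrative sketch, not code from the plugin; like the diagram, it has no edge from pending to error.

```typescript
type JobStatus =
  | "pending" | "downloading" | "extracting"
  | "loading_locations" | "loading_cells" | "completed" | "error";

// Allowed transitions, taken from the state diagram above.
const TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  pending: ["downloading"],
  downloading: ["extracting", "error"],
  extracting: ["loading_locations", "error"],
  loading_locations: ["loading_cells", "error"],
  loading_cells: ["completed", "error"],
  completed: [],
  error: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// A status with no outgoing transitions is final: stop polling.
function isTerminal(status: JobStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```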

Available States and Territories

The following states and territories have data archives published in the geolookup GitHub Releases. Pass any of these names (case-insensitive) as the state query parameter:

States
alabama	alaska	arizona	arkansas
california	colorado	connecticut	delaware
florida	georgia	hawaii	idaho
illinois	indiana	iowa	kansas
kentucky	louisiana	maine	maryland
massachusetts	michigan	minnesota	mississippi
missouri	montana	nebraska	nevada
new hampshire	new jersey	new mexico	new york
north carolina	north dakota	ohio	oklahoma
oregon	pennsylvania	rhode island	south carolina
south dakota	tennessee	texas	utah
vermont	virginia	washington	west virginia
wisconsin	wyoming

Territories
american samoa	cnmi
dc	guam
puerto rico	usvi

Auto-load on startup

For dev environments, fresh clones, and ephemeral CI/test instances, you can skip exposing the DataLoad HTTP endpoint and orchestrating curl-and-poll cycles. Instead, set autoLoadStates and the plugin will populate the tables on Harper startup.

'geolookup-plugin':
  package: 'geolookup-plugin'
  exposeGeoService: true
  geoServiceName: 'geo'
  # No need to expose DataLoad — autoLoad runs server-side at boot:
  exposeDataLoadService: false
  # Pick specific states for dev:
  autoLoadStates: ['dc', 'colorado']
  # ...or load everything (56 states/territories, ~40 MB total):
  # autoLoadStates: 'all'

Semantics:

Runs in the background after handleApplication finishes — does not block Harper startup.
Idempotent across restarts: each state is loaded only if no DataLoadJob record exists with status: 'completed' for it. Subsequent restarts skip already-loaded states.
Errors land in the per-state DataLoadJob record (status: 'error', error_message: ...) and do not crash boot. A bad state name, a 404 from GitHub Releases, or a transient network failure all behave the same way.
'all' resolves to every state/territory in the plugin's built-in list (50 states + DC + American Samoa, CNMI, Guam, Puerto Rico, USVI).
Logged at info on Harper's logger: one summary line on boot listing every state queued, plus one kicked off (jobId ...) line per state as the background work spawns. Skipped/already-loaded states log at debug. Tail ~/harper/log/hdb.log to confirm autoload ran.
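The selection rules above ("all" expansion, case-insensitive names, skipping states with a completed job) can be restated as a pure function. This is a hypothetical sketch of the semantics, not the plugin's implementation; the names statesToLoad and JobSummary are assumptions, and, consistent with the notes above, it does not reject unknown state names (those surface later as error jobs).

```typescript
// The 56 published states and territories from the list above.
const ALL_STATES: readonly string[] = [
  "alabama", "alaska", "arizona", "arkansas", "california", "colorado",
  "connecticut", "delaware", "florida", "georgia", "hawaii", "idaho",
  "illinois", "indiana", "iowa", "kansas", "kentucky", "louisiana",
  "maine", "maryland", "massachusetts", "michigan", "minnesota",
  "mississippi", "missouri", "montana", "nebraska", "nevada",
  "new hampshire", "new jersey", "new mexico", "new york",
  "north carolina", "north dakota", "ohio", "oklahoma", "oregon",
  "pennsylvania", "rhode island", "south carolina", "south dakota",
  "tennessee", "texas", "utah", "vermont", "virginia", "washington",
  "west virginia", "wisconsin", "wyoming",
  "american samoa", "cnmi", "dc", "guam", "puerto rico", "usvi",
];

interface JobSummary { state: string; status: string; }

function statesToLoad(autoLoadStates: string[] | "all", jobs: JobSummary[]): string[] {
  const requested = autoLoadStates === "all"
    ? [...ALL_STATES]
    : autoLoadStates.map((s) => s.trim().toLowerCase());
  // Only a completed job marks a state as loaded; errored jobs are retried.
  const done = new Set(
    jobs.filter((j) => j.status === "completed").map((j) => j.state.toLowerCase()),
  );
  // Deduplicate while preserving order, then drop already-loaded states.
  return [...new Set(requested)].filter((s) => !done.has(s));
}
```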

Checking progress without the jobId. Because autoload doesn't return jobIds to a caller, use the DataLoadJob REST resource's filtering to inspect status:

# Is DC done?
curl "http://localhost:9926/DataLoadJob?state=dc&status=completed"

# What's currently in flight?
curl "http://localhost:9926/DataLoadJob?status=downloading"
curl "http://localhost:9926/DataLoadJob?status=loading_locations"

# Any errors during this boot's autoload?
curl "http://localhost:9926/DataLoadJob?status=error"

Force-refresh / re-loading: out of scope today. To re-load a state, delete its DataLoadJob records via the REST endpoint, then restart.

Publishing a New Data Release

Maintainers cut a new data release after refreshing data/:

npm run data:publish -- data-YYYY.MM

This uploads every data/*.tar.gz to a data-YYYY.MM GitHub Release via the gh CLI. After publishing, bump DEFAULT_DATA_VERSION in src/dataConfig.ts, run npm test and npm run build, then cut a new npm release of the plugin so consumers pick up the new default. Consumers can also pin to a specific data release via the dataVersion config option without upgrading the plugin.

Development

Prerequisites

Node.js v24+ (see .nvmrc)
Harper installed globally: npm install -g harper

Setup

After cloning the repo, install the agent skills used for development:

npm run agent:skills:update

Running Locally

npm run dev

This starts the Harper dev server at http://localhost:9926.

Testing

npm test                              # run all tests
npm run test:watch                    # run tests in watch mode
node --test test/geolookup.test.js    # run a single test file

Tests use the Node.js built-in test runner (node:test) and node:assert/strict.

Linting and Formatting

npm run lint       # ESLint
npm run format     # Prettier

Deployment

Configure your .env file with your Harper Fabric cluster credentials (see .env.example), then:

npm run deploy

A GitHub Actions workflow (.github/workflows/deploy.yaml) is also included for CI/CD deployment via workflow_dispatch.

Example Application

For a working implementation of this plugin, see geolookup-example. It's a minimal Harper application that wires up the geocoding and data loading endpoints via config.yaml, and includes an interactive CLI for bulk loading state data with real-time progress tracking.

Links

geolookup-example — Reference implementation with interactive data loading CLI
Harper Documentation
Harper Fabric
Harper Components Reference
Harper Resource Class Reference
H3: Uber's Hexagonal Hierarchical Spatial Index
US Census TIGER/Line Shapefiles
