gojiplus/bloomjoin: An R package implementing Bloom filter-based joins for improved performance with large datasets.

bloomjoin

Faster, more memory-efficient joins of a large table to a small lookup table.

When to use

bloomjoin helps when:

The large table has 10 or more times as many rows as the small one (10:1 ratio or more)
The tables share few join keys (overlap under 25%)

n_x (large table rows)	n_y (small table rows)	key overlap	speed ratio	memory ratio
1,000,000	10,000	1%	2.0x	2.2x
1,000,000	10,000	5%	1.6x	2.0x
500,000	5,000	2%	1.7x	1.9x
200,000	20,000	5%	1.2x	1.2x

Ratios are relative to dplyr: values above 1 mean bloomjoin is faster or uses less memory.
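Before switching, you can measure where your own data falls on these two criteria. The base-R sketch below is a hypothetical helper (join_profile is not part of bloomjoin), and it assumes "overlap" means the share of the large table's rows whose key appears in the small table; the benchmarks above may define overlap differently.

```r
# Hypothetical helper, not part of bloomjoin: profile two tables before choosing a join.
# Assumption: "overlap" is the share of rows in `x` whose key also appears in `y`.
join_profile <- function(x, y, by) {
  size_ratio <- nrow(x) / nrow(y)
  overlap <- mean(x[[by]] %in% y[[by]])
  list(
    size_ratio = size_ratio,
    overlap = overlap,
    # Rule of thumb from this README: 10:1 or larger, and under 25% overlap
    bloomjoin_likely_helps = size_ratio >= 10 && overlap < 0.25
  )
}

# Made-up example: 100,000 rows joined to a 1,000-row lookup table
set.seed(42)
large_df <- data.frame(id = sample.int(1e6, 1e5, replace = TRUE), value = runif(1e5))
small_lookup <- data.frame(id = sample.int(1e6, 1e3), label = "hit")
str(join_profile(large_df, small_lookup, by = "id"))
```

The thresholds are only the rough guidance stated in this README; between 25% and 50% overlap the benefit is less clear, so benchmarking on your own data is the safer check.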

Installation

devtools::install_github("gojiplus/bloomjoin")

Usage

library(bloomjoin)

result <- bloom_join(large_df, small_lookup, by = "id")

The syntax follows dplyr's joins. The type argument selects the join: "inner", "left", "right", "semi", or "anti".

When NOT to use

Similar-sized tables: dplyr is faster
High key overlap (>50%): pre-filtering removes few rows, so there is no benefit

n_x (large table rows)	n_y (small table rows)	key overlap	speed ratio	memory ratio
100,000	100,000	10%	0.4x	0.5x
100,000	100,000	50%	0.4x	0.4x

Values below 1 mean dplyr is faster or uses less memory.

How it works

Build a Bloom filter from the smaller table's keys
Pre-filter the larger table, dropping rows whose keys are definitely not in the filter
Run the actual join on the reduced dataset

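Bloom filters have no false negatives, so no matches are lost. They can have false positives, so a few non-matching rows may survive the pre-filter; the exact join in the last step removes them.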
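The three steps above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not bloomjoin's implementation: it handles only whole-number keys and an inner join, uses a simple double-hashing scheme picked for this example, and the function names (bloom_positions, bloom_build, bloom_maybe_contains, bloom_prefilter_join) are hypothetical.

```r
# Minimal Bloom-filter pre-filtered join in base R (illustrative sketch only).
# Assumptions: keys are non-negative whole numbers below 2^31; hashing is a
# simple double-hashing scheme chosen for this example, not bloomjoin's own.

# k bit positions (1-based) per key: (h1 + i * h2) mod m, for i = 0, ..., k - 1
bloom_positions <- function(keys, m, k) {
  keys <- as.numeric(keys)
  p <- 2147483647                      # the prime 2^31 - 1
  h1 <- (1000003 * keys + 12345) %% p  # products stay below 2^53, so exact in doubles
  h2 <- (999983 * keys + 54321) %% p
  i <- rep(0:(k - 1), each = length(keys))
  # Row r holds the k positions for keys[r]
  matrix((h1 + i * h2) %% m + 1, nrow = length(keys), ncol = k)
}

# Step 1: build the filter from the small table's keys
bloom_build <- function(keys, m, k) {
  bits <- logical(m)
  bits[c(bloom_positions(keys, m, k))] <- TRUE
  list(bits = bits, m = m, k = k)
}

# TRUE means "possibly present"; FALSE means "definitely absent"
bloom_maybe_contains <- function(bf, keys) {
  pos <- bloom_positions(keys, bf$m, bf$k)
  hits <- matrix(bf$bits[c(pos)], nrow = length(keys), ncol = bf$k)
  rowSums(hits) == bf$k
}

# Steps 2 and 3: drop rows that cannot match, then run an exact inner join
bloom_prefilter_join <- function(large, small, by,
                                 m = max(64, 10 * nrow(small)), k = 7) {
  bf <- bloom_build(small[[by]], m, k)
  candidates <- large[bloom_maybe_contains(bf, large[[by]]), , drop = FALSE]
  merge(candidates, small, by = by)  # exact join discards any false positives
}

# Made-up example data
set.seed(1)
large_df <- data.frame(id = sample.int(1e6, 2e5, replace = TRUE), value = runif(2e5))
small_lookup <- data.frame(id = sample.int(1e6, 2e3), label = "hit")

res <- bloom_prefilter_join(large_df, small_lookup, by = "id")
ref <- merge(large_df, small_lookup, by = "id")  # plain join without pre-filtering
nrow(res) == nrow(ref)
```

With m bits, k hash functions and n keys in the small table, the expected false-positive rate under ideal hashing is about (1 - exp(-k * n / m))^k, roughly 0.8% for this sketch's defaults (m = 10n, k = 7); the simple hashes used here may do somewhat worse.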

Documentation

License

MIT
