This is a tracking issue for potential improvements to our default compressor (which is just all of our encodings/schemes plus our sampling compressor).
Design
We recently made several large structural changes to the compressor (tracked in #7216), which made the sampling compressor pluggable and made it easier to observe what the compressor is doing (Perfetto tracing and a clearer sampling algorithm).
Those changes gave us a few insights into different ways we could optimize the compressor, but it is unclear how much each of them would help.
Below is a list of ideas that I think could improve the compressor, in no particular order.
Steps
- […] `ConstantScheme` logic into the compressor
- […] `DictScheme` logic into the compressor? (likely more controversial...)
- […] `SequenceScheme`, but then we throw away the entire array and recompress later. That can definitely be fixed.
- Replace `MAX_CASCADE = 3` with adaptive cascading that leverages the exclusion-bounded search tree (this is a stretch goal, as the search space is potentially exponential)
Unresolved questions
- Should `MAX_CASCADE` be replaced with an adaptive search strategy now that the exclusion system bounds the search space?
Implementation history