[0.19] Address caching issue in schema compiler that can lead to OOMs #1329
Baccata wants to merge 5 commits
The CachedSchemaCompiler.Impl class had a serious problem: it only cached what was passed to the schema visitor. However, schemas are often pre-processed before reaching the schema visitors. Combined with the lack of a guarantee on Schema equality (in particular for enumeration schemas in 0.18) and with global caches, this posed a risk of OOMs in some cases. This change makes the initial schema passed to the compiler act as the key to a cache that is separate from the one typically passed to schema visitors.
    protected type Aux[_]
    type Cache = CompilationCache[Aux]
    type AuxCache = CompilationCache[Aux]
    case class Cache(outer: CompilationCache[F], inner: AuxCache)
This is the core of the change: we need an inner cache to pass to the schema visitors, but we also need an outer cache to prevent schema pre-processing from being re-applied.
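To make the outer/inner split concrete, here is a minimal, self-contained sketch. It does not use the smithy4s API: `ToySchema`, `Compiler`, `preprocess` and `visit` are hypothetical stand-ins, and the hint mask is modelled as a plain set intersection. The outer cache is keyed on the schema as the compiler receives it, while the inner cache is keyed on the pre-processed schema that the visitor sees.

```scala
import scala.collection.mutable

object TwoLevelCacheSketch {
  // Hypothetical stand-in for a schema: a name plus a set of hint names.
  final case class ToySchema(name: String, hints: Set[String])

  final class Compiler(mask: Set[String]) {
    // Counters, only here to observe how often each step actually runs.
    var preprocessRuns = 0
    var visitorRuns = 0

    // Outer cache: keyed on the schema exactly as passed to the compiler.
    private val outer = mutable.Map.empty[ToySchema, String]
    // Inner cache: keyed on the pre-processed schema handed to the visitor.
    private val inner = mutable.Map.empty[ToySchema, String]

    // Assumed pre-processing step, standing in for something like a hint mask.
    private def preprocess(schema: ToySchema): ToySchema = {
      preprocessRuns += 1
      schema.copy(hints = schema.hints.intersect(mask))
    }

    // Assumed visitor step, producing a dummy "compiled" result.
    private def visit(schema: ToySchema): String = {
      visitorRuns += 1
      s"codec(${schema.name})"
    }

    def compile(schema: ToySchema): String =
      outer.getOrElseUpdate(schema, {
        // Only reached on an outer-cache miss, so pre-processing is not repeated
        // for a schema the compiler has already seen.
        val processed = preprocess(schema)
        inner.getOrElseUpdate(processed, visit(processed))
      })
  }
}
```

In this sketch, compiling the same input twice runs pre-processing once, and two different inputs that mask down to the same schema still share a single visitor run through the inner cache.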
Question: is this only a problem because the pre-processing doesn't produce deterministic outputs (as per hashCode/equals)?
Do we know which transformation was causing this particular issue?
There are two problems. The most pernicious one is what you describe: the total function held by enumeration schemas is at fault, hence the changes in my other PR.
The less pernicious one is that it's the input of the schema visitor call that gets cached, instead of the input of the schema compiler.
This means that we're not protecting against re-running the inefficient pre-processing of schemas that may occur (like hint masks), which is really bad performance-wise.
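The first problem can be illustrated with a small sketch, assuming a toy model rather than the real smithy4s types: `ToyEnumSchema`, `build` and `GlobalCache` are hypothetical names. A case class that holds a function compares that field by reference, so two schemas rebuilt by pre-processing are never equal and a long-lived cache keyed on them gains a new entry on every call.

```scala
import scala.collection.mutable

object FunctionEqualitySketch {
  // Hypothetical stand-in for an enumeration schema carrying a total function.
  final case class ToyEnumSchema(name: String, total: String => Int)

  // Stands in for pre-processing that rebuilds the schema on each call.
  // `new` guarantees a distinct function object every time, and Function1
  // does not override equals, so the two results compare by reference.
  def build(): ToyEnumSchema =
    ToyEnumSchema("Suit", new (String => Int) {
      def apply(s: String): Int = s.length
    })

  // Stands in for a long-lived (global) cache keyed on the schema value.
  final class GlobalCache {
    private val entries = mutable.Map.empty[ToyEnumSchema, String]

    def compile(schema: ToyEnumSchema): String =
      entries.getOrElseUpdate(schema, s"codec(${schema.name})")

    def size: Int = entries.size
  }
}
```

Calling `compile(build())` repeatedly on one `GlobalCache` never hits an existing entry, which is the unbounded growth behind the OOM risk described in this PR.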
…g-issue-in-schema-compiler
Closes #1328
PR Checklist (not all items are relevant to all PRs)