Skip to content

[0.19] Address caching issue in schema compiler that can lead to OOMs#1329

Open
Baccata wants to merge 5 commits into
series/0.19from
address-caching-issue-in-schema-compiler
Open

[0.19] Address caching issue in schema compiler that can lead to OOMs#1329
Baccata wants to merge 5 commits into
series/0.19from
address-caching-issue-in-schema-compiler

Conversation

@Baccata

@Baccata Baccata commented Dec 18, 2023

Copy link
Copy Markdown
Contributor

Closes #1328

PR Checklist (not all items are relevant to all PRs)

  • Added unit-tests (for runtime code)
  • Updated changelog

The CachedSchemaCompiler.Impl class suffered from a pretty dire problem
of only caching what is passed to the schema visitor. However, schemas
are often pre-processed before reaching the schema visitors, which,
combined with a lack of guarantee on Schema equality (in particular for
Enumeration Schemas in 0.18), and global caches posed a risk of OOMs in
some cases.

This change makes it so that the initial schema passed to the compiler
acts as a cache key to a cache that's separate to the one that is
typically passed to schema visitors.
@Baccata Baccata changed the title [0.19] Address caching issue in schema compiler [0.19] Address caching issue in schema compiler that can lead to OOMs Dec 18, 2023
protected type Aux[_]
type Cache = CompilationCache[Aux]
type AuxCache = CompilationCache[Aux]
case class Cache(outer: CompilationCache[F], inner: AuxCache)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the core of the change : we need an inner cache to be passed to some schema visitor, but we also need an outer cache to prevent schema-preprocessing from being re-applied

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is it only a problem because the preprocessing doesn't produce deterministic (as per hashCode/equals) outputs?

Do we know which transformation was causing the particular issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's 2 problems : most pernicious one is what you describe, and it's the total function held by enumeration schemas which is at fault. Hence the changes in my other PR.

Less pernicious one is that the it's the input of the schema visitor call that gets cached instead of the input of the schema compiler.

This means that we're not protecting against re-running the inefficient pre-processing of schemas that may occur (like hint masks), which is really bad performance wise

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good info, thanks

@daddykotex daddykotex added this to the 0.19.0 milestone Jan 15, 2024
@kubukoz kubukoz marked this pull request as ready for review January 17, 2024 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants