Skip to content

Fix DD_APM_TRACING_ENABLED to work with LLMObs#10989

Open
matsumo-and wants to merge 8 commits intoDataDog:masterfrom
matsumo-and:fix/LLMOBS-10051
Open

Fix DD_APM_TRACING_ENABLED to work with LLMObs#10989
matsumo-and wants to merge 8 commits intoDataDog:masterfrom
matsumo-and:fix/LLMOBS-10051

Conversation

@matsumo-and
Copy link
Copy Markdown

@matsumo-and matsumo-and commented Mar 28, 2026

What Does This Do

This PR fixes two issues when using LLM Observability without APM tracing:

  1. Makes DD_APM_TRACING_ENABLED=false properly drop APM traces while keeping LLMObs traces
  2. Prevents NullPointerException when DD_TRACE_ENABLED=false is set with LLMObs enabled

Motivation

Solves #10051

Currently, when DD_APM_TRACING_ENABLED=false is set with DD_LLMOBS_ENABLED=true, APM traces are still sent to the agent because the sampler selection logic falls through to the default sampler, which keeps all traces by default.

Additionally, setting DD_TRACE_ENABLED=false causes NPE in LLMObs methods because LLMObs tries to initialize even when all tracing is disabled.

Additional Notes

Implementation approach:

  1. ProductTraceSource.LLMOBS flag (0x20): Added following the existing pattern for ASM/DSM/DBM products
  2. StandaloneProduct enum: encapsulates per-product behavior (trace source bit, sampling mechanism, whether a billing trace is needed). Currently LLMOBS and ASM.
  3. StandaloneSampler: unified sampler for when APM tracing is disabled but one or more standalone products are active.
  4. DDLLMObsSpan marking: Uses TraceSegment.setTagTop() to propagate the LLMOBS flag from child spans to the root span, following the exact same pattern used by ASM (AppSecEventTracker, GatewayBridge, IAST Reporter)
  5. NPE prevention: Added early return in LLMObsSystem.start() when !config.isTraceEnabled()

Testing:

  • Added smoke tests in dd-smoke-tests/apm-tracing-disabled/
  • Verified on master: LLMObs works but APM traces not dropped
  • Verified on fix branch: LLMObs works and APM traces properly dropped

Contributor Checklist

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
Fixes DataDog#10051

When DD_APM_TRACING_ENABLED=false is set, APM tracing should be disabled
while allowing other products like LLM Observability to function. Previously,
setting DD_APM_TRACING_ENABLED=false would inadvertently disable LLMObs or
allow APM traces to leak through when no other products were enabled.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
@matsumo-and matsumo-and marked this pull request as ready for review March 28, 2026 10:13
@matsumo-and matsumo-and requested review from a team as code owners March 28, 2026 10:13
@bric3 bric3 requested a review from smola April 2, 2026 08:36
@matsumo-and
Copy link
Copy Markdown
Author

Hi @smola , just a gentle ping.
I'd appreciate your review when you have time.
Thanks!

Comment thread dd-trace-core/src/main/java/datadog/trace/common/sampling/Sampler.java Outdated
if (!config.isApmTracingEnabled()) {
if (config.isLlmObsEnabled()) {
log.debug("APM is disabled, but LLMObs is enabled. All LLMObs traces will be kept.");
return new LlmObsStandaloneSampler();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this would break ASM. If ASM is enabled, we still need to pass, at least, 1 APM/ASM per minute.

Copy link
Copy Markdown
Author

@matsumo-and matsumo-and Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smola

Thanks for pointing this out!
When both are enabled, LlmObsStandaloneSampler was returned and the ASM branch was never reached.

Fixed by adding LlmObsAndAsmStandaloneSampler that keeps all LLMObs/ASM traces while still allowing 1 APM trace per minute for billing.

69107cb

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asking @DataDog/asm-java for another look to make sure the ASM sampling will be fine.

When APM tracing is disabled but both LLMObs and ASM are enabled,
the previous code returned LlmObsStandaloneSampler which never sent
the 1 APM trace/minute required by ASM for billing and service catalog.

Introduce LlmObsAndAsmStandaloneSampler that keeps all LLMObs and
ASM traces while rate-limiting plain APM traces to 1 per minute.

Also clarify the log message for the ASM-only standalone case.

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
@Kyle-Verhoog Kyle-Verhoog added the LLMObs LLM Observability related label Apr 7, 2026
@matsumo-and matsumo-and requested a review from smola April 8, 2026 16:24
@jandro996
Copy link
Copy Markdown
Member

jandro996 commented Apr 16, 2026

Sorry for the late review :(

I think the PR fixes the immediate issue correctly, but it introduces a structural technical debt. This is something we had in mind when we started with the standalone implementations, but we didn’t move forward at the time since no other products needed it.

The current approach introduces a new sampler class per product combination:

  • APM disabled + ASM → AsmStandaloneSampler
  • APM disabled + LLMObs → LlmObsStandaloneSampler
  • APM disabled + ASM + LLMObs → LlmObsAndAsmStandaloneSampler

The next standalone product will require yet another class and another branch.

Suggested alternative (just an example)

Each product has three properties, its ProductTraceSource bit, the SamplingMechanism to use when keeping, and whether it needs the 1-per-minute billing trace:

enum StandaloneProduct {
     ASM   (ProductTraceSource.ASM,    SamplingMechanism.APPSEC,  true),
     LLMOBS(ProductTraceSource.LLMOBS, SamplingMechanism.DEFAULT, false);
 }

A single StandaloneSampler receives the list of active products, iterates them to detect product-marked traces, and falls back to rate-limiting (or drop) depending on whether any active product requires the billing trace. forConfig becomes flat and open for extension:

if (!config.isApmTracingEnabled()) {
      List<StandaloneProduct> active = new ArrayList<>();
      if (isAsmEnabled(config))     active.add(StandaloneProduct.ASM);
      if (config.isLlmObsEnabled()) active.add(StandaloneProduct.LLMOBS);
      // next product: one line here + one entry in the enum

      return active.isEmpty()
          ? new ForcePrioritySampler(SAMPLER_DROP, DEFAULT)
          : new StandaloneSampler(active, Clock.systemUTC());
  }

The same generalisation applies to the manual && per product in TraceCollector.setSamplingPriorityIfNecessary, a helper like ProductTraceSource.isAnyStandaloneProductMarked(traceSource) would consolidate that.

Maybe I’m missing some context since this is an older topic, but what do you think about the suggestion? Do you see it as viable?

cc:@smola

@matsumo-and
Copy link
Copy Markdown
Author

Thanks for the detailed feedback, @jandro996 — this makes a lot of sense.

I initially kept the change minimal to address the immediate issue, but I agree that the current approach doesn’t scale well as more standalone products are added, and your suggestion of using a single StandaloneSampler with a product-based configuration looks much cleaner and easier to extend.

Also, good point about applying the same generalization to TraceCollector. Consolidating the per-product checks into a helper like ProductTraceSource.isAnyStandaloneProductMarked would definitely improve maintainability and avoid duplicating logic.

I’m happy to refactor the implementation along these lines.

Replace the three separate sampler classes (AsmStandaloneSampler,
LlmObsStandaloneSampler, LlmObsAndAsmStandaloneSampler) with a single
StandaloneSampler that iterates over a list of active StandaloneProduct
entries, making it trivially extensible to future products.

Also add ProductTraceSource.isAnyStandaloneProductMarked() to simplify
the TraceCollector force-keep bypass check.
@matsumo-and
Copy link
Copy Markdown
Author

matsumo-and commented Apr 16, 2026

Hi @jandro996, I've refactored the implementation along your suggestion:

  • Replaced the three separate sampler classes with a single StandaloneSampler that takes a List
  • Added a StandaloneProduct enum holding each product's trace source bit, sampling mechanism, and billing trace flag
  • Added ProductTraceSource.isAnyStandaloneProductMarked() to simplify the check in TraceCollector
  • Sampler.Builder.forConfig() is now flat and open for extension — adding a new product is just one line in the enum and one line in the builder

6071b27

cc: @smola

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to improve this test as we only assert sampler instanceof StandaloneSampler without verifying which products are active inside it. A wrong product list would pass undetected.
We could improve it replacing the individual selection tests with a @Unroll matrix that covers all relevant product combinations and also asserts the activeProducts content

Example:

  @Unroll
  void "sampler selection: apm=#apmEnabled appsec=#appsecEnabled iast=#iastEnabled llmobs=#llmobsEnabled"() {
      setup:
      System.setProperty("dd.apm.tracing.enabled", String.valueOf(apmEnabled))
      System.setProperty("dd.appsec.enabled",       String.valueOf(appsecEnabled))
      System.setProperty("dd.iast.enabled",          String.valueOf(iastEnabled))
      System.setProperty("dd.llmobs.enabled",        String.valueOf(llmobsEnabled))
      Config config = new Config()

      when:
      def sampler = Sampler.Builder.forConfig(config, null)

      then:
      sampler.class == expectedSampler
      if (sampler instanceof StandaloneSampler) {
          sampler.activeProducts.toSet() == expectedProducts.toSet()
      }

      where:
      apmEnabled | appsecEnabled | iastEnabled | llmobsEnabled | expectedSampler           | expectedProducts
      false      | true          | false       | false         | StandaloneSampler         | [StandaloneProduct.ASM]
      false      | false         | true        | false         | StandaloneSampler         | [StandaloneProduct.ASM]
      false      | true          | true        | false         | StandaloneSampler         | [StandaloneProduct.ASM]
      false      | false         | false       | true          | StandaloneSampler         | [StandaloneProduct.LLMOBS]
      false      | true          | false       | true          | StandaloneSampler         | [StandaloneProduct.LLMOBS, StandaloneProduct.ASM]
      false      | false         | true        | true          | StandaloneSampler         | [StandaloneProduct.LLMOBS, StandaloneProduct.ASM]
      false      | true          | true        | true          | StandaloneSampler         | [StandaloneProduct.LLMOBS, StandaloneProduct.ASM]
      false      | false         | false       | false         | ForcePrioritySampler      | []
      true       | true          | true        | true          | RateByServiceTraceSampler | []
  }

Copy link
Copy Markdown
Author

@matsumo-and matsumo-and Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Replaced the individual tests in SamplerTest with a single @unroll matrix covering all 7 product-flag combinations (LLMOBS only, ASM via appsec/iast/sca, LLMOBS+ASM, all-disabled, APM-enabled).

Each row now also asserts getActiveProducts() contents directly, so a wrong product list will fail the test. Added a package-private getActiveProducts() getter to StandaloneSampler for this purpose.

cf6696e

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combined sampler tests each set only one flag per span. Could you add a case where the root span has both LLMOBS and ASM bits set simultaneously (e.g. a WAF event on a request that also triggers an LLM call)? That would verify that LLMOBS wins as expected given its position in the list

tracer.getTraceSegment().setTagTop(Tags.PROPAGATED_TRACE_SOURCE, ProductTraceSource.LLMOBS)
 tracer.getTraceSegment().setTagTop(Tags.PROPAGATED_TRACE_SOURCE, ProductTraceSource.ASM)

Copy link
Copy Markdown
Author

@matsumo-and matsumo-and Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "LLMOBS+ASM: span with both LLMOBS and ASM bits set is kept with DEFAULT mechanism (LLMOBS wins)" to StandaloneSamplerTest.
It sets ProductTraceSource.LLMOBS | ProductTraceSource.ASM on the root span and asserts the decision mechanism is "-0" (DEFAULT) rather than "-5" (APPSEC), confirming the first-match-wins list ordering.

cf6696e

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isAnyStandaloneProductMarked guard change doesn't seem to have a dedicated test. Would it be possible to add one that exercises the full path, span finishes, TraceCollector evaluates the condition, and verifies the sampler is not called when APM is disabled and the trace already has a product flag with a non-UNSET priority?

Copy link
Copy Markdown
Author

@matsumo-and matsumo-and Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TraceCollectorTest (new file in datadog.trace.core) that exercises the full path:

span.finish() → CoreTracer write → TraceCollector. setSamplingPriorityIfNecessary().

The test sets APM disabled, marks the trace with a LLMOBS product flag, and pre-sets USER_KEEP priority, then uses a Spy to assert 0 * sampler.setSamplingPriority(_).

cf6696e

@jandro996
Copy link
Copy Markdown
Member

Hi @jandro996, I've refactored the implementation along your suggestion:

  • Replaced the three separate sampler classes with a single StandaloneSampler that takes a List
  • Added a StandaloneProduct enum holding each product's trace source bit, sampling mechanism, and billing trace flag
  • Added ProductTraceSource.isAnyStandaloneProductMarked() to simplify the check in TraceCollector
  • Sampler.Builder.forConfig() is now flat and open for extension — adding a new product is just one line in the enum and one line in the builder

6071b27

cc: @smola

Thanks a lot for your changes! It looks good to me, just added a few comments related with testing to improve the coverage

- Replace individual SamplerTest cases with @unroll matrix that covers all
product-flag combinations and asserts activeProducts contents directly;
add package-private getActiveProducts() getter to StandaloneSampler to
enable this
- Add StandaloneSamplerTest case for spans with both LLMOBS and ASM bits set
simultaneously, verifying LLMOBS wins via list ordering
- Add TraceCollectorTest exercising the full span-finish → CoreTracer write
path to verify that setSamplingPriorityIfNecessary skips the sampler when
APM is disabled, a standalone product flag is set, and priority is already
non-UNSET

Signed-off-by: matsumo-and <yh134.toisanda@gmail.com>
@matsumo-and
Copy link
Copy Markdown
Author

matsumo-and commented Apr 21, 2026

Thanks a lot for your changes! It looks good to me, just added a few comments related with testing to improve the coverage

Thanks @jandro996! Addressed all three points below — let me know if any of them missed the mark.

cf6696e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LLMObs LLM Observability related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants