diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md
index dd6a421cec16..afcf782c8bc3 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/CHANGELOG.md
@@ -8,8 +8,12 @@
### Bugs Fixed
+- Filtered service-emitted `LLMStats:` telemetry entries from the rendered `rai_warnings` front matter in `LlmInputHelper.toLlmInput`.
+
### Other Changes
+- Updated `LlmInputHelper.toLlmInput` page markers from the previous format to `<!-- InputPageNumber: N -->` and avoided duplicate marker injection when the service markdown already includes `InputPageNumber` markers.
+
## 1.1.0-beta.1 (2026-05-01)
### Features Added
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md
index 8e7161193383..d8133648140e 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/README.md
@@ -165,7 +165,7 @@ If you encounter errors:
<groupId>com.azure</groupId>
<artifactId>azure-ai-contentunderstanding</artifactId>
- <version>1.0.0</version>
+ <version>1.1.0-beta.2</version>
```
[//]: # ({x-version-update-end})
@@ -439,7 +439,7 @@ fields:
figure illustrating monthly values, and describes the AI Document
Intelligence service...
---
-
+
# ==This is title==
## 1. Text
[Latin](https://en.wikipedia.org/wiki/Latin) refers to an ancient Italic language...
@@ -451,6 +451,45 @@ fields:
...
```
+> **About `<!-- InputPageNumber: N -->`**
+>
+> The helper emits `<!-- InputPageNumber: N -->` markers at page boundaries in
+> the markdown body. `N` is the **original 1-based page number from the source
+> document** (i.e., the page index in the analyzed PDF), not a counter that
+> restarts at 1 for each call. Downstream consumers (RAG indexers, page-citation
+> prompts) can rely on the marker value to cite the correct source page even
+> when only a subset of pages was analyzed.
+>
+> **Why this matters when a page range is specified**
+>
+> Use `ContentRange` on the analyze input to analyze only a subset of pages in
+> a multi-page document. The markers in the rendered output preserve the
+> original page identity:
+>
+> ```java
+> // Analyze pages 2-3 and page 5 of a 10-page PDF.
+> SyncPoller poller
+> = contentUnderstandingClient.beginAnalyze("prebuilt-documentSearch",
+> Arrays.asList(new AnalysisInput()
+> .setUrl(multiPageUrl)
+> .setContentRange(new ContentRange("2-3,5"))));
+>
+> AnalysisResult result = poller.getFinalResult();
+> String text = LlmInputHelper.toLlmInput(result);
+> // Output contains markers for the *original* page numbers, not 1, 2, 3:
+> // pages: 2-3, 5
+> // ...
+> // <!-- InputPageNumber: 2 -->
+> // ...page 2 content...
+> // <!-- InputPageNumber: 3 -->
+> // ...page 3 content...
+> // <!-- InputPageNumber: 5 -->
+> // ...page 5 content...
+> ```
+>
+> An LLM or RAG indexer can therefore cite "see page 5" with the correct page
+> number, even though page 5 is the *third* segment in the response.
+
See the [advanced sample][java_cu_sample_to_llm_input] for output options (fields-only,
markdown-only, custom metadata), multi-page content ranges, and multi-segment video.
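
A downstream consumer of the README behavior described above could recover the original page numbers by scanning for the markers. The following is a minimal sketch, not part of the SDK: the class name `PageMarkerSplitter` and the method `splitByPage` are hypothetical, and the regular expression assumes the marker is rendered exactly as `<!-- InputPageNumber: N -->`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of azure-ai-contentunderstanding).
public final class PageMarkerSplitter {
    // Assumed marker shape: <!-- InputPageNumber: N -->, capturing N.
    private static final Pattern MARKER
        = Pattern.compile("<!--\\s*InputPageNumber:\\s*(\\d+)\\s*-->");

    private PageMarkerSplitter() {
    }

    /**
     * Maps each original page number to the trimmed text that follows its marker.
     * Text before the first marker (for example, front matter) is ignored.
     */
    public static Map<Integer, String> splitByPage(String llmInput) {
        Map<Integer, String> pages = new LinkedHashMap<>();
        Matcher m = MARKER.matcher(llmInput);
        int currentPage = -1;
        int contentStart = 0;
        while (m.find()) {
            if (currentPage != -1) {
                // Close the previous page at the start of this marker.
                pages.put(currentPage, llmInput.substring(contentStart, m.start()).trim());
            }
            currentPage = Integer.parseInt(m.group(1));
            contentStart = m.end();
        }
        if (currentPage != -1) {
            pages.put(currentPage, llmInput.substring(contentStart).trim());
        }
        return pages;
    }

    public static void main(String[] args) {
        String sample = "<!-- InputPageNumber: 2 -->\n\npage two\n\n"
            + "<!-- InputPageNumber: 3 -->\n\npage three\n\n"
            + "<!-- InputPageNumber: 5 -->\n\npage five";
        System.out.println(splitByPage(sample).keySet()); // prints [2, 3, 5]
    }
}
```

Keying the map by the marker value rather than by segment position is what lets a citation prompt say "page 5" even when page 5 is the third segment returned.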
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/LlmInputHelper.java b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/LlmInputHelper.java
index dba03f138d08..e8ec76fde8bd 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/LlmInputHelper.java
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/src/main/java/com/azure/ai/contentunderstanding/LlmInputHelper.java
@@ -58,6 +58,20 @@ public final class LlmInputHelper {
private static final Pattern PAGE_BREAK_PATTERN = Pattern.compile("\\n*<!-- PageBreak -->\\n*");
+ // Marker emitted by toLlmInput at each page boundary. Future Content Understanding
+ // service versions emit this same marker directly in the returned markdown (per
+ // ContentUnderstanding-Docs#249). When the helper sees any occurrence of this
+ // prefix in the input markdown it treats the service as having already paginated
+ // the content and skips its own injection to avoid duplicate markers.
+ private static final String INPUT_PAGE_MARKER_PREFIX = "<!-- InputPageNumber:";
...
- * ({@code <!-- ... -->}) inserted at page boundaries so downstream consumers
- * can locate content by page number.
+ * ({@code <!-- InputPageNumber: N -->}) inserted at page boundaries so downstream
+ * consumers can locate content by page number. {@code N} is the original
+ * 1-based page number from the source document (i.e., the page index in
+ * the analyzed PDF), not a counter that restarts at 1 for each call. This matters
+ * when the analyze request specifies a {@link com.azure.ai.contentunderstanding.models.ContentRange}
+ * (e.g., {@code "2-3,5"}): the markers in the output will read
+ * {@code InputPageNumber: 2}, {@code 3}, {@code 5} — not {@code 1},
+ * {@code 2}, {@code 3}. Downstream consumers (RAG indexers, page-citation prompts)
+ * can rely on the marker value to cite the correct source page even when only a
+ * subset of pages was analyzed. If the service markdown already contains
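+ * {@code <!-- InputPageNumber:} markers, the helper skips its own injection to avoid duplicates.
...
- ... -->\n\n");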
+ sb.append(INPUT_PAGE_MARKER_PREFIX).append(' ').append(marker[1]).append(" -->\n\n");
prev = adj;
}
if (prev < cleaned.length()) {
@@ -565,7 +596,7 @@ private static String pageMarkersFromBreaks(String markdown, RenderableContent c
for (int i = 0; i < chunks.length; i++) {
String text = chunks[i].trim();
if (!text.isEmpty()) {
- parts.add("\n\n" + text);
+ parts.add(INPUT_PAGE_MARKER_PREFIX + " " + (startPage + i) + " -->\n\n" + text);
}
}
return String.join("\n\n", parts);
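
The skip-if-already-paginated behavior described in the comment on `INPUT_PAGE_MARKER_PREFIX`, combined with the `startPage + i` numbering in the hunk above, can be sketched in isolation as follows. This is an illustrative reduction, not the helper's actual code: the class `PageMarkerSketch`, the method `addPageMarkers`, and the `<!-- PageBreak -->` separator token are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for the marker-injection logic.
public final class PageMarkerSketch {
    private static final String MARKER_PREFIX = "<!-- InputPageNumber:";
    // Assumption: pages in the service markdown are separated by this token.
    private static final String PAGE_BREAK = "<!-- PageBreak -->";

    private PageMarkerSketch() {
    }

    static String addPageMarkers(String markdown, int startPage) {
        // If the markdown already carries markers, return it unchanged
        // so each page boundary is marked only once.
        if (markdown.contains(MARKER_PREFIX)) {
            return markdown;
        }
        String[] chunks = markdown.split(Pattern.quote(PAGE_BREAK), -1);
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < chunks.length; i++) {
            String text = chunks[i].trim();
            if (!text.isEmpty()) {
                // Number chunks from the original start page, not from 1.
                parts.add(MARKER_PREFIX + " " + (startPage + i) + " -->\n\n" + text);
            }
        }
        return String.join("\n\n", parts);
    }
}
```

Note that `startPage + i` is only correct for a contiguous run of pages; a non-contiguous range such as `"2-3,5"` would need the page numbers supplied by the analysis result rather than derived from the chunk index.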
@@ -646,12 +677,20 @@ private static List