36 changes: 32 additions & 4 deletions migration/mongosync_insights/CONFIGURATION.md
@@ -4,7 +4,7 @@ This document explains the configuration management system for Mongosync Insight

## Prerequisites

**Python 3.11+** and **libmagic** system library are required to run Mongosync Insights. See [README.md](README.md) for complete installation instructions including system dependencies.
**Python 3.11+** is required to run Mongosync Insights. See [README.md](README.md) for complete installation instructions.

## Configuration Overview

@@ -34,7 +34,7 @@ All configuration can be set using `export` commands before running the applicat
|----------|---------|-------------|
| `MI_CONNECTION_STRING` | _(empty)_ | MongoDB connection string (optional, can be provided via UI) |
| `MI_VERIFIER_CONNECTION_STRING` | _(falls back to `MI_CONNECTION_STRING`)_ | MongoDB connection string for the migration verifier database. When omitted, the value of `MI_CONNECTION_STRING` is used. Set this when the verifier database lives on a different cluster. |
| `MI_INTERNAL_DB_NAME` | `mongosync_reserved_for_internal_use` | MongoDB internal database name |
| `MI_INTERNAL_DB_NAME` | _(auto-detected)_ | MongoDB internal database name. When not set, the app auto-detects between `__mdb_internal_mongosync` (new) and `mongosync_reserved_for_internal_use` (legacy). Set this variable to override auto-detection. |
| `MI_POOL_SIZE` | `10` | MongoDB connection pool size |
| `MI_TIMEOUT_MS` | `5000` | MongoDB connection timeout in milliseconds |

@@ -55,14 +55,24 @@ All configuration can be set using `export` commands before running the applicat

| Variable | Default | Description |
|----------|---------|-------------|
| `MI_ERROR_PATTERNS_FILE` | `error_patterns.json` _(same directory as the application)_ | Path to a custom error patterns JSON file used during log analysis to detect common errors (e.g., oplog rollover, timeouts, verifier mismatches) |
| `MI_ERROR_PATTERNS_FILE` | `lib/error_patterns.json` _(auto-detected)_ | Path to a custom error patterns JSON file used during log analysis to detect common errors (e.g., oplog rollover, timeouts, verifier mismatches) |

### UI Customization

| Variable | Default | Description |
|----------|---------|-------------|
| `MI_MAX_PARTITIONS_DISPLAY` | `10` | Maximum partitions to display in UI |

### Log Viewer & Snapshot Settings

| Variable | Default | Description |
|----------|---------|-------------|
| `MI_LOG_VIEWER_MAX_LINES` | `2000` | Maximum number of recent log lines shown in the Log Viewer tail view |
| `MI_LOG_STORE_DIR` | System temp directory | Directory for SQLite log stores and analysis snapshot files |
| `MI_LOG_STORE_MAX_AGE_HOURS` | `24` | Time-to-live, in hours, for log store and snapshot files. Age is measured from each file's modification time, which is refreshed whenever a snapshot is loaded |

> **Note**: By default, log store databases and snapshot files are saved to the OS temp directory (e.g., `/tmp` on Linux/macOS), which may be cleared on system reboot. Set `MI_LOG_STORE_DIR` to a persistent path (e.g., `/data/mongosync-insights/store`) to retain snapshots across restarts. Files are cleaned up automatically on app startup, on logout, and lazily on access when they exceed the configured TTL. Loading a saved snapshot resets its TTL by touching the file's modification time.

### Security Settings

| Variable | Default | Description |
@@ -84,7 +94,7 @@ All configuration can be set using `export` commands before running the applicat

---

## 🚀 Usage Examples
## Usage Examples

### Example 1: Basic Local Development

@@ -220,6 +230,24 @@ python3 mongosync_insights.py

**Note**: When `MI_VERIFIER_CONNECTION_STRING` is not set, it falls back to `MI_CONNECTION_STRING`. Set it explicitly when the migration-verifier writes to a different cluster.

### Example 8: Persistent Snapshots and Custom Log Viewer

Configure snapshot storage location, retention period, and log viewer buffer size:

```bash
# Store snapshots in a persistent directory
export MI_LOG_STORE_DIR=/data/mongosync-insights/store

# Keep snapshots for 48 hours instead of the default 24
export MI_LOG_STORE_MAX_AGE_HOURS=48

# Show up to 5000 recent log lines in the Log Viewer tail view
export MI_LOG_VIEWER_MAX_LINES=5000

# Run the application
python3 mongosync_insights.py
```

---

## Troubleshooting
67 changes: 64 additions & 3 deletions migration/mongosync_insights/README.md
@@ -4,12 +4,13 @@ This tool can parse **mongosync** logs and metrics files, read the **mongosync**

## What Does This Tool Do?

Mongosync Insights provides four main capabilities:
Mongosync Insights provides five main capabilities:

1. **Log File Analysis**: Upload and parse mongosync log files to visualize migration progress, data transfer rates, performance metrics, configuration options, and detected errors
2. **Mongosync Metrics Analysis**: Upload and parse `mongosync_metrics.log` files to visualize 40+ mongosync metrics across Collection Copy, CEA, Indexes, Verifier, and more
3. **Live Monitoring**: Connect directly to the **mongosync** internal database or to the **mongosync** progress endpoint for real-time monitoring of ongoing migrations with auto-refreshing dashboards
4. **Migration Verifier Monitoring**: Connect to the database where the [migration-verifier](https://github.com/mongodb-labs/migration-verifier) tool stores its metadata to track verification progress, generation history, and mismatch details
4. **Combined Monitoring**: Provide both a MongoDB connection string and a progress endpoint URL to get a comprehensive view that merges metadata insights with real-time progress data
5. **Migration Verifier Monitoring**: Connect to the database where the [migration-verifier](https://github.com/mongodb-labs/migration-verifier) tool stores its metadata to track verification progress, generation history, and mismatch details

## Prerequisites

@@ -64,7 +65,7 @@ python3 mongosync_insights.py

The application will start and display:
```
Starting Mongosync Insights v0.8.0.18
Starting Mongosync Insights v0.8.1.14
Server: 127.0.0.1:3030
```

@@ -79,13 +80,24 @@ http://localhost:3030

## Using Mongosync Insights

### Sidebar Navigation

Results pages include a left sidebar with quick-access buttons:

- **Upload** — opens a dialog listing saved analyses with **Load** and **Delete** actions, plus an **"Upload New File"** button to parse a new log file
- **Settings** — configure the live monitoring refresh interval, theme (Light, Dark, or System), and color scheme (MongoDB Green, Blue, Slate, Ocean)
- **Logout** — clears the current session and returns to the home page
- **Credits** — displays developer credits

### Option 1: Parsing Mongosync Log Files

1. Click the **"Browse"** or **"Choose File"** button
2. Select your mongosync log file from your file system
3. Click **"Open"** or **"Upload"**
4. The application will process the file and display results across multiple tabs

**Duplicate Upload Detection:** If you upload a file with the same name as an existing saved analysis, a dialog will appear offering three options: **Load Previous** (open the saved session without re-parsing), **Replace** (delete the saved session and parse the file again), or **Cancel**.

**Supported File Formats:**
- Plain text: `.log`, `.json`, `.out`
- Compressed: `.gz`, `.zip`, `.bz2`, `.tar.gz`, `.tgz`, `.tar.bz2`
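The supported formats above can be told apart from the filename alone. The sketch below is a minimal illustration that assumes classification by suffix only (the configuration module also defines allowed MIME types, which this skips); `detect_format` is a hypothetical helper, not the project's `classify_file_type`. Longer suffixes are checked first so that `.tar.gz` is reported as a tarball rather than plain `.gz`.

```python
from pathlib import Path

# Longest suffixes first so ".tar.gz" / ".tar.bz2" win over ".gz" / ".bz2".
COMPRESSED_SUFFIXES = (".tar.gz", ".tar.bz2", ".tgz", ".gz", ".zip", ".bz2")
PLAIN_SUFFIXES = (".log", ".json", ".out")


def detect_format(filename: str) -> str:
    """Return the compressed suffix, "plain", or "unsupported"."""
    name = Path(filename).name.lower()
    for suffix in COMPRESSED_SUFFIXES:
        if name.endswith(suffix):
            return suffix
    if name.endswith(PLAIN_SUFFIXES):
        return "plain"
    return "unsupported"
```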
@@ -111,12 +123,24 @@ After upload, the results are organized into tabs:
| **Options** | Mongosync configuration options extracted from the logs (with **Copy as Markdown** for easy sharing) |
| **Collections** | Collection-level progress details (with **Copy as Markdown** for easy sharing) |
| **Errors** | Detected error patterns such as oplog rollover, timeouts, verifier mismatches, and write conflicts during cutover |
| **Log Viewer** | Browse recent log lines with severity filtering, semantic focus, multiple view modes (Highlighted, Raw, Pretty JSON, Summary), and full-text search across the entire log file |

![Mongosync Logs Tab](images/mongosync_logs_logs.png)
![Mongosync Metrics Tab](images/mongosync_logs_metrics.png)
![Mongosync Options Tab](images/mongosync_logs_options.png)
![Mongosync Collections and Partitions Tab](images/mongosync_logs_collections_partitions.png)
![Mongosync Errors and Warnings Tab](images/mongosync_logs_errors.png)
![Mongosync Log Viewer Tab](images/mongosync_logs_logviewer.png)

#### Analysis Snapshot Persistence

After parsing a log file, the analysis is automatically saved as a **snapshot** to disk. This allows you to reload a previous analysis instantly without re-parsing the original file.

- The home page displays a **"Previous Analyses"** section below the upload form, listing all saved snapshots with their filename, date, file size, and age
- Click **"Load"** to reopen a saved analysis — all tabs (plots, tables, log viewer) are restored immediately
- Click the **delete** button to remove a snapshot you no longer need
- Snapshots expire automatically after **24 hours** of inactivity by default (configurable with `MI_LOG_STORE_MAX_AGE_HOURS`); each time you load a snapshot, its TTL is reset
- By default, snapshots are stored in the system's temp directory. Use the `MI_LOG_STORE_DIR` environment variable to set a persistent storage location. See [CONFIGURATION.md](CONFIGURATION.md) for details

### Option 2: Live Monitoring (Metadata)

@@ -173,6 +197,18 @@ This combined approach provides:
- Full metadata insights from the destination cluster (partitions, collection progress, configuration)
- Real-time progress data from the mongosync endpoint (state, lag time, verification status)

#### About the Embedded Verifier

The [Embedded Verifier](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/verification/embedded/) is mongosync's built-in verification mechanism, available since mongosync v1.9 and enabled by default. It performs document hashing on both source and destination clusters to confirm data was transferred correctly, without requiring any external tools.

**Embedded Verifier field (Status tab — Option 2):** The "Embedded Verifier" field displays the `verificationmode` value from mongosync's internal metadata. Possible values: `Enabled` (default — verification is active) or `Disabled` (verification was turned off at start).

**Can Write signal (Endpoint tab — Option 3):** `Can Write: True` is the definitive signal that the embedded verifier has completed successfully and found no mismatches. Until verification passes, `Can Write` remains `False`. This is the key field to watch for confirming migration correctness.

**Verification phases (Endpoint tab — Option 3):** The "Embedded Verifier Status" table shows a `Phase` field for both source and destination independently. Key phases include `stream hashing` (actively hashing documents from change streams) and `idle` (not yet started or between operations).

**Verifier Lag Time (Endpoint tab and uploaded metrics):** The `Lag Time Seconds` field in the verification table (and `Verifier Lag Time` in uploaded `mongosync_metrics.log` files) shows how far behind the verifier is in checking documents. High lag means verification will take longer to complete after commit. Persistently high lag may indicate the verifier cannot keep up with the write load.
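The fields above come from mongosync's progress endpoint (`GET /api/v1/progress`, port 27182 by default). The sketch below extracts the values this section discusses. It assumes the response nests `state`, `canWrite`, and a `verification` object with `source` and `destination` entries (each holding `phase` and `lagTimeSeconds`) under a top-level `progress` key; check that shape against the mongosync version you run. `fetch_progress` and `summarize_verification` are hypothetical helpers.

```python
import json
import urllib.request


def fetch_progress(base_url: str = "http://localhost:27182") -> dict:
    """GET the mongosync progress endpoint and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/progress", timeout=5) as resp:
        return json.load(resp)


def summarize_verification(payload: dict) -> dict:
    """Extract state, canWrite, per-side verifier phase, and the worst verifier lag."""
    progress = payload.get("progress") or {}
    verification = progress.get("verification") or {}
    phases = {}
    lags = []
    for side in ("source", "destination"):
        info = verification.get(side) or {}
        phases[side] = info.get("phase")
        lag = info.get("lagTimeSeconds")
        if isinstance(lag, (int, float)):
            lags.append(lag)
    return {
        "state": progress.get("state"),
        "can_write": bool(progress.get("canWrite", False)),
        "phases": phases,
        "max_verifier_lag_s": max(lags) if lags else None,
    }
```

Against a running mongosync, `summarize_verification(fetch_progress())` returns a small dict: `can_write` turning `True` is the completion signal described above, and a steadily rising `max_verifier_lag_s` suggests the verifier is falling behind the write load.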

### Option 5: Migration Verifier Monitoring

1. Enter the MongoDB **connection string** to the cluster where the [migration-verifier](https://github.com/mongodb-labs/migration-verifier) tool writes its metadata (typically the destination cluster)
@@ -187,6 +223,31 @@ This combined approach provides:

![Migration Verifier Dashboard](images/migration_verifier_dashboard.png)

#### Important: Embedded Verifier

> If you are verifying a migration performed with mongosync, first check whether the [Embedded Verifier](https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/verification/embedded/) can be used; it is the preferred approach for verification.

#### About Migration Verifier

The [migration-verifier](https://github.com/mongodb-labs/migration-verifier) is a standalone tool that validates migration correctness by comparing documents between source and destination clusters. It stores its state in a MongoDB database (default: `migration_verification_metadata`).

**How it works:** The verifier operates in two phases. First, an initial check (generation 0) partitions the source data into chunks and compares documents byte-by-byte between source and destination. Then, iterative rechecks (generation 1, 2, ...) re-verify any documents that changed or failed during previous rounds. Only the **last generation's failures** are significant — earlier failures may be transient due to ongoing writes.

**Key terms:**

| Term | Description |
|------|-------------|
| **Generation** | A round of verification. Generation 0 is the initial full check; subsequent generations are rechecks of changed/failed documents. |
| **FINAL** | Label shown on the dashboard for the last generation — only its failures indicate real mismatches. |
| **Task statuses** | `added` (unstarted), `processing` (in-progress), `completed` (no issues), `failed` (document mismatch), `mismatch` (collection metadata mismatch). |

**Metadata collections:**

| Collection | Purpose |
|------------|---------|
| `verification_tasks` | Tracks each verification task with a generation number, status, and type (`verify` for documents, `verifyCollection` for metadata). |
| `mismatches` | Records document-level mismatches found during verification. |

**Note**: The `MI_VERIFIER_CONNECTION_STRING` environment variable can be used to pre-configure the connection string. When omitted, it falls back to `MI_CONNECTION_STRING`. See **[CONFIGURATION.md](CONFIGURATION.md)** for details.

## Advanced Configuration
Binary file modified migration/mongosync_insights/images/mongosync_insights_home.png
Empty file.
@@ -7,6 +7,7 @@
import logging
import uuid
import time
import tempfile
import threading
from pathlib import Path
from functools import lru_cache
@@ -22,7 +23,17 @@

# Application constants
APP_NAME = "Mongosync Insights"
APP_VERSION = "0.8.0.18"
APP_VERSION = "0.8.1.14"

DEVELOPER_CREDITS = {
    "copyright": "\u00a9 MongoDB Inc.",
    "year": "2025 - 2026",
    "team_name": "Migration Factory TS Team",
    "contributors": [
        {"name": "Marcio Ribeiro", "role": "Development"},
        {"name": "Krishna Kattumadam", "role": "Development"},
    ],
}

# File upload settings
MAX_FILE_SIZE = int(os.getenv('MI_MAX_FILE_SIZE', str(10 * 1024 * 1024 * 1024))) # 10GB default
@@ -38,6 +49,11 @@
    'application/octet-stream'  # Generic binary (often used for compressed files)
]

# Log Viewer settings
LOG_VIEWER_MAX_LINES = int(os.getenv('MI_LOG_VIEWER_MAX_LINES', '2000'))
LOG_STORE_DIR = os.getenv('MI_LOG_STORE_DIR', tempfile.gettempdir())
LOG_STORE_MAX_AGE_HOURS = int(os.getenv('MI_LOG_STORE_MAX_AGE_HOURS', '24'))

# Compressed file MIME types (subset of ALLOWED_MIME_TYPES)
COMPRESSED_MIME_TYPES = {
    'application/gzip', 'application/x-gzip',
@@ -118,6 +134,9 @@ def classify_file_type(filename: str) -> str:

# MongoDB settings
INTERNAL_DB_NAME = os.getenv('MI_INTERNAL_DB_NAME', "mongosync_reserved_for_internal_use")
INTERNAL_DB_NAME_NEW = "__mdb_internal_mongosync"
VERIFIER_SRC_NAMESPACE = "__mdb_internal_mongosync_verifier_src"
VERIFIER_DST_NAMESPACE = "__mdb_internal_mongosync_verifier_dst"

# UI settings
MAX_PARTITIONS_DISPLAY = int(os.getenv('MI_MAX_PARTITIONS_DISPLAY', '10'))
@@ -284,6 +303,46 @@ def get_database(connection_string, database_name):
    client = get_mongo_client(connection_string)
    return client[database_name]

_resolved_internal_db_cache = {}
_resolved_internal_db_lock = threading.Lock()

def resolve_internal_db_name(connection_string):
    """
    Auto-detect which mongosync internal database name exists on the cluster.

    Checks for the new name first (__mdb_internal_mongosync), then falls back
    to the legacy name (mongosync_reserved_for_internal_use). Results are cached
    per connection string. The MI_INTERNAL_DB_NAME env var acts as a hard override.

    Args:
        connection_string (str): MongoDB connection string

    Returns:
        str: The resolved internal database name
    """
    if os.getenv('MI_INTERNAL_DB_NAME'):
        return INTERNAL_DB_NAME

    with _resolved_internal_db_lock:
        if connection_string in _resolved_internal_db_cache:
            return _resolved_internal_db_cache[connection_string]

    logger = logging.getLogger(__name__)
    try:
        client = get_mongo_client(connection_string)
        db_names = client.list_database_names()
        if INTERNAL_DB_NAME_NEW in db_names:
            resolved = INTERNAL_DB_NAME_NEW
        else:
            resolved = INTERNAL_DB_NAME
        with _resolved_internal_db_lock:
            _resolved_internal_db_cache[connection_string] = resolved
        logger.info(f"Resolved internal DB name: {resolved}")
        return resolved
    except Exception as e:
        logger.warning(f"Could not auto-detect internal DB name, using default: {e}")
        return INTERNAL_DB_NAME

def validate_connection(connection_string):
"""
Validate a MongoDB connection string and test connectivity.
@@ -141,5 +141,19 @@
"pattern": "refetched db-spec for recreating dst coll for natural scan",
"friendly_name": "Restart Collection Copy in natural order",
"full_error_message": "refetched db-spec for recreating dst coll for natural scan, collSpec: {COLLECTION DETAILS UUID {}}, isSrcDropped: XXX"
},
{
"pattern": "Change Event Application failed",
"friendly_name": "CEA failed undefined reason"
},
{
"pattern": "Failed to apply batch #1 for CRUD event application on the destination. Giving up on batch CRUD event application.",
"friendly_name": "Timeout on destination",
"full_error_message": "Failed to apply batch #1 for CRUD event application on the destination. Giving up on batch CRUD event application."
},
{
"pattern": "Got a fatal error running Mongosync",
"friendly_name": "Fatal error running Mongosync",
"full_error_message": "Got a fatal error running Mongosync"
}
]
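During log analysis these entries are checked against log lines. The sketch below assumes a plain, case-sensitive substring test of each `pattern` against each line (the real matcher may differ); `load_patterns` and `scan_log` are hypothetical names. Note that with substring matching one line can count toward several entries, for instance the generic "Change Event Application failed" pattern alongside a more specific one.

```python
import json
from collections import Counter
from typing import Iterable


def load_patterns(path: str) -> list[dict]:
    """Read the error patterns file (a JSON array of objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def scan_log(lines: Iterable[str], patterns: list[dict]) -> Counter:
    """Count lines containing each pattern, keyed by friendly_name."""
    hits = Counter()
    for line in lines:
        for entry in patterns:
            if entry["pattern"] in line:
                hits[entry["friendly_name"]] += 1
    return hits
```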