Search before asking
Motivation
Paimon registers commit metrics (commit duration, files added/deleted, records appended, partitions written, etc.) through its internal MetricRegistry / MetricGroupImpl, and these are surfaced in the Spark UI via CustomTaskMetric. However, they are not accessible to external monitoring systems like Prometheus, Graphite, or any JMX-based scraper.
As a result, in production environments that rely on Prometheus or similar monitoring infrastructure to observe Spark applications, Paimon table-level commit metrics are invisible. Operators cannot alert on or build dashboards for commit duration, write throughput, partition counts, etc. without parsing Spark UI pages or logs.
Solution
Bridge Paimon's internal metrics into a Codahale MetricRegistry exposed as a Spark Source, with a dedicated JmxReporter to ensure MBeans are registered immediately.
This requires three components:
- SparkMetricGroup: A subclass of MetricGroupImpl that overrides counter(), gauge(), and histogram() to dual-register each metric: once in Paimon's internal map (preserving Spark UI integration) and once as a Codahale gauge in a shared MetricRegistry.
- PaimonMetricsSource: A singleton Spark Source (must live under org.apache.spark due to package-private visibility) that owns:
  - A Codahale MetricRegistry shared with all SparkMetricGroup instances
  - A JmxReporter started eagerly on that registry, so MBeans appear as soon as metrics are added. This sidesteps Spark's MetricsSystem limitation of snapshotting the registry only at registerSource() time.
- Wiring: SparkMetricRegistry.createMetricGroup() is updated to instantiate SparkMetricGroup (instead of plain MetricGroupImpl), passing in the PaimonMetricsSource singleton's Codahale registry. The V1 commit path (PaimonSparkWriter.commit()) is also wired with withMetricRegistry() so commit metrics flow through the same path.
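The dual-registration idea behind SparkMetricGroup can be sketched with plain JDK types. This is only an illustration under stated assumptions: SharedRegistry, BaseGroup, and MirroringGroup are hypothetical stand-ins for the Codahale MetricRegistry, MetricGroupImpl, and SparkMetricGroup, and none of the signatures below are taken from Paimon or Codahale.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

class DualRegistrationSketch {

    /** Hypothetical stand-in for the shared Codahale MetricRegistry: name -> read-only gauge. */
    static final class SharedRegistry {
        private final Map<String, Supplier<Long>> gauges = new ConcurrentHashMap<>();

        void registerGauge(String name, Supplier<Long> gauge) {
            // Keep the first registration so repeated lookups of a metric stay idempotent.
            gauges.putIfAbsent(name, gauge);
        }

        /** Returns the current gauge value, or null if no gauge has that name. */
        Long read(String name) {
            Supplier<Long> gauge = gauges.get(name);
            return gauge == null ? null : gauge.get();
        }
    }

    /** Hypothetical stand-in for MetricGroupImpl: counters live in an internal map. */
    static class BaseGroup {
        protected final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

        AtomicLong counter(String name) {
            return counters.computeIfAbsent(name, k -> new AtomicLong());
        }
    }

    /** Hypothetical stand-in for SparkMetricGroup: mirrors every counter as a gauge. */
    static final class MirroringGroup extends BaseGroup {
        private final String groupName;
        private final SharedRegistry shared;

        MirroringGroup(String groupName, SharedRegistry shared) {
            this.groupName = groupName;
            this.shared = shared;
        }

        @Override
        AtomicLong counter(String name) {
            // First registration: the internal map, which existing consumers keep reading.
            AtomicLong counter = super.counter(name);
            // Second registration: a gauge in the shared registry that reads the same counter.
            shared.registerGauge(groupName + "." + name, counter::get);
            return counter;
        }
    }

    public static void main(String[] args) {
        SharedRegistry shared = new SharedRegistry();
        MirroringGroup commit = new MirroringGroup("commit", shared);
        commit.counter("filesAdded").addAndGet(3);
        System.out.println("mirrored value: " + shared.read("commit.filesAdded"));
    }
}
```

Because the gauge reads the live counter rather than copying its value, the two views cannot drift apart, which is what lets the existing Spark UI path stay untouched.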
The result is that all Paimon metrics appear under the paimon JMX domain and are scrapeable by jmx_prometheus_javaagent or any other JMX-based monitoring tool, with no impact on existing Spark UI metrics.
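One way to verify that MBeans actually land in the paimon domain is to query the platform MBeanServer from inside the JVM. The sketch below uses only the JDK's javax.management API. The ValueView interface, the registerDemoGauge helper, and the `name=...` object-name layout are assumptions made for illustration; the names a real JmxReporter produces may differ.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class PaimonJmxDomainCheck {

    /** Management interface exposing one read-only attribute named "Value". */
    public interface ValueView {
        long getValue();
    }

    /**
     * Registers a stand-in gauge MBean under the "paimon" domain.
     * In the proposal this registration would be performed by the JmxReporter.
     */
    static ObjectName registerDemoGauge(MBeanServer server, String name, AtomicLong backing)
            throws Exception {
        ObjectName objectName = new ObjectName("paimon", "name", name);
        ValueView view = backing::get;
        server.registerMBean(new StandardMBean(view, ValueView.class), objectName);
        return objectName;
    }

    /** Lists every MBean currently registered in the "paimon" domain. */
    static Set<ObjectName> listPaimonMBeans(MBeanServer server) throws Exception {
        return server.queryNames(new ObjectName("paimon:*"), null);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical metric name, used only for this demo.
        ObjectName name = registerDemoGauge(server, "demo.commitDuration", new AtomicLong(125));
        for (ObjectName found : listPaimonMBeans(server)) {
            System.out.println(found + " -> " + server.getAttribute(found, "Value"));
        }
        server.unregisterMBean(name);
    }
}
```

The same `paimon:*` pattern can be entered in a JMX console such as jconsole to inspect a running driver or executor.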
Anything else?
N/A
Are you willing to submit a PR?