# Monitoring & Observability

Concourse Server publishes real-time operational metrics that let you observe server health, diagnose performance issues, and track resource utilization without any third-party tooling. Metrics are exposed through four complementary channels:

| Channel | Best for | How to consume |
|---|---|---|
| `concourse monitor` CLI | Ad-hoc inspection, watch mode, scripts | Shell |
| JMX MBeans | Rich attribute inspection, in-process monitoring | JConsole, VisualVM, JMX clients |
| Prometheus endpoint | Scraping into Prometheus / Grafana / GKE Managed Prometheus | HTTP |
| OpenTelemetry (OTLP) push | Datadog, New Relic, OTel Collector | gRPC or HTTP |

All four channels read from the same underlying set of JMX MBeans. The Prometheus and OpenTelemetry channels are driven by the `jmx_prometheus_javaagent` attached at JVM startup, which translates JMX attributes into Prometheus or OTLP metrics.


## concourse monitor CLI

`concourse monitor <subcommand>` renders a live snapshot of one slice of server state. All subcommands accept the flags below:

| Flag | Default | Description |
|---|---|---|
| `--jmx-port` | `9010` | JMX port used to connect to the server |
| `--json` | off | Emit JSON instead of a formatted dashboard |
| `--watch` | off | Continuously refresh; prints per-interval deltas for counter-style metrics |
| `--interval` | `2` | Refresh period in seconds (used with `--watch`) |
| `-e` / `--environment` | default env | Scope per-environment metrics to a specific environment |

### Subcommands

| Subcommand | What it reports |
|---|---|
| `overview` | Aggregated summary of every section |
| `storage` | Segments, disk space, seek mix, cache hit rates |
| `operations` | Per-operation count, average and max latency |
| `transactions` | Start / commit / fail / abort counts |
| `locks` | Active locks, parked threads, wait and hold times |
| `heap` | Heap and non-heap memory usage |
| `gc` | Per-collector collection counts and pause totals |
| `threads` | Live, peak, daemon, total-started thread counts |
| `transport` | Buffer → Database transport progress |
| `compaction` | Compaction progress and queue depth |

### Examples

```shell
# One-shot overview
concourse monitor overview

# Watch operations with a 5-second refresh
concourse monitor operations --watch --interval 5

# Scope metrics to a specific environment
concourse monitor storage -e production

# Machine-readable JSON for a metrics pipeline
concourse monitor locks --json
```

In `--watch` mode, counter-style metrics display per-interval deltas (for example, `TransactionsCommitted 1,204 (+18/s)`), making it easy to see throughput in real time.
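The per-second rate shown in watch mode is just the counter delta divided by the refresh interval. A minimal sketch of that arithmetic (the counter values here are hypothetical):

```shell
# Two consecutive snapshots of a counter-style metric (hypothetical values)
prev=1168
curr=1204
interval=2  # seconds between refreshes, per --interval

# Per-second rate over the interval: (1204 - 1168) / 2 = 18
rate=$(( (curr - prev) / interval ))
echo "TransactionsCommitted ${curr} (+${rate}/s)"
```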


## JMX MBeans

Concourse Server registers two MXBeans under the `com.cinchapi.concourse` domain:

| ObjectName | Scope |
|---|---|
| `com.cinchapi.concourse:type=Server` | Server-wide |
| `com.cinchapi.concourse:type=Engine,environment=<env>` | Per environment |

### `type=Server` — selected attributes

| Attribute | Meaning |
|---|---|
| `Version` | Build version of the running server |
| `ActiveSessions` | Client sessions currently connected |
| `ActiveTransactions` | Transactions currently open |
| `RunningPlugins` | Active plugin processes |
| `EnvironmentCount` | Initialized environments |
| `TransactionsStarted` | Cumulative started transactions |
| `TransactionsCommitted` | Cumulative successful commits |
| `TransactionsFailed` | Cumulative failed commits |
| `TransactionsAborted` | Cumulative client aborts |
| `AtomicCommits` | Cumulative successful atomic commits |
| `AtomicRetries` | Cumulative atomic operation retries |

### `type=Engine,environment=<env>` — key attribute groups

**Locks:** LockCount, RangeLockCount, ParkedThreadCount, ReadLockRequests, WriteLockRequests, FailedTryLocks, ReadLockAvgWaitNanos, WriteLockAvgWaitNanos, ReadLockAvgHoldNanos, WriteLockAvgHoldNanos, MaxReadLockHoldNanos, MaxWriteLockHoldNanos.

**Transport:** TransportCompleted, TransportAvgDurationNanos, TransportInProgress, TimeSinceLastTransportNanos.

**Storage:** SegmentCount, DiskSpaceAvailable, DiskSpaceTotal, MemorySeeks, DiskSeeks, BloomFilterGuards, BloomFilterFalsePositives, plus per-chunk-type variants (Table*, Index*, Corpus*), TotalDataBytes, MinSegmentBytes, MaxSegmentBytes, AvgSegmentBytes, BufferPageCount, BufferAvgWritesPerPage.

**Caches:** IndexCacheHitRate, TableCacheHitRate, CorpusCacheHitRate, and associated hit/miss/eviction counts.

**Operations:** for each of Select, Find, Browse, Add, Remove, Search, Audit, Gather, and Chronicle — {Op}Count, {Op}AvgNanos, {Op}MaxNanos.

**Compaction:** CompactionShiftIndex, CompactionShiftCount, CompactionGarbageQueueSize, CompactionShiftsAttempted, CompactionShiftsSucceeded, CompactionSegmentsGarbageCollected.

The `jmx_port` setting controls which port the JMX RMI connector listens on (default `9010`). You can attach any JMX client — JConsole, VisualVM, jmxterm, or a custom JMX consumer — at `service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi`.
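As a concrete sketch, the connection URL can be assembled like this (the host is a placeholder; the jmxterm invocation is illustrative, assumes a running server, and is therefore left commented out):

```shell
# Hypothetical host; jmx_port defaults to 9010
HOST=localhost
JMX_PORT=9010
JMX_URL="service:jmx:rmi:///jndi/rmi://${HOST}:${JMX_PORT}/jmxrmi"
echo "$JMX_URL"

# Example non-interactive read of one Server attribute with jmxterm
# (requires a live server, so it is not executed here):
# echo "get -b com.cinchapi.concourse:type=Server ActiveSessions" | \
#   java -jar jmxterm-uber.jar -l ${HOST}:${JMX_PORT} -n
```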


## Prometheus

Enable the Prometheus endpoint by setting one option in `concourse.yaml`:

```yaml
monitoring:
  export_metrics: true
  metrics_port: 9091   # optional; defaults to 9091
```

When this is enabled, Concourse Server attaches the `jmx_prometheus_javaagent` at JVM startup. The agent listens on `metrics_port` and exposes the server's JMX MBeans as Prometheus metrics at `http://<host>:<metrics_port>/metrics`.

### Metric names

The agent's translation rules (installed automatically by the server) map MBean attributes to these naming conventions:

- Attributes of `type=Engine,environment=X` ending in `Nanos` become `concourse_engine_*Seconds{environment="X"}` (values converted from nanoseconds to seconds).
- Attributes of `type=Engine,environment=X` ending in `HitRate` become `concourse_engine_*HitRate{environment="X"}` (gauge, 0 to 1).
- Any other `type=Engine,environment=X` attribute becomes `concourse_engine_<attr>{environment="X"}`.
- Any `type=Server` attribute becomes `concourse_server_<attr>`.
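To make the naming rules concrete, here is a small shell sketch that applies them to Engine attributes. It is illustrative only — it is not the agent's actual rewriting code, and it covers just the name mapping, not the nanoseconds-to-seconds conversion of the values:

```shell
# Map an Engine MBean attribute to its Prometheus metric name per the rules above.
# Usage: to_prom <attribute> <environment>
to_prom() {
  local attr=$1 env=$2 name
  case "$attr" in
    *Nanos) name="concourse_engine_${attr%Nanos}Seconds" ;;  # ns metrics exported in seconds
    *)      name="concourse_engine_${attr}" ;;               # HitRate and everything else keep their names
  esac
  echo "${name}{environment=\"${env}\"}"
}

to_prom ReadLockAvgWaitNanos production   # concourse_engine_ReadLockAvgWaitSeconds{environment="production"}
to_prom IndexCacheHitRate production      # concourse_engine_IndexCacheHitRate{environment="production"}
```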

### Prometheus scrape configuration

```yaml
scrape_configs:
  - job_name: concourse
    scrape_interval: 15s
    static_configs:
      - targets:
          - concourse-host-1:9091
          - concourse-host-2:9091
```
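Once scraped, the metrics can drive alerting. A hypothetical Prometheus alerting rule on failed commits — the metric name follows the translation rules above, but the threshold and durations are placeholders to tune for your workload:

```yaml
groups:
  - name: concourse
    rules:
      - alert: ConcourseTransactionFailures
        # Fires if any commits failed over the last 5 minutes, sustained for 10 minutes
        expr: rate(concourse_server_TransactionsFailed[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Concourse is seeing failed transaction commits"
```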

## OpenTelemetry (OTLP)

For collectors such as the OpenTelemetry Collector, Datadog Agent, New Relic OTLP endpoint, or Grafana Cloud, enable OTLP push from the same metrics pipeline:

```yaml
monitoring:
  export_metrics: true              # required
  enable_opentelemetry: true
  opentelemetry_endpoint: http://otel-collector:4317
  opentelemetry_protocol: grpc      # grpc or http
  opentelemetry_interval: 15        # seconds between pushes
```

Defaults:

| Option | Default |
|---|---|
| `opentelemetry_endpoint` | `http://localhost:4317` |
| `opentelemetry_protocol` | `grpc` (OTLP/gRPC); alternative: `http` (OTLP/HTTP) |
| `opentelemetry_interval` | `15` seconds |

The OTLP exporter uses the same JMX translation rules as the Prometheus endpoint, so the metric names you see in your OTLP backend are the `concourse_engine_*` / `concourse_server_*` families listed above.
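For reference, a minimal OpenTelemetry Collector pipeline that accepts this push looks roughly like the following sketch — the `debug` exporter is only for verifying that metrics arrive; substitute the exporter for your real backend:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # matches opentelemetry_endpoint above

exporters:
  debug: {}                      # replace with your backend's exporter

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
```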

> **Agent jar required:** Both Prometheus and OTLP export require `agents/jmx_prometheus_javaagent.jar` in the server installation. If the agent jar is not present at startup, Concourse prints a warning and continues without metrics export.


## Picking a channel

- **Interactive troubleshooting:** start with `concourse monitor overview` and drill into `locks`, `transport`, or `operations` as needed. Add `--watch` to see a problem unfold in real time.
- **One-off scripts or health checks:** `concourse monitor <subcommand> --json` is easy to parse in shell pipelines.
- **Long-term dashboards and alerting:** enable the Prometheus endpoint and point Grafana (or any PromQL-compatible tool) at it.
- **Centralized metrics across many services:** enable OpenTelemetry push so Concourse participates in the same metrics pipeline as the rest of your stack.

For the underlying configuration keys, see Configuration.