Observability
Metrics
The controller exposes a set of metrics for state monitoring, alerting, and efficiency analysis.
All metrics share the same prefix ("namespace"), which can be configured with the --metrics-prefix command-line option or the controllerOptions.metricsPrefix Helm chart parameter.
The default prefix is git_events_runner.
The tables below describe all the metrics. Metric names are listed without the prefix to keep the presentation compact.
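To map the short names used on this page to the names seen on the metrics endpoint, the prefix has to be added back. The sketch below assumes the prefix and the short name are joined with an underscore, which is the usual Prometheus namespace convention but is not stated explicitly here. The helper `full_metric_name` and the `ger` prefix are hypothetical and exist only for illustration.

```python
DEFAULT_PREFIX = "git_events_runner"  # default prefix documented above


def full_metric_name(name: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Build the exported metric name from a short name.

    Assumption: prefix and name are joined with "_" (Prometheus
    namespace convention); an empty prefix leaves the name unchanged.
    """
    return f"{prefix}_{name}" if prefix else name


# Short names as listed in the tables on this page.
print(full_metric_name("reconcile_count"))
# Hypothetical custom prefix, e.g. set via --metrics-prefix.
print(full_metric_name("jobs_queue_running", prefix="ger"))
```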
Controller metrics
Name | Type | Labels | Description |
---|---|---|---|
reconcile_count | Counter | namespace, resource_kind, resource_name | Total number of reconciliations (updates) of each resource, including failed ones. |
failed_reconcile_count | Counter | namespace, resource_kind, resource_name | Number of failed reconciliations. |
reconcile_duration_seconds | Histogram | namespace, resource_kind, resource_name | Reconciliation duration for each resource. |
Note: So far, only ScheduleTriggers need to be reconciled. All other resources have no external state.
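The two counters above can be combined into a per-resource failure ratio. The following is a minimal sketch that assumes the metrics are scraped in the Prometheus text exposition format and that the default prefix is in use. The sample payload, the resource name `nightly`, and the helpers `parse_samples` and `failure_ratios` are hypothetical. The parser is deliberately simplified: it ignores optional timestamps and does not unescape label values.

```python
import re
from collections import defaultdict

# Simplified patterns for the Prometheus text exposition format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def failure_ratios(text, prefix="git_events_runner"):
    """Return failed/total reconciliations keyed by (namespace, kind, name)."""
    total = defaultdict(float)
    failed = defaultdict(float)
    for name, labels, value in parse_samples(text):
        key = (
            labels.get("namespace"),
            labels.get("resource_kind"),
            labels.get("resource_name"),
        )
        if name == f"{prefix}_reconcile_count":
            total[key] += value
        elif name == f"{prefix}_failed_reconcile_count":
            failed[key] += value
    return {key: failed[key] / count for key, count in total.items() if count > 0}


# Hypothetical scrape output, for illustration only.
SAMPLE = """
# TYPE git_events_runner_reconcile_count counter
git_events_runner_reconcile_count{namespace="default",resource_kind="ScheduleTrigger",resource_name="nightly"} 20
git_events_runner_failed_reconcile_count{namespace="default",resource_kind="ScheduleTrigger",resource_name="nightly"} 5
"""

print(failure_ratios(SAMPLE))
```

In a real setup the same ratio is usually computed in the monitoring system over a time window rather than from raw totals.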
Triggers metrics
Name | Type | Labels | Description |
---|---|---|---|
trigger_check_count | Counter | namespace, trigger_kind, trigger_name | Total number of source checks executed by each trigger. |
trigger_check_duration_seconds | Histogram | namespace, trigger_kind, trigger_name | Execution duration of source checks for each trigger. |
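Histogram metrics such as trigger_check_duration_seconds are exposed by Prometheus clients as cumulative buckets plus a sum and a count. The sketch below shows one way to derive a mean and an approximate quantile from those values, using linear interpolation inside a bucket in the spirit of PromQL's `histogram_quantile`. The bucket boundaries and counts are hypothetical, since the actual bucket layout is not documented here, and the helper names are illustrative.

```python
import math


def mean_duration(sum_seconds, count):
    """Average duration from a histogram's _sum and _count samples."""
    return sum_seconds / count if count else math.nan


def estimate_quantile(q, buckets):
    """Approximate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
    with the last entry being the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assume the first bucket starts at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound (or empty bucket): fall back to
                # the previous bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical bucket data for one trigger.
BUCKETS = [(0.1, 2), (0.5, 6), (1.0, 10), (math.inf, 10)]

print(mean_duration(3.2, 10))
print(estimate_quantile(0.5, BUCKETS))
print(estimate_quantile(0.9, BUCKETS))
```

The estimate is only as precise as the bucket layout allows: values inside a bucket are assumed to be evenly distributed.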
Webhooks metrics
These metrics relate to the processing of incoming webhook requests.
Name | Type | Labels | Description |
---|---|---|---|
webhook_requests_count | Counter | namespace, trigger_name, status | Total number of webhook requests served by each trigger. |
webhook_requests_duration_seconds | Histogram | namespace, trigger_name, status | HTTP request processing duration for each trigger. This is the time needed to schedule the check task, not the time the task takes to execute. |
Notes:
- Since all webhook triggers have the same resource kind (WebhookTrigger), the corresponding label is omitted.
- The status label represents the HTTP response status code returned for the request.
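Because the status label carries the HTTP response code, webhook_requests_count can be aggregated into status classes to spot client or server errors. This is a small sketch that assumes the label value is the numeric code as a string (for example "200"). The sample counts and the helper names are hypothetical.

```python
from collections import Counter


def requests_by_status_class(samples):
    """Sum request counts per status class ("2xx", "4xx", ...).

    samples: iterable of (status_label_value, request_count) pairs.
    """
    totals = Counter()
    for status, count in samples:
        code = int(status)  # assumes a numeric status label value
        totals[f"{code // 100}xx"] += count
    return dict(totals)


def error_ratio(by_class):
    """Share of requests answered with a 4xx or 5xx status."""
    total = sum(by_class.values())
    errors = by_class.get("4xx", 0) + by_class.get("5xx", 0)
    return errors / total if total else 0.0


# Hypothetical counter values for a single trigger.
samples = [("200", 40), ("202", 5), ("404", 3), ("500", 2)]
by_class = requests_by_status_class(samples)
print(by_class)
print(error_ratio(by_class))
```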
Jobs metrics
These metrics relate to the execution of action Jobs.
Name | Type | Labels | Description |
---|---|---|---|
jobs_queue_limit | Gauge | - | Current limit on the number of simultaneously running Jobs (the current value of the action.maxRunningJobs config parameter). |
jobs_queue_waiting | Gauge | - | Current number of Jobs waiting in the queue to start. |
jobs_queue_running | Gauge | - | Current number of running Jobs. |
jobs_queue_waiting_duration_seconds | Histogram | - | Time each Job spent in the waiting queue before starting. |
jobs_queue_completed_count | Counter | namespace, trigger_kind, trigger_name, source_kind, source_name, action_kind, action_name, status | Total number of completed Jobs, broken down by trigger, source, action, and final status. |
jobs_queue_completed_duration_seconds | Histogram | namespace, trigger_kind, trigger_name, source_kind, source_name, action_kind, action_name, status | Time each Job spent in the running state. This metric is recorded only for finished Jobs (succeeded or failed). It is absent for deleted or expired Jobs. |
Status label explanation:
Value | Description |
---|---|
Succeed | Job finished successfully. |
Failed | Job finished but failed. |
Expired | Job was never started because it waited in the queue longer than its waiting time limit, defined by the config or by the action parameter jobWaitingTimeoutSeconds. |
Deleted | Job was deleted before it was finished. |
CreateError | Job was not created due to a Kubernetes error. |
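The gauges and the status-labelled counter can be combined into a few health indicators. The sketch below assumes the counter values have already been summed per status across all other labels. The numbers and the helper `summarize_jobs` are hypothetical. Treating Expired and CreateError as "never started" follows the status descriptions above; Deleted is left out of that group because a Job may be deleted either before or after it starts.

```python
def summarize_jobs(completed_by_status, running, limit):
    """Derive simple indicators from the Jobs metrics.

    completed_by_status: jobs_queue_completed_count summed per status value.
    running: value of jobs_queue_running.
    limit: value of jobs_queue_limit.
    """
    total = sum(completed_by_status.values())
    succeeded = completed_by_status.get("Succeed", 0)
    failed = completed_by_status.get("Failed", 0)
    finished = succeeded + failed  # Jobs that actually ran to completion
    never_started = (
        completed_by_status.get("Expired", 0)
        + completed_by_status.get("CreateError", 0)
    )
    return {
        "queue_utilization": running / limit if limit else None,
        "success_ratio": succeeded / finished if finished else None,
        "never_started_ratio": never_started / total if total else None,
    }


# Hypothetical values read from the metrics endpoint.
summary = summarize_jobs(
    {"Succeed": 8, "Failed": 2, "Expired": 1, "Deleted": 1},
    running=3,
    limit=4,
)
print(summary)
```

A queue utilization that stays close to 1 together with a growing never_started_ratio may suggest that the running Jobs limit or the waiting timeout is too low for the workload.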