Observability¶
Metrics¶
Controller exposes a set of metrics to make possible state monitoring, alerting and efficiency analyzing.
All metrics have the same prefix ("namespace") which may be configured using a command line option --metrics-prefix
or controllerOptions.metricsPrefix Helm chart parameter.
The default prefix is git_events_runner.
Below is a detailed explanation of all the metrics. Metric names are specified without the prefix (just for the sake of more compact presentation).
Controller metrics¶
| Name | Type | Labels | Description |
|---|---|---|---|
| reconcile_count | Counter | namespace resource_kind resource_name |
Total number of reconciliations (updates) of each resource, including failed. |
| failed_reconcile_count | Counter | namespace resource_kind resource_name |
Number of failed reconciliations. |
| reconcile_duration_seconds | Histogram | namespace resource_kind resource_name |
Reconciliation duration for each resource. |
Note:
ScheduleTriggersneed to be reconciled so far. All other resources have no external state.
Triggers metrics¶
| Name | Type | Labels | Description |
|---|---|---|---|
| trigger_check_count | Counter | namespace trigger_kind trigger_name |
Total number of source checks executed by each trigger. |
| trigger_check_duration_seconds | Histogram | namespace trigger_kind trigger_name |
Execution duration of source checks for each trigger. |
Webhooks metrics¶
These metrics are related to incoming webhook requests processing.
| Name | Type | Labels | Description |
|---|---|---|---|
| webhook_requests_count | Counter | namespace trigger_name status |
Total number of webhook requests served by each trigger. |
| webhook_requests_duration_seconds | Histogram | namespace trigger_name status |
HTTP request processing duration for each trigger. This is time to schedule check task but not time of task execution. |
Notes: - Since all webhook triggers have the same resource kind (
WebhookTrigger) corresponding label is omitted. -statuslabel represents HTTP response status code which was returned on request.
Jobs metrics¶
These metrics are related to actual action jobs executing.
| Name | Type | Labels | Description |
|---|---|---|---|
| jobs_queue_limit | Gauge | - | Current limit of the simultaneously running Jobs (actual action.maxRunningJobs config parameter). |
| jobs_queue_waiting | Gauge | - | Current number of Jobs waiting for start in the queue. |
| jobs_queue_running | Gauge | - | Current number of running Jobs. |
| jobs_queue_waiting_duration_seconds | Histogram | - | Duration of time each Job spent in the waiting queue before starting. |
| jobs_queue_completed_count | Counter | namespace trigger_kind trigger_name source_kind source_name action_kind action_name status |
Total number of completed Jobs with respect to trigger, source, action and final status. |
| jobs_queue_completed_duration_seconds | Histogram | namespace trigger_kind trigger_name source_kind source_name action_kind action_name status |
Duration of time each Job was in a running state. This metric is actual for finished (successfully of failed) jobs only. It's absent for deleted or expired Jobs. |
Status label explanation:
| Value | Description |
|---|---|
| Succeed | Job finished successfully. |
| Failed | Job finished but failed. |
| Expired | Job was not running because it was waiting in the queue more than its waiting time limit, defined by config or action parameter jobWaitingTimeoutSeconds. |
| Deleted | Job was deleted before it was finished. |
| CreateError | Job was not created due to Kubernetes error. |