===================================== Observability — metrics and traces ===================================== AutoControl exposes a Prometheus-compatible ``/metrics`` endpoint and an OpenTelemetry-style tracer for every action. Out of the box you get: - Per-action call counts and latency histograms. - Per-agent-step counters partitioned by tool name and outcome. - Span trees around the executor, the agent loop, and any user code wrapped with the :func:`traced` decorator. The metric primitives are stdlib-only, so you do not need ``prometheus_client`` installed. If you do install it later — or ``opentelemetry-api`` for traces — AutoControl picks them up automatically. .. contents:: :local: :depth: 2 Quick start =========== Start the bundled HTTP exporter and point a scraper at it:: from je_auto_control import default_metrics_exporter exporter = default_metrics_exporter() # binds 127.0.0.1:9090 exporter.start() Then in another process:: $ curl http://127.0.0.1:9090/metrics # HELP autocontrol_action_calls_total Number of AC_* actions executed # TYPE autocontrol_action_calls_total counter autocontrol_action_calls_total{action="AC_screenshot",outcome="ok"} 42 ... Drop the URL into a Prometheus scrape config and you have a Grafana dashboard. Built-in metrics ================ The executor and agent loop emit these automatically — you do **not** need to instrument anything yourself: ================================================ ========= ======================================= Metric Type Labels ================================================ ========= ======================================= ``autocontrol_action_calls_total`` Counter ``action``, ``outcome`` (``ok``/``error``) ``autocontrol_action_duration_seconds`` Histogram ``action`` ``autocontrol_agent_runs_total`` Counter *(none)* ``autocontrol_agent_steps_total`` Counter ``tool``, ``outcome`` ``autocontrol_agent_outcomes_total`` Counter ``outcome`` (``succeeded``/``failed``) ================================================ ========= ======================================= The histogram uses Prometheus' default bucket layout (5 ms → 10 s), which covers everything from a synchronous keystroke to a slow OCR pass. Adding your own metrics ======================= Same primitives are exposed through the package facade:: from je_auto_control import ( MetricCounter, MetricGauge, MetricHistogram, default_metric_registry, ) registry = default_metric_registry() widgets_built = registry.register(MetricCounter( "myapp_widgets_built_total", "Count of widgets generated by my pipeline.", label_names=("kind",), )) widgets_built.inc(labels={"kind": "blue"}) Names follow Prometheus rules: snake_case, no dashes, must start with a letter or underscore. The registry rejects collisions so a typo can't silently fork a series. Tracing ======= The :func:`traced` decorator wraps any callable in a span. The default tracer is a no-op until ``opentelemetry-api`` is installed — when it is, your spans flow through whatever exporter you have configured (OTLP, Jaeger, etc.) without changing your call sites:: from je_auto_control import traced @traced("my_pipeline.process_one") def process_one(item): ... For manual span control:: from je_auto_control import default_tracer tracer = default_tracer() with tracer.start_as_current_span("crop_and_ocr") as span: span.set_attribute("region", "header") ... Production deployment ===================== Recommended setup for a multi-host AutoControl daemon fleet: 1. Each host calls ``default_metrics_exporter().start()`` on boot. 2. Prometheus scrapes ``host:9090/metrics`` every 15 s. 3. ``opentelemetry-api`` + ``opentelemetry-sdk`` + an OTLP exporter are installed for the tracing backend (Datadog / Honeycomb / Jaeger). 4. Grafana dashboard alerts on: - ``rate(autocontrol_action_calls_total{outcome="error"}[5m]) > 0.1`` - ``histogram_quantile(0.99, rate(autocontrol_action_duration_seconds_bucket[5m])) > 2.0`` - ``up{job="autocontrol"} == 0`` The exporter binds to ``127.0.0.1`` by default. To expose it to a scraper on another host, pass ``host="0.0.0.0"`` *and* put it behind a firewall or auth proxy — there is no authentication on ``/metrics``. Security note ============= Metrics include action names but no payload data, so leaking ``/metrics`` to an external scraper is low-risk on its own. Trace spans may carry the first 120 chars of agent goals and tool arguments — review your :func:`traced` call sites before sending traces to a third-party SaaS.