Prometheus Metrics Reference

SPYDER exposes comprehensive Prometheus metrics for monitoring performance, health, and operational status.

Metrics Endpoint

Default Configuration:

bash

./bin/spyder -metrics_addr=:9090

Access Metrics:

bash

curl http://localhost:9090/metrics

Disable Metrics:

bash

./bin/spyder -metrics_addr=""

Core Metrics

Task Processing Metrics

`spyder_tasks_total`

Counter tracking total tasks processed by status.

Type: Counter
Labels: status

prometheus

# HELP spyder_tasks_total tasks processed
# TYPE spyder_tasks_total counter
spyder_tasks_total{status="ok"} 1234
spyder_tasks_total{status="error"} 5

Label Values:

ok: Successfully processed domains
error: Failed domain processing (panics, critical errors)
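
To work with these samples outside Prometheus, the text exposition format can be read line by line. The following is a minimal sketch, not SPYDER code: the `parse_metrics` helper and the `SAMPLE` string are illustrative assumptions, and the regular expressions cover only the simple `name{label="value"} number` lines shown above.

```python
import re

# Sample copied from the spyder_tasks_total output shown above.
SAMPLE = """\
# HELP spyder_tasks_total tasks processed
# TYPE spyder_tasks_total counter
spyder_tasks_total{status="ok"} 1234
spyder_tasks_total{status="error"} 5
"""

# metric name, optional {labels}, then the sample value
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return {(name, sorted label pairs): value} for every sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = LINE.match(line)
        if m is None:
            continue
        labels = tuple(sorted(LABEL.findall(m.group("labels") or "")))
        samples[(m.group("name"), labels)] = float(m.group("value"))
    return samples


samples = parse_metrics(SAMPLE)
by_status = {
    dict(labels)["status"]: value
    for (name, labels), value in samples.items()
    if name == "spyder_tasks_total"
}
error_fraction = by_status["error"] / sum(by_status.values())
```

For anything beyond a quick check, an established exposition-format parser is a safer choice than hand-written regular expressions.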

Query Examples:

promql

# Processing rate (domains/sec)
rate(spyder_tasks_total[5m])

# Error ratio (aggregate both sides so the status labels do not prevent matching)
sum(rate(spyder_tasks_total{status="error"}[5m])) / sum(rate(spyder_tasks_total[5m]))

# Total domains processed
sum(spyder_tasks_total)
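
The arithmetic behind these queries can be sketched with two counter readings per series. This is a simplified illustration: the readings are hypothetical, and Prometheus's own `rate()` additionally extrapolates to the window boundaries and corrects for counter resets, which this sketch ignores.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) pair."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical readings taken 300 s apart, i.e. one 5m window.
ok = [(0, 1000), (300, 1234)]
error = [(0, 2), (300, 5)]

ok_rate = simple_rate(ok)          # domains/sec that succeeded
error_rate = simple_rate(error)    # domains/sec that failed
total_rate = ok_rate + error_rate  # what summing over status yields
error_ratio = error_rate / total_rate

print(f"total: {total_rate:.2f}/s, error ratio: {error_ratio:.4f}")
```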

Edge Discovery Metrics

`spyder_edges_total`

Counter tracking discovered edges by type.

Type: Counter
Labels: type

prometheus

# HELP spyder_edges_total edges emitted
# TYPE spyder_edges_total counter
spyder_edges_total{type="RESOLVES_TO"} 2468
spyder_edges_total{type="LINKS_TO"} 1357
spyder_edges_total{type="USES_CERT"} 891
spyder_edges_total{type="USES_NS"} 567
spyder_edges_total{type="USES_MX"} 234
spyder_edges_total{type="ALIAS_OF"} 123

Edge Types:

RESOLVES_TO: DNS A/AAAA records
LINKS_TO: External HTTP links
USES_CERT: TLS certificates
USES_NS: Nameserver records
USES_MX: Mail exchange records
ALIAS_OF: CNAME records

Query Examples:

promql

# Edge discovery rate by type
rate(spyder_edges_total[5m])

# Most common edge types
topk(5, sum by (type) (spyder_edges_total))

# DNS vs HTTP edge ratio
sum(spyder_edges_total{type=~"RESOLVES_TO|USES_NS|USES_MX|ALIAS_OF"}) /
sum(spyder_edges_total{type="LINKS_TO"})
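
As a worked check of the ranking and ratio queries, the sketch below applies the same arithmetic to the sample counter values listed earlier in this section. The dictionary simply restates that sample output; nothing here is read from a live SPYDER instance.

```python
# Counter values copied from the sample spyder_edges_total output above.
edges = {
    "RESOLVES_TO": 2468,
    "LINKS_TO": 1357,
    "USES_CERT": 891,
    "USES_NS": 567,
    "USES_MX": 234,
    "ALIAS_OF": 123,
}

DNS_TYPES = {"RESOLVES_TO", "USES_NS", "USES_MX", "ALIAS_OF"}

# Equivalent of topk(5, sum by (type) (spyder_edges_total))
top5 = sorted(edges.items(), key=lambda kv: kv[1], reverse=True)[:5]

# Equivalent of the DNS vs HTTP edge ratio query
dns_total = sum(count for kind, count in edges.items() if kind in DNS_TYPES)
dns_to_http = dns_total / edges["LINKS_TO"]

print(top5)
print(f"DNS edges per HTTP link edge: {dns_to_http:.2f}")
```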

Policy Enforcement Metrics

`spyder_robots_blocked_total`

Counter tracking domains blocked by robots.txt.

Type: Counter
No Labels

prometheus

# HELP spyder_robots_blocked_total robots.txt blocks
# TYPE spyder_robots_blocked_total counter
spyder_robots_blocked_total 45

Query Examples:

promql

# Robots.txt block rate
rate(spyder_robots_blocked_total[5m])

# Percentage of domains blocked (sum() removes the status label so the two sides match)
sum(rate(spyder_robots_blocked_total[5m])) / sum(rate(spyder_tasks_total[5m])) * 100
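
PromQL binary operators pair series one-to-one by identical label sets. `spyder_robots_blocked_total` has no labels while `spyder_tasks_total` carries `status`, which is why ratios across differently labeled metrics are usually written with `sum()` on both sides. The sketch below models that matching rule with plain dictionaries; the rates are hypothetical, and real series would also carry `job` and `instance` labels.

```python
def labelset(**labels):
    """A hashable label set, e.g. labelset(status='ok')."""
    return frozenset(labels.items())


def divide(left, right):
    """One-to-one division: only series with identical label sets are paired."""
    return {key: value / right[key] for key, value in left.items() if key in right}


def total(vector):
    """Like sum(): collapse all series into one with an empty label set."""
    return {labelset(): sum(vector.values())}


# Hypothetical per-second rates.
blocked = {labelset(): 0.15}
tasks = {labelset(status="ok"): 0.78, labelset(status="error"): 0.01}

unmatched = divide(blocked, tasks)             # no common label set, so empty
matched = divide(total(blocked), total(tasks))  # single aggregated ratio

print(unmatched)
print(matched)
```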

Derived Metrics

Performance Indicators

Throughput Metrics:

promql

# Domains processed per second
rate(spyder_tasks_total[5m])

# Edges discovered per second
rate(spyder_edges_total[5m])

# Average edges per domain (aggregated, because the two metrics carry different labels)
sum(rate(spyder_edges_total[5m])) / sum(rate(spyder_tasks_total[5m]))

Efficiency Metrics:

promql

# Success rate
sum(rate(spyder_tasks_total{status="ok"}[5m])) / sum(rate(spyder_tasks_total[5m]))

# Discovery efficiency (edges per successful domain)
sum(rate(spyder_edges_total[5m])) / sum(rate(spyder_tasks_total{status="ok"}[5m]))

Health Indicators

Error Monitoring:

promql

# Absolute error rate above 0.1 errors/sec
sum(rate(spyder_tasks_total{status="error"}[5m])) > 0.1

# High robots.txt blocking (potential configuration issue)
sum(rate(spyder_robots_blocked_total[5m])) / sum(rate(spyder_tasks_total[5m])) > 0.5

Go Runtime Metrics

SPYDER automatically exposes Go runtime metrics:

Memory Metrics

prometheus

# Memory usage
go_memstats_alloc_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_sys_bytes

# Garbage collection
go_memstats_gc_sys_bytes
go_gc_duration_seconds

Goroutine Metrics

prometheus

# Active goroutines
go_goroutines

# Thread count
go_threads

Query Examples:

promql

# Memory growth rate in bytes/sec (go_memstats_alloc_bytes is a gauge, so use deriv() rather than rate())
deriv(go_memstats_alloc_bytes[5m])

# GC frequency
rate(go_gc_duration_seconds_count[5m])

# Goroutine count per worker (approximately)
go_goroutines / 256  # assuming 256 workers

HTTP Metrics (from Prometheus client)

Request Metrics

prometheus

# HTTP requests to metrics endpoint
promhttp_metric_handler_requests_total
promhttp_metric_handler_requests_in_flight

# Response times
promhttp_metric_handler_request_duration_seconds

Custom Dashboards

Grafana Dashboard Configuration

Main Dashboard Panels:

json

{
  "dashboard": {
    "title": "SPYDER Probe Monitoring",
    "panels": [
      {
        "title": "Processing Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(spyder_tasks_total[5m])",
            "legendFormat": "domains/sec"
          }
        ]
      },
      {
        "title": "Edge Discovery by Type",
        "type": "piechart",
        "targets": [
          {
            "expr": "sum by (type) (spyder_edges_total)",
            "legendFormat": "{{type}}"
          }
        ]
      },
      {
        "title": "Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(rate(spyder_tasks_total{status=\"ok\"}[5m])) / sum(rate(spyder_tasks_total[5m])) * 100",
            "legendFormat": "success %"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "go_memstats_heap_alloc_bytes / 1024 / 1024",
            "legendFormat": "Heap MB"
          }
        ]
      }
    ]
  }
}

Alerting Rules

Prometheus Alerting Rules:

yaml

groups:
- name: spyder.rules
  rules:
  - alert: SpyderHighErrorRate
    expr: sum(rate(spyder_tasks_total{status="error"}[5m])) / sum(rate(spyder_tasks_total[5m])) > 0.05
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "SPYDER error rate is high"
      description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"

  - alert: SpyderLowThroughput
    expr: sum(rate(spyder_tasks_total[5m])) < 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "SPYDER throughput is low"
      description: "Processing only {{ $value }} domains per second"

  - alert: SpyderHighMemoryUsage
    expr: go_memstats_heap_alloc_bytes / 1024 / 1024 > 1024
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "SPYDER memory usage is high"
      description: "Using {{ $value }}MB of heap memory"

  - alert: SpyderManyRobotsBlocks
    expr: sum(rate(spyder_robots_blocked_total[5m])) / sum(rate(spyder_tasks_total[5m])) > 0.8
    for: 10m
    labels:
      severity: info
    annotations:
      summary: "Many domains blocked by robots.txt"
      description: "{{ $value | humanizePercentage }} of domains blocked"

  - alert: SpyderDown
    expr: up{job="spyder"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "SPYDER probe is down"
      description: "SPYDER probe has been down for more than 1 minute"
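
The `for:` clause in each rule delays firing until the expression has held continuously for that long. The sketch below models that behavior for `SpyderHighErrorRate`, assuming a hypothetical 30-second rule evaluation interval (so `for: 2m` spans four intervals) and made-up error ratios. It is a simplified model, not Prometheus's implementation.

```python
def alert_states(values, threshold, for_intervals):
    """Classify each evaluation as 'inactive', 'pending', or 'firing'.

    An alert turns pending on the first evaluation above the threshold and
    fires once the condition has held for `for_intervals` further intervals.
    """
    states = []
    streak = 0  # consecutive evaluations above the threshold
    for value in values:
        if value > threshold:
            streak += 1
            elapsed_intervals = streak - 1
            states.append("firing" if elapsed_intervals >= for_intervals else "pending")
        else:
            streak = 0  # any dip below the threshold resets the timer
            states.append("inactive")
    return states


# Hypothetical error ratios, one per 30 s evaluation.
ratios = [0.02, 0.06, 0.07, 0.06, 0.08, 0.09, 0.03]
states = alert_states(ratios, threshold=0.05, for_intervals=4)
print(list(zip(ratios, states)))
```

A shorter `for:` reacts faster but is more likely to fire on brief spikes, which is the trade-off behind the different durations used in the rules above.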

Monitoring Best Practices

Scrape Configuration

Prometheus Configuration:

yaml

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
- job_name: 'spyder'
  static_configs:
  - targets: ['spyder:9090']
  scrape_interval: 30s
  metrics_path: /metrics
  
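  # Optional: basic authentication (for example, when a reverse proxy protects the endpoint)
  basic_auth:
    username: monitoring
    password_file: /etc/prometheus/spyder.password

  # Optional: TLS configuration (applies only when the target is scraped with scheme: https)
  tls_config:
    insecure_skip_verify: false
    ca_file: /etc/ssl/certs/ca.pem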

Security Considerations

Metrics Endpoint Security:

bash

# Bind to localhost only
./bin/spyder -metrics_addr=127.0.0.1:9090

Use a reverse proxy with authentication:

nginx

location /metrics {
  auth_basic "Prometheus";
  auth_basic_user_file /etc/nginx/.htpasswd;
  proxy_pass http://127.0.0.1:9090/metrics;
}

Performance Impact

Metrics Collection Overhead:

CPU: < 1% additional overhead
Memory: ~1MB for metric storage
Network: ~1KB/scrape (depends on activity)

High-Frequency Scraping:

yaml

# For high-resolution monitoring
scrape_configs:
- job_name: 'spyder-detailed'
  static_configs:
  - targets: ['spyder:9090']
  scrape_interval: 5s  # High frequency
  metrics_path: /metrics
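
To put the scrape interval in perspective, the sketch below estimates daily scrape traffic from the ~1KB/scrape figure quoted above. That figure is this document's rough estimate, so the results are order-of-magnitude only; the helper name is illustrative.

```python
BYTES_PER_SCRAPE = 1024   # the ~1KB/scrape estimate from this section
SECONDS_PER_DAY = 86_400


def daily_scrape_bytes(interval_s, bytes_per_scrape=BYTES_PER_SCRAPE):
    """Approximate bytes transferred per day for one target."""
    scrapes_per_day = SECONDS_PER_DAY // interval_s
    return scrapes_per_day * bytes_per_scrape


for interval in (30, 15, 5):
    mib = daily_scrape_bytes(interval) / (1024 * 1024)
    print(f"scrape_interval={interval}s -> {mib:.1f} MiB/day")
```

Under this estimate, moving from a 30s to a 5s interval multiplies traffic sixfold, which is small in absolute terms for a single target.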

Troubleshooting

Common Issues:

bash

# Check if metrics endpoint is responding
curl -v http://localhost:9090/metrics

# Verify metric format
curl -s http://localhost:9090/metrics | grep spyder

# Check for parsing errors in Prometheus
curl -s http://prometheus:9090/api/v1/targets

Debug Queries:

promql

# Check for counter resets (restarts)
resets(spyder_tasks_total[1h])

# Verify metric freshness
time() - timestamp(spyder_tasks_total)

# Check for missing metrics
absent(spyder_tasks_total)
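
A counter that drops between two scrapes indicates a restart, which is what `resets()` counts. The sketch below shows that idea, plus how an increase can be computed across resets, using hypothetical readings. It is a simplified model and omits the extrapolation Prometheus applies.

```python
def count_resets(values):
    """Number of times a counter reading is lower than the previous one."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)


def increase_with_resets(values):
    """Total increase, treating each drop as a restart from zero."""
    total = 0
    for prev, cur in zip(values, values[1:]):
        # After a reset the counter restarted at 0, so `cur` is all new growth.
        total += cur - prev if cur >= prev else cur
    return total


# Hypothetical spyder_tasks_total readings spanning two restarts.
readings = [100, 150, 200, 10, 60, 5, 30]
print(count_resets(readings), increase_with_resets(readings))
```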

Together, these metrics, queries, and alerts cover SPYDER's throughput, error rate, policy enforcement, and runtime health.
