Prometheus Metrics Reference
SPYDER exposes comprehensive Prometheus metrics for monitoring performance, health, and operational status.
Metrics Endpoint
Default Configuration:
./bin/spyder -metrics_addr=:9090
Access Metrics:
curl http://localhost:9090/metrics
Disable Metrics:
./bin/spyder -metrics_addr=""
Core Metrics
Task Processing Metrics
spyder_tasks_total
Counter tracking total tasks processed by status.
Type: Counter
Labels: status
# HELP spyder_tasks_total tasks processed
# TYPE spyder_tasks_total counter
spyder_tasks_total{status="ok"} 1234
spyder_tasks_total{status="error"} 5
Label Values:
ok
: Successfully processed domainserror
: Failed domain processing (panics, critical errors)
Query Examples:
# Processing rate (domains/sec)
rate(spyder_tasks_total[5m])
# Error rate
rate(spyder_tasks_total{status="error"}[5m]) / rate(spyder_tasks_total[5m])
# Total domains processed
sum(spyder_tasks_total)
Edge Discovery Metrics
spyder_edges_total
Counter tracking discovered edges by type.
Type: Counter
Labels: type
# HELP spyder_edges_total edges emitted
# TYPE spyder_edges_total counter
spyder_edges_total{type="RESOLVES_TO"} 2468
spyder_edges_total{type="LINKS_TO"} 1357
spyder_edges_total{type="USES_CERT"} 891
spyder_edges_total{type="USES_NS"} 567
spyder_edges_total{type="USES_MX"} 234
spyder_edges_total{type="ALIAS_OF"} 123
Edge Types:
RESOLVES_TO
: DNS A/AAAA recordsLINKS_TO
: External HTTP linksUSES_CERT
: TLS certificatesUSES_NS
: Nameserver recordsUSES_MX
: Mail exchange recordsALIAS_OF
: CNAME records
Query Examples:
# Edge discovery rate by type
rate(spyder_edges_total[5m])
# Most common edge types
topk(5, sum by (type) (spyder_edges_total))
# DNS vs HTTP edge ratio
sum(spyder_edges_total{type=~"RESOLVES_TO|USES_NS|USES_MX|ALIAS_OF"}) /
sum(spyder_edges_total{type="LINKS_TO"})
Policy Enforcement Metrics
spyder_robots_blocked_total
Counter tracking domains blocked by robots.txt.
Type: Counter
No Labels
# HELP spyder_robots_blocked_total robots.txt blocks
# TYPE spyder_robots_blocked_total counter
spyder_robots_blocked_total 45
Query Examples:
# Robots.txt block rate
rate(spyder_robots_blocked_total[5m])
# Percentage of domains blocked
rate(spyder_robots_blocked_total[5m]) / rate(spyder_tasks_total[5m]) * 100
Derived Metrics
Performance Indicators
Throughput Metrics:
# Domains processed per second
rate(spyder_tasks_total[5m])
# Edges discovered per second
rate(spyder_edges_total[5m])
# Average edges per domain
rate(spyder_edges_total[5m]) / rate(spyder_tasks_total[5m])
Efficiency Metrics:
# Success rate
rate(spyder_tasks_total{status="ok"}[5m]) / rate(spyder_tasks_total[5m])
# Discovery efficiency (edges per successful domain)
rate(spyder_edges_total[5m]) / rate(spyder_tasks_total{status="ok"}[5m])
Health Indicators
Error Monitoring:
# Error rate threshold alert
rate(spyder_tasks_total{status="error"}[5m]) > 0.1
# High robots.txt blocking (potential configuration issue)
rate(spyder_robots_blocked_total[5m]) / rate(spyder_tasks_total[5m]) > 0.5
Go Runtime Metrics
SPYDER automatically exposes Go runtime metrics:
Memory Metrics
# Memory usage
go_memstats_alloc_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_sys_bytes
# Garbage collection
go_memstats_gc_sys_bytes
go_gc_duration_seconds
Goroutine Metrics
# Active goroutines
go_goroutines
# Thread count
go_threads
Query Examples:
# Memory growth rate
rate(go_memstats_alloc_bytes[5m])
# GC frequency
rate(go_gc_duration_seconds_count[5m])
# Goroutine count per worker (approximately)
go_goroutines / 256 # assuming 256 workers
HTTP Metrics (from Prometheus client)
Request Metrics
# HTTP requests to metrics endpoint
promhttp_metric_handler_requests_total
promhttp_metric_handler_requests_in_flight_gauge
# Response times
promhttp_metric_handler_request_duration_seconds
Custom Dashboards
Grafana Dashboard Configuration
Main Dashboard Panels:
{
"dashboard": {
"title": "SPYDER Probe Monitoring",
"panels": [
{
"title": "Processing Rate",
"type": "stat",
"targets": [
{
"expr": "rate(spyder_tasks_total[5m])",
"legendFormat": "domains/sec"
}
]
},
{
"title": "Edge Discovery by Type",
"type": "piechart",
"targets": [
{
"expr": "sum by (type) (spyder_edges_total)",
"legendFormat": "{{type}}"
}
]
},
{
"title": "Success Rate",
"type": "stat",
"targets": [
{
"expr": "rate(spyder_tasks_total{status=\"ok\"}[5m]) / rate(spyder_tasks_total[5m]) * 100",
"legendFormat": "success %"
}
]
},
{
"title": "Memory Usage",
"type": "graph",
"targets": [
{
"expr": "go_memstats_heap_alloc_bytes / 1024 / 1024",
"legendFormat": "Heap MB"
}
]
}
]
}
}
Alerting Rules
Prometheus Alerting Rules:
groups:
- name: spyder.rules
rules:
- alert: SpyderHighErrorRate
expr: rate(spyder_tasks_total{status="error"}[5m]) / rate(spyder_tasks_total[5m]) > 0.05
for: 2m
labels:
severity: warning
annotations:
summary: "SPYDER error rate is high"
description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes"
- alert: SpyderLowThroughput
expr: rate(spyder_tasks_total[5m]) < 10
for: 5m
labels:
severity: warning
annotations:
summary: "SPYDER throughput is low"
description: "Processing only {{ $value }} domains per second"
- alert: SpyderHighMemoryUsage
expr: go_memstats_heap_alloc_bytes / 1024 / 1024 > 1024
for: 5m
labels:
severity: warning
annotations:
summary: "SPYDER memory usage is high"
description: "Using {{ $value }}MB of heap memory"
- alert: SpyderManyRobotsBlocks
expr: rate(spyder_robots_blocked_total[5m]) / rate(spyder_tasks_total[5m]) > 0.8
for: 10m
labels:
severity: info
annotations:
summary: "Many domains blocked by robots.txt"
description: "{{ $value | humanizePercentage }} of domains blocked"
- alert: SpyderDown
expr: up{job="spyder"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "SPYDER probe is down"
description: "SPYDER probe has been down for more than 1 minute"
Monitoring Best Practices
Scrape Configuration
Prometheus Configuration:
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'spyder'
static_configs:
- targets: ['spyder:9090']
scrape_interval: 30s
metrics_path: /metrics
# Optional: Basic authentication
basic_auth:
username: monitoring
password_file: /etc/prometheus/spyder.password
# Optional: TLS configuration
tls_config:
insecure_skip_verify: false
ca_file: /etc/ssl/certs/ca.pem
Security Considerations
Metrics Endpoint Security:
# Bind to localhost only
-metrics_addr=127.0.0.1:9090
# Use reverse proxy with authentication
nginx_config:
location /metrics {
auth_basic "Prometheus";
auth_basic_user_file /etc/nginx/.htpasswd;
proxy_pass http://127.0.0.1:9090/metrics;
}
Performance Impact
Metrics Collection Overhead:
- CPU: < 1% additional overhead
- Memory: ~1MB for metric storage
- Network: ~1KB/scrape (depends on activity)
High-Frequency Scraping:
# For high-resolution monitoring
scrape_configs:
- job_name: 'spyder-detailed'
static_configs:
- targets: ['spyder:9090']
scrape_interval: 5s # High frequency
metrics_path: /metrics
Troubleshooting
Common Issues:
# Check if metrics endpoint is responding
curl -v http://localhost:9090/metrics
# Verify metric format
curl -s http://localhost:9090/metrics | grep spyder
# Check for parsing errors in Prometheus
curl -s http://prometheus:9090/api/v1/targets
Debug Queries:
# Check for counter resets (restarts)
resets(spyder_tasks_total[1h])
# Verify metric freshness
time() - timestamp(spyder_tasks_total)
# Check for missing metrics
absent(spyder_tasks_total)
This comprehensive metrics setup provides full visibility into SPYDER's performance, health, and operational characteristics for effective monitoring and troubleshooting.