Operations Guide

Configuration

SPYDER can be configured via command-line flags or environment variables:

Core Flags

  • -domains: Path to newline-separated domain list (required)
  • -ingest: HTTP(S) ingestion endpoint (optional; prints batches to stdout if empty)
  • -probe: Probe identifier (default: "local-1")
  • -run: Run identifier (default: auto-generated timestamp)
  • -concurrency: Worker pool size (default: 256)
  • -metrics_addr: Listen address for the Prometheus metrics endpoint (e.g., ":9090"; used throughout the examples below)

HTTP Behavior and Filtering

  • -ua: User-Agent string for HTTP requests
  • -exclude_tlds: Comma-separated TLDs to skip (default: "gov,mil,int")
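
These flags combine freely on one run; a hedged example (the User-Agent string and the extra TLD are illustrative, not defaults):

bash
./bin/spyder \
  -domains=domains.txt \
  -ua="spyder-probe/1.0 (+https://example.com/bot-info)" \
  -exclude_tlds=gov,mil,int,edu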

Batch Processing

  • -batch_max_edges: Max edges per batch before flush (default: 10000)
  • -batch_flush_sec: Timer-based flush interval in seconds (default: 2)
  • -spool_dir: Directory for failed batch files (default: "spool")
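
For high-throughput runs, larger and less frequent flushes reduce per-request overhead; a sketch with illustrative values (not tuning recommendations):

bash
./bin/spyder \
  -domains=domains.txt \
  -batch_max_edges=50000 \
  -batch_flush_sec=10 \
  -spool_dir=/var/spool/spyder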

Security

  • -mtls_cert: Client certificate for mTLS authentication
  • -mtls_key: Client private key for mTLS authentication
  • -mtls_ca: CA bundle for mTLS validation
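
Before deployment it is worth confirming that the certificate and key are actually a pair; a quick check with standard openssl tooling (paths match the production example below):

bash
# The two public-key digests must match
openssl x509 -in /etc/ssl/client.pem -pubkey -noout | openssl sha256
openssl pkey -in /etc/ssl/client.key -pubout | openssl sha256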

Environment Variables

  • REDIS_ADDR: Redis server address for deduplication (optional)
  • REDIS_QUEUE_ADDR: Redis server for distributed queue (optional)
  • REDIS_QUEUE_KEY: Queue key name (default: "spyder:queue")

Deployment Patterns

Single Node

bash
# Local development
./bin/spyder -domains=domains.txt

# With metrics and Redis dedupe
REDIS_ADDR=127.0.0.1:6379 ./bin/spyder \
  -domains=domains.txt \
  -metrics_addr=:9090

Distributed Queue

bash
# Start queue consumer
REDIS_QUEUE_ADDR=127.0.0.1:6379 ./bin/spyder \
  -metrics_addr=:9090 \
  -probe=worker-1

# Seed the queue
./bin/seed -domains=domains.txt -redis=127.0.0.1:6379

Production with Ingestion

bash
./bin/spyder \
  -domains=domains.txt \
  -ingest=https://ingest.example.com/v1/batch \
  -probe=datacenter-1a \
  -run=scan-$(date +%s) \
  -mtls_cert=/etc/ssl/client.pem \
  -mtls_key=/etc/ssl/client.key \
  -metrics_addr=:9090

Monitoring

Prometheus Metrics (:9090/metrics)

  • spyder_tasks_total{status}: Task completion counters
  • spyder_edges_total{type}: Edge discovery by relationship type
  • spyder_robots_blocks_total: Robots.txt enforcement blocks
  • spyder_http_duration_seconds: HTTP request latency histogram
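
A quick spot check against a running probe (assumes -metrics_addr=:9090 as in the examples above):

bash
# Dump all SPYDER counters and histograms
curl -s localhost:9090/metrics | grep '^spyder_'

# Watch robots.txt blocks accumulate
watch -n5 "curl -s localhost:9090/metrics | grep spyder_robots_blocks_total"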

Structured Logging

JSON-formatted logs include:

  • level: Log severity (info, warn, error)
  • msg: Human-readable message
  • host: Target domain being processed
  • probe_id: Probe identifier
  • run_id: Run identifier
  • err: Error details when applicable

Health Checks

  • Metrics endpoint: GET /metrics returns 200 if healthy
  • Process signals: Responds to SIGINT/SIGTERM for graceful shutdown
  • Spool monitoring: Check spool/ directory for failed batches
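
These checks can be scripted; a minimal liveness sketch assuming the default port and spool directory:

bash
#!/usr/bin/env bash
# Fail if the metrics endpoint stops answering
curl -sf -o /dev/null localhost:9090/metrics || exit 1

# Warn if failed batches are piling up in the spool
if [ -n "$(ls -A spool/ 2>/dev/null)" ]; then
  echo "WARNING: undelivered batches in spool/" >&2
fi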

Redis Queue (Distributed Scheduling)

Queue Setup

bash
# Enable queue consumption
export REDIS_QUEUE_ADDR=127.0.0.1:6379
export REDIS_QUEUE_KEY=spyder:queue

# Start worker
./bin/spyder -metrics_addr=:9090 -probe=worker-1

Seeding Domains

bash
# Push domains to queue
./bin/seed -domains=domains.txt -redis=127.0.0.1:6379 -key=spyder:queue

Queue Management

  • Items are leased for 120 seconds during processing
  • Failed items return to queue automatically
  • Use Redis commands to inspect queue state:
    bash
    redis-cli LLEN spyder:queue  # Queue length
    redis-cli LRANGE spyder:queue 0 -1  # View items

OpenTelemetry

Configuration

  • -otel_endpoint: OTLP HTTP endpoint (e.g., "localhost:4318")
  • -otel_insecure: Use insecure connection (default: true)
  • -otel_service: Service name (default: "spyder-probe")

Trace Context

  • CrawlOne span: Complete domain processing pipeline
  • Custom attributes: probe.id, run.id, domain
  • Propagates context through DNS, HTTP, and TLS operations

Integration Example

bash
# With Jaeger (OTLP/HTTP receiver)
./bin/spyder \
  -domains=domains.txt \
  -otel_endpoint=localhost:4318 \
  -otel_service=spyder-prod

Troubleshooting

Common Issues

High Memory Usage

  • With the in-memory backend, the deduplication cache is held entirely in process memory; check its size first
  • Consider the Redis backend (REDIS_ADDR) for large-scale deployments
  • Monitor worker pool size against available memory
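
To watch resident memory over time (standard procps tooling; assumes the process is named spyder):

bash
watch -n10 'ps -o pid,rss,cmd -C spyder'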

DNS Resolution Failures

  • Verify network connectivity and DNS servers
  • Check for rate limiting from upstream DNS providers
  • Review excluded TLD list for unintended filtering
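
Standard resolver tooling helps isolate whether failures are local or upstream (8.8.8.8 is only an example resolver):

bash
# Compare the system resolver against a known-good upstream
dig +short example.com
dig +short example.com @8.8.8.8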

HTTP Timeouts

  • Default 20-second timeout per HTTP request
  • Robots.txt failures don't block crawling (fail-open policy)
  • Rate limiting prevents overwhelming target servers

Batch Delivery Issues

  • Check spool/ directory for failed batches
  • Verify ingestion endpoint availability and authentication
  • Review mTLS certificate configuration
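
The endpoint and credentials can be exercised outside SPYDER with curl (URL and paths match the production example above; the server's expected response is deployment-specific):

bash
curl -v \
  --cert /etc/ssl/client.pem \
  --key /etc/ssl/client.key \
  https://ingest.example.com/v1/batch

# Inspect failed batches awaiting delivery
ls -lh spool/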

Performance Tuning

Worker Concurrency

  • Default: 256 workers
  • Increase for I/O-bound crawling, where workers spend most of their time waiting on the network
  • Decrease if overwhelming downstream systems or exhausting memory
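
The pool size is set per run via the documented flag, for example:

bash
# Halve the pool on a memory-constrained host
./bin/spyder -domains=domains.txt -concurrency=128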

Rate Limiting

  • Default: 1 request/second per host
  • Adjust in internal/rate/limiter.go for different patterns
  • Consider target server capacity and politeness

Batch Sizing

  • Default: 10,000 edges or 5,000 nodes per batch
  • Larger batches reduce HTTP overhead
  • Smaller batches provide faster feedback

Log Analysis

Key Log Patterns

bash
# Filter by probe/run
jq 'select(.probe_id == "worker-1" and .run_id == "scan-123")' logs.jsonl

# Error analysis
jq 'select(.level == "error")' logs.jsonl

# Performance metrics
jq 'select(.msg == "task completed") | .duration' logs.jsonl