Command Line Interface Reference
This document provides comprehensive information about SPYDER's command-line flags and configuration options.
Required Parameters
-config
Path to a YAML or JSON configuration file. When provided, all settings are loaded from the file. Command-line flags override individual values from the file. Either -config or -domains must be supplied.
# Load full configuration from file
-config=configs/spyder.yaml
# Override a single value from the file
-config=configs/spyder.yaml -concurrency=512
Supported formats: .yaml, .yml, .json
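As a sketch, a minimal JSON config might look like the following. The key names here are an assumption that they mirror the flag names; check your SPYDER version's config schema before relying on them.

```shell
# Write a minimal JSON config; key names are assumed to mirror the
# command-line flag names and may differ in your SPYDER version.
mkdir -p configs
cat > configs/spyder.json <<'EOF'
{
  "domains": "configs/domains.txt",
  "ingest": "https://ingest.example.com/v1/batch",
  "probe": "us-west-1a",
  "concurrency": 256
}
EOF
# ./bin/spyder -config=configs/spyder.json
```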
Hot reload: Send SIGHUP to the process to reload the file without restarting:
kill -HUP $(pidof spyder)
-domains (Required without -config)
Path to a newline-separated file containing domains to probe.
-domains=configs/domains.txt
File Format:
# Comments start with #
example.com
google.com
github.com
# Blank lines are ignored
facebook.com
twitter.com
Input Processing:
- Comments (#) and blank lines are ignored
- Domain names are normalized (lowercase, trailing dots removed)
- Duplicates within the file are processed only once
- Maximum line length: 1MB
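The same normalization rules can be approximated in the shell when preparing an input file. This is an illustrative sketch, not SPYDER's actual implementation:

```shell
# Strip comments and blank lines, lowercase, drop trailing dots, and
# deduplicate -- mirroring the input processing rules described above.
cat > raw-domains.txt <<'EOF'
# seed list
Example.COM
example.com.
google.com
EOF
grep -v '^#' raw-domains.txt | grep -v '^$' \
  | tr 'A-Z' 'a-z' | sed 's/\.$//' | sort -u > clean-domains.txt
cat clean-domains.txt
```

The cleaned file contains exactly two entries, `example.com` and `google.com`, since the two forms of example.com normalize to the same name.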
Core Configuration
-ingest
HTTP(S) endpoint for batch ingestion. If empty, outputs JSON to stdout.
# Send to ingest API
-ingest=https://ingest.example.com/v1/batch
# Output to stdout (default)
-ingest=""API Requirements:
- Must accept POST requests
- Content-Type: application/json
- Returns 2xx status codes for success
- Should handle batch sizes up to 50MB
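A quick way to exercise an ingest endpoint by hand is to POST a hand-built batch with curl. The payload shape below is an assumption based on the fields this document mentions, not the documented SPYDER schema:

```shell
# Build a sample batch payload; field names are illustrative assumptions.
cat > /tmp/batch.json <<'EOF'
{"probe":"dev-local","run":"scan-20240101",
 "edges":[{"source":"example.com","target":"ns1.example.com","type":"NS"}]}
EOF
# POST it the way SPYDER's client would (shown, not executed here):
# curl -sS -X POST -H 'Content-Type: application/json' \
#      --data @/tmp/batch.json https://ingest.example.com/v1/batch
```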
-probe
Unique identifier for this probe instance.
-probe=us-west-1a
-probe=production-node-01
-probe=dev-local
Default: local-1
Usage:
- Used in metrics labels
- Included in all emitted data
- Should be unique across probe instances
- Helpful for debugging and monitoring
-run
Identifier for the current run/session.
-run=scan-20240101
-run=daily-maintenance
Default: run-{unix-timestamp}
Usage:
- Groups related discoveries
- Useful for batch processing analysis
- Included in all emitted data
Crawling Mode
-continuous
Enable recursive domain discovery. When enabled, newly discovered domains (from DNS records, TLS certificates, and HTTP links) are fed back into the work queue for further probing.
# Single pass (default)
./bin/spyder -domains=domains.txt
# Recursive crawling
./bin/spyder -domains=domains.txt -continuous
Default: false
Behavior:
- Seed domains from the -domains file are probed first
- Discovered domains (NS, CNAME, MX, HTTP links) are queued for probing
- Crawling continues until no new domains are discovered or -max_domains is reached
- Deduplication prevents re-probing already-visited domains
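The queue-with-deduplication idea can be sketched in a few lines of shell. This is purely illustrative of the behavior; it is not how SPYDER's internal queue is implemented:

```shell
# Each domain is probed at most once, even if discovered repeatedly.
seen=""
for d in example.com ns1.example.com example.com www.example.org; do
  case " $seen " in *" $d "*) continue ;; esac   # already visited: skip
  seen="$seen $d"
  echo "probe $d"
done
```

Here example.com appears twice in the queue but is probed only once, so three probe lines are printed.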
-max_domains
Maximum number of discovered domains to probe in continuous mode. Set to 0 for unlimited.
# Limit to 10,000 discovered domains
./bin/spyder -domains=domains.txt -continuous -max_domains=10000
# Unlimited discovery
./bin/spyder -domains=domains.txt -continuous -max_domains=0
Default: 0 (unlimited)
Notes:
- Only applies when -continuous is enabled
- Counts only newly discovered domains, not seed domains
- When the limit is reached, no new domains are queued but in-flight work completes
Output
-output_format
Output format for batch data written to stdout.
-output_format=json # Pretty-printed JSON (default)
-output_format=jsonl # JSON Lines (one object per line)
-output_format=csv   # CSV format
Default: json
Notes:
- Only affects stdout output (not ingest API delivery)
- CSV format outputs edge data only (source, target, type, timestamps)
- JSONL is recommended for piping to other tools
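As an example of piping JSONL output to standard tools, the following counts edges by type. The record shape is an assumption based on the CSV columns listed above (source, target, type):

```shell
# Sample JSONL as SPYDER might emit it; field names are assumptions.
cat > /tmp/edges.jsonl <<'EOF'
{"source":"example.com","target":"ns1.example.com","type":"NS"}
{"source":"example.com","target":"mail.example.com","type":"MX"}
{"source":"example.com","target":"ns2.example.com","type":"NS"}
EOF
# In practice: ./bin/spyder -domains=domains.txt -output_format=jsonl | ...
grep -o '"type":"[A-Z]*"' /tmp/edges.jsonl | sort | uniq -c | sort -rn
```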
Performance Tuning
-concurrency
Number of concurrent worker goroutines.
-concurrency=512 # High throughput
-concurrency=64 # Conservative
-concurrency=1024    # Maximum throughput
Default: 256
Considerations:
- Higher values increase memory usage
- Limited by system file descriptors
- Network bandwidth becomes bottleneck at high values
- Optimal value depends on target domains' response times
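Because each worker may hold several open sockets, it helps to check the file-descriptor limit before raising -concurrency. The 4x headroom factor below is a rule of thumb, not a documented SPYDER requirement:

```shell
# Compare the open-file limit against a candidate concurrency value.
limit=$(ulimit -n)
suggested=$((limit / 4))
echo "fd limit: $limit, suggested -concurrency ceiling: $suggested"
```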
-batch_max_edges
Maximum number of edges per batch before forced flush.
-batch_max_edges=50000 # Large batches
-batch_max_edges=1000    # Small batches
Default: 10000
Trade-offs:
- Larger batches: Better throughput, higher memory usage
- Smaller batches: Lower latency, more API calls
- Consider ingest API limits
-batch_flush_sec
Time interval (seconds) for batch flushing.
-batch_flush_sec=1 # Low latency
-batch_flush_sec=10  # High throughput
Default: 2
Behavior:
- Forces batch emission after specified seconds
- Prevents indefinite data accumulation
- Balances latency vs. throughput
Content Processing
-ua
User-Agent string for HTTP requests.
-ua="SPYDER-Probe/2.0 (+https://example.com/about)"
-ua="Research-Bot/1.0"Default: SPYDERProbe/1.0 (+https://github.com/gustycube/spyder)
Best Practices:
- Include contact information
- Identify purpose clearly
- Follow RFC 7231 format
- Some sites block generic user agents
-exclude_tlds
Comma-separated list of top-level domains to skip crawling.
-exclude_tlds=gov,mil,int,edu
-exclude_tlds="" # No exclusionsDefault: gov,mil,int
Notes:
- DNS resolution still performed
- Only HTTP crawling is skipped
- Case-insensitive matching
- Subdomain matching (.gov matches www.example.gov)
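The matching behavior can be sketched as a small shell function: lowercase the domain, take its final label, and compare it against the exclusion list. This is illustrative only, not SPYDER's implementation:

```shell
# Sketch of the exclusion check: compare the final label,
# case-insensitively, against the exclusion list.
is_excluded() {
  domain=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  tld=${domain##*.}
  case ",gov,mil,int," in *",$tld,"*) return 0 ;; esac
  return 1
}
is_excluded www.example.GOV && echo "skip crawling"
is_excluded example.com || echo "crawl"
```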
Reliability & Storage
-spool_dir
Directory for storing failed batch files.
-spool_dir=/var/spool/spyder
-spool_dir=./failed-batches
Default: spool
Behavior:
- Created automatically if it does not exist
- Failed batches stored as timestamped JSON files
- Automatic retry on restart
- Files cleaned up after successful transmission
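The spool layout can be sketched as follows. The batch-{timestamp}.json filename pattern is an assumption for illustration; the documented behavior is simply "timestamped JSON files":

```shell
# Simulate a failed batch being spooled to disk for later retry.
mkdir -p spool
printf '%s\n' '{"edges":[]}' > "spool/batch-$(date +%s).json"
ls spool/
```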
Security & mTLS
-mtls_cert
Path to client certificate file (PEM format) for mTLS authentication.
-mtls_cert=/etc/spyder/client.crt
Requirements:
- PEM-encoded X.509 certificate
- Must correspond to -mtls_key
- Used for ingest API authentication
-mtls_key
Path to client private key file (PEM format) for mTLS.
-mtls_key=/etc/spyder/client.key
Requirements:
- PEM-encoded private key
- Must correspond to -mtls_cert
- Should be readable only by the spyder process
-mtls_ca
Path to Certificate Authority bundle (PEM format).
-mtls_ca=/etc/spyder/ca-bundle.crt
Usage:
- Validates server certificates
- Used when the system CA bundle is insufficient
- Multiple CA certificates supported
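For local experiments, a throwaway client key and certificate can be generated with openssl. A real deployment would use certificates issued by the CA that the ingest API trusts:

```shell
# Generate a self-signed client certificate for mTLS testing.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout client.key -out client.crt \
  -days 1 -subj "/CN=spyder-probe-test"
chmod 600 client.key   # keep the key readable only by the spyder user
# ./bin/spyder ... -mtls_cert=client.crt -mtls_key=client.key
```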
Dashboard & Storage
-dashboard
Enable the live web dashboard served on the metrics port.
-dashboard=true # Enable (default)
-dashboard=false   # Disable
Default: true
URL: http://{metrics_addr}/ when enabled.
The dashboard shows a real-time stream of discovered edges, current worker pool size, and probe statistics. It is served on the same port as Prometheus metrics.
-mongodb
MongoDB connection URI for persisting batches to a database in addition to (or instead of) the ingest HTTP endpoint.
-mongodb=mongodb://localhost:27017
-mongodb=mongodb://user:pass@mongo.cluster.local:27017/?authSource=admin
Default: "" (disabled)
Behavior:
- Batches are written to MongoDB alongside any configured ingest endpoint or stdout output.
- MongoDB does not replace the ingest endpoint; both receive batches in parallel when both are configured.
-mongodb_db
MongoDB database name used when -mongodb is set.
-mongodb_db=spyder # default
-mongodb_db=spyder_prod
Default: spyder
Observability
-metrics_addr
Listen address for Prometheus metrics endpoint.
-metrics_addr=:9090 # All interfaces
-metrics_addr=127.0.0.1:9090 # Localhost only
-metrics_addr="" # Disable metricsDefault: :9090
Endpoint: http://{metrics_addr}/metrics
Security:
- Consider restricting access with firewall rules
- The endpoint may expose sensitive information
- Bind to localhost unless remote scraping is required
-otel_endpoint
OpenTelemetry OTLP HTTP endpoint for distributed tracing.
-otel_endpoint=http://jaeger:4318
-otel_endpoint=https://otel-collector.example.com:4318
Default: "" (disabled)
Protocol: OTLP over HTTP
Port: Typically 4318 for HTTP, 4317 for gRPC
-otel_insecure
Use insecure HTTP for OpenTelemetry (no TLS).
-otel_insecure=true # HTTP
-otel_insecure=false   # HTTPS
Default: true
Production Recommendation: Use false with proper TLS
-otel_service
Service name for OpenTelemetry traces.
-otel_service=spyder-probe-prod
-otel_service=spyder-dev
Default: spyder-probe
Complete Example Configurations
Development Setup
./bin/spyder \
-domains=test-domains.txt \
-concurrency=32 \
-metrics_addr=:9090 \
-probe=dev-local \
-run=test-$(date +%s)
Production Single Node
./bin/spyder \
-domains=/opt/spyder/domains.txt \
-ingest=https://ingest.production.com/v1/batch \
-probe=prod-us-west-1 \
-concurrency=256 \
-metrics_addr=127.0.0.1:9090 \
-spool_dir=/opt/spyder/spool \
-ua="CompanySpyder/1.0 (+https://company.com/contact)" \
-mtls_cert=/opt/spyder/certs/client.crt \
-mtls_key=/opt/spyder/certs/client.key \
-mtls_ca=/opt/spyder/certs/ca.crt
High-Throughput Configuration
./bin/spyder \
-domains=large-domain-list.txt \
-ingest=https://high-throughput-ingest.com/v1/batch \
-probe=htp-node-01 \
-concurrency=1024 \
-batch_max_edges=50000 \
-batch_flush_sec=1 \
-metrics_addr=:9090
Recursive Crawling
./bin/spyder \
-domains=seed-domains.txt \
-continuous \
-max_domains=50000 \
-concurrency=256 \
-probe=crawler-01 \
-metrics_addr=:9090
Distributed Setup with Redis
# Environment variables
export REDIS_ADDR=redis.cluster.local:6379
export REDIS_QUEUE_ADDR=redis.cluster.local:6379
export REDIS_QUEUE_KEY=spyder:production:queue
./bin/spyder \
-ingest=https://distributed-ingest.com/v1/batch \
-probe=dist-worker-$(hostname) \
-continuous \
-max_domains=100000 \
-concurrency=256 \
-metrics_addr=:9090 \
-otel_endpoint=http://jaeger.monitoring.svc.cluster.local:4318 \
-otel_service=spyder-production
Validation and Testing
Configuration Validation
Sanity-check the configuration with a minimal run:
# Test with single domain
echo "example.com" > test.txt
./bin/spyder -domains=test.txt -concurrency=1
# Test metrics endpoint
curl http://localhost:9090/metrics
# Test mTLS configuration
openssl s_client -connect ingest.example.com:443 \
-cert client.crt -key client.key -CAfile ca.crt
Performance Testing
# Memory usage test
echo "$(seq 1 1000 | sed 's/^/test/' | sed 's/$/.example.com/')" > perf-test.txt
/usr/bin/time -v ./bin/spyder -domains=perf-test.txt -concurrency=64
# Throughput test
time ./bin/spyder -domains=large-list.txt -ingest="" | wc -l
Environment Variable Equivalents
Some flags can be set via environment variables:
# Redis configuration
export REDIS_ADDR=127.0.0.1:6379
# Queue configuration
export REDIS_QUEUE_ADDR=127.0.0.1:6379
export REDIS_QUEUE_KEY=spyder:queue
# OpenTelemetry
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
export OTEL_EXPORTER_OTLP_INSECURE=true
Common Flag Combinations
Security Research
-exclude_tlds=gov,mil,int,edu
-ua="SecurityResearch/1.0 (+https://university.edu/security)"
-concurrency=64
Infrastructure Mapping
-exclude_tlds=gov,mil
-concurrency=512
-batch_max_edges=25000
Development/Testing
-concurrency=16
-metrics_addr=127.0.0.1:9090
-otel_insecure=true
Troubleshooting Flags
Debug Single Domain
echo "problem-domain.com" > debug.txt
./bin/spyder -domains=debug.txt -concurrency=1
Reduced Resource Usage
./bin/spyder -domains=domains.txt \
-concurrency=32 \
-batch_max_edges=1000 \
-batch_flush_sec=5
Maximum Reliability
./bin/spyder -domains=domains.txt \
-spool_dir=/persistent/storage/spool \
-batch_flush_sec=1 \
-mtls_cert=client.crt \
-mtls_key=client.key