Command Line Interface Reference

This document is a complete reference for SPYDER's command-line flags and configuration options.

Required Parameters

-config

Path to a YAML or JSON configuration file. When provided, all settings are loaded from the file. Command-line flags override individual values from the file. Either -config or -domains must be supplied.

bash
# Load full configuration from file
-config=configs/spyder.yaml

# Override a single value from the file
-config=configs/spyder.yaml -concurrency=512

Supported formats: .yaml, .yml, .json

Hot reload: Send SIGHUP to the process to reload the file without restarting:

bash
kill -HUP $(pidof spyder)
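
A configuration file might look like the sketch below. The key names are an assumption — they mirror the flag names documented here and may differ from SPYDER's actual schema:

```yaml
# Illustrative config; keys assumed to mirror flag names.
domains: configs/domains.txt
probe: us-west-1a
concurrency: 256
batch_max_edges: 10000
batch_flush_sec: 2
ingest: https://ingest.example.com/v1/batch
```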

-domains (Required without -config)

Path to a newline-separated file containing domains to probe.

bash
-domains=configs/domains.txt

File Format:

# Comments start with #
example.com
google.com
github.com

# Blank lines are ignored
facebook.com
twitter.com

Input Processing:

  • Comments (#) and blank lines are ignored
  • Domain names are normalized (lowercase, trailing dots removed)
  • Duplicates within the file are processed only once
  • Maximum line length: 1MB
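
The input processing rules above can be reproduced with standard tools. This is a sketch of the documented behavior for checking an input file by hand, not SPYDER's actual implementation:

```shell
# Strip comments and whitespace, lowercase, drop trailing dots, dedupe --
# the same normalization the documentation describes.
cat > domains.txt <<'EOF'
# Comments start with #
Example.COM
google.com.

example.com
EOF

sed -e 's/#.*//' -e 's/[[:space:]]//g' domains.txt \
  | tr 'A-Z' 'a-z' \
  | sed 's/\.$//' \
  | awk 'NF && !seen[$0]++' > normalized.txt

cat normalized.txt
```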

Core Configuration

-ingest

HTTP(S) endpoint for batch ingestion. If empty, batches are written to stdout as JSON.

bash
# Send to ingest API
-ingest=https://ingest.example.com/v1/batch

# Output to stdout (default)
-ingest=""

API Requirements:

  • Must accept POST requests
  • Content-Type: application/json
  • Returns 2xx status codes for success
  • Should handle batch sizes up to 50MB
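
A delivery that satisfies these requirements looks roughly like the following. The payload field names are illustrative only, not SPYDER's actual batch schema, and the endpoint is a placeholder:

```shell
# A minimal batch-shaped payload (field names are assumptions).
payload='{"probe":"local-1","run":"run-1704067200","edges":[]}'
echo "$payload"

# Delivery sketch -- a POST with the required Content-Type header:
# curl -sS -X POST \
#   -H 'Content-Type: application/json' \
#   --data "$payload" \
#   https://ingest.example.com/v1/batch
```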

-probe

Unique identifier for this probe instance.

bash
-probe=us-west-1a
-probe=production-node-01
-probe=dev-local

Default: local-1

Usage:

  • Used in metrics labels
  • Included in all emitted data
  • Should be unique across probe instances
  • Helpful for debugging and monitoring

-run

Identifier for the current run/session.

bash
-run=scan-20240101
-run=daily-maintenance

Default: run-{unix-timestamp}

Usage:

  • Groups related discoveries
  • Useful for batch processing analysis
  • Included in all emitted data

Crawling Mode

-continuous

Enable recursive domain discovery. When enabled, newly discovered domains (from DNS records, TLS certificates, and HTTP links) are fed back into the work queue for further probing.

bash
# Single pass (default)
./bin/spyder -domains=domains.txt

# Recursive crawling
./bin/spyder -domains=domains.txt -continuous

Default: false

Behavior:

  • Seed domains from -domains file are probed first
  • Discovered domains (NS, CNAME, MX, HTTP links) are queued for probing
  • Crawling continues until no new domains are discovered or -max_domains is reached
  • Deduplication prevents re-probing already-visited domains

-max_domains

Maximum number of discovered domains to probe in continuous mode. Set to 0 for unlimited.

bash
# Limit to 10,000 discovered domains
./bin/spyder -domains=domains.txt -continuous -max_domains=10000

# Unlimited discovery
./bin/spyder -domains=domains.txt -continuous -max_domains=0

Default: 0 (unlimited)

Notes:

  • Only applies when -continuous is enabled
  • Counts only newly discovered domains, not seed domains
  • When the limit is reached, no new domains are queued but in-flight work completes

Output

-output_format

Output format for batch data written to stdout.

bash
-output_format=json    # Pretty-printed JSON (default)
-output_format=jsonl   # JSON Lines (one object per line)
-output_format=csv     # CSV format

Default: json

Notes:

  • Only affects stdout output (not ingest API delivery)
  • CSV format outputs edge data only (source, target, type, timestamps)
  • JSONL is recommended for piping to other tools
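
JSONL works well with line-oriented tools because each record is one self-contained JSON object. The records below are fabricated samples, and the field names are assumptions about the edge schema:

```shell
# Filter JSONL output line-by-line with ordinary text tools.
cat > sample.jsonl <<'EOF'
{"source":"example.com","target":"ns1.example.com","type":"NS"}
{"source":"example.com","target":"mail.example.com","type":"MX"}
EOF

grep '"type":"NS"' sample.jsonl   # keep only NS edges
```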

Performance Tuning

-concurrency

Number of concurrent worker goroutines.

bash
-concurrency=512    # High throughput
-concurrency=64     # Conservative
-concurrency=1024   # Maximum throughput

Default: 256

Considerations:

  • Higher values increase memory usage
  • Limited by system file descriptors
  • Network bandwidth becomes bottleneck at high values
  • Optimal value depends on target domains' response times
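
Because each worker can hold sockets for DNS, TLS, and HTTP simultaneously, it helps to sanity-check `-concurrency` against the file-descriptor limit before launch. The quarter-of-the-budget factor below is a heuristic, not a SPYDER requirement:

```shell
# Heuristic guard: keep -concurrency well below the open-file limit.
requested=512
fd_limit=$(ulimit -n)
[ "$fd_limit" = "unlimited" ] && fd_limit=1048576

# Leave headroom: use at most a quarter of the descriptor budget.
max=$(( fd_limit / 4 ))
if [ "$requested" -gt "$max" ]; then
  echo "capping concurrency from $requested to $max (fd limit: $fd_limit)"
  requested=$max
fi
echo "effective concurrency: $requested"
```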

-batch_max_edges

Maximum number of edges per batch before forced flush.

bash
-batch_max_edges=50000   # Large batches
-batch_max_edges=1000    # Small batches

Default: 10000

Trade-offs:

  • Larger batches: Better throughput, higher memory usage
  • Smaller batches: Lower latency, more API calls
  • Consider ingest API limits

-batch_flush_sec

Time interval (seconds) for batch flushing.

bash
-batch_flush_sec=1    # Low latency
-batch_flush_sec=10   # High throughput

Default: 2

Behavior:

  • Forces batch emission after specified seconds
  • Prevents indefinite data accumulation
  • Balances latency vs. throughput

Content Processing

-ua

User-Agent string for HTTP requests.

bash
-ua="SPYDER-Probe/2.0 (+https://example.com/about)"
-ua="Research-Bot/1.0"

Default: SPYDERProbe/1.0 (+https://github.com/gustycube/spyder)

Best Practices:

  • Include contact information
  • Identify purpose clearly
  • Follow RFC 7231 format
  • Avoid generic user agents; some sites block them

-exclude_tlds

Comma-separated list of top-level domains to skip crawling.

bash
-exclude_tlds=gov,mil,int,edu
-exclude_tlds=""  # No exclusions

Default: gov,mil,int

Notes:

  • DNS resolution still performed
  • Only HTTP crawling is skipped
  • Case-insensitive matching
  • Subdomain matching (.gov matches www.example.gov)
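
The matching rules above (case-insensitive, final label, so `.gov` covers `www.example.gov`) can be sketched as a small shell check; this mirrors the documented behavior rather than SPYDER's internal code:

```shell
# Exclusion check: does the domain's final label appear in the list?
excluded="gov,mil,int"

is_excluded() {
  # Lowercase the input, then take everything after the last dot.
  domain=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  tld=${domain##*.}
  case ",$excluded," in
    *",$tld,"*) return 0 ;;
    *)          return 1 ;;
  esac
}

is_excluded "www.Example.GOV" && echo "excluded"
is_excluded "example.com"     || echo "not excluded"
```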

Reliability & Storage

-spool_dir

Directory for storing failed batch files.

bash
-spool_dir=/var/spool/spyder
-spool_dir=./failed-batches

Default: spool

Behavior:

  • Created automatically if it does not exist
  • Failed batches stored as timestamped JSON files
  • Automatic retry on restart
  • Files cleaned up after successful transmission

Security & mTLS

-mtls_cert

Path to client certificate file (PEM format) for mTLS authentication.

bash
-mtls_cert=/etc/spyder/client.crt

Requirements:

  • PEM-encoded X.509 certificate
  • Must correspond to -mtls_key
  • Used for ingest API authentication

-mtls_key

Path to client private key file (PEM format) for mTLS.

bash
-mtls_key=/etc/spyder/client.key

Requirements:

  • PEM-encoded private key
  • Must correspond to -mtls_cert
  • Should be readable only by spyder process
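
Tightening the key's permissions as recommended is straightforward; `client.key` below is a placeholder file created purely for illustration:

```shell
# Restrict the private key to owner read/write only (mode 600).
umask 077
touch client.key
chmod 600 client.key
ls -l client.key
```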

-mtls_ca

Path to Certificate Authority bundle (PEM format).

bash
-mtls_ca=/etc/spyder/ca-bundle.crt

Usage:

  • Validates server certificates
  • Used when the system CA bundle is insufficient
  • Multiple CA certificates supported

Dashboard & Storage

-dashboard

Enable the live web dashboard served on the metrics port.

bash
-dashboard=true   # Enable (default)
-dashboard=false  # Disable

Default: true

URL: http://{metrics_addr}/ when enabled.

The dashboard shows a real-time stream of discovered edges, current worker pool size, and probe statistics. It is served on the same port as Prometheus metrics.

-mongodb

MongoDB connection URI for persisting batches to a database in addition to (or instead of) the ingest HTTP endpoint.

bash
-mongodb=mongodb://localhost:27017
-mongodb=mongodb://user:pass@mongo.cluster.local:27017/?authSource=admin

Default: "" (disabled)

Behavior:

  • Batches are written to MongoDB alongside any configured ingest endpoint or stdout output.
  • Does not replace the ingest endpoint — both run in parallel when both are configured.

-mongodb_db

MongoDB database name used when -mongodb is set.

bash
-mongodb_db=spyder          # default
-mongodb_db=spyder_prod

Default: spyder


Observability

-metrics_addr

Listen address for Prometheus metrics endpoint.

bash
-metrics_addr=:9090          # All interfaces
-metrics_addr=127.0.0.1:9090 # Localhost only
-metrics_addr=""             # Disable metrics

Default: :9090

Endpoint: http://{metrics_addr}/metrics

Security:

  • Consider firewall rules
  • May expose sensitive information
  • Use localhost binding for security

-otel_endpoint

OpenTelemetry OTLP HTTP endpoint for distributed tracing.

bash
-otel_endpoint=http://jaeger:4318
-otel_endpoint=https://otel-collector.example.com:4318

Default: "" (disabled)

Protocol: OTLP over HTTP
Port: typically 4318 for HTTP, 4317 for gRPC

-otel_insecure

Use insecure HTTP for OpenTelemetry (no TLS).

bash
-otel_insecure=true   # HTTP
-otel_insecure=false  # HTTPS

Default: true

Production Recommendation: Use false with proper TLS

-otel_service

Service name for OpenTelemetry traces.

bash
-otel_service=spyder-probe-prod
-otel_service=spyder-dev

Default: spyder-probe

Complete Example Configurations

Development Setup

bash
./bin/spyder \
  -domains=test-domains.txt \
  -concurrency=32 \
  -metrics_addr=:9090 \
  -probe=dev-local \
  -run=test-$(date +%s)

Production Single Node

bash
./bin/spyder \
  -domains=/opt/spyder/domains.txt \
  -ingest=https://ingest.production.com/v1/batch \
  -probe=prod-us-west-1 \
  -concurrency=256 \
  -metrics_addr=127.0.0.1:9090 \
  -spool_dir=/opt/spyder/spool \
  -ua="CompanySpyder/1.0 (+https://company.com/contact)" \
  -mtls_cert=/opt/spyder/certs/client.crt \
  -mtls_key=/opt/spyder/certs/client.key \
  -mtls_ca=/opt/spyder/certs/ca.crt

High-Throughput Configuration

bash
./bin/spyder \
  -domains=large-domain-list.txt \
  -ingest=https://high-throughput-ingest.com/v1/batch \
  -probe=htp-node-01 \
  -concurrency=1024 \
  -batch_max_edges=50000 \
  -batch_flush_sec=1 \
  -metrics_addr=:9090

Recursive Crawling

bash
./bin/spyder \
  -domains=seed-domains.txt \
  -continuous \
  -max_domains=50000 \
  -concurrency=256 \
  -probe=crawler-01 \
  -metrics_addr=:9090

Distributed Setup with Redis

bash
# Environment variables
export REDIS_ADDR=redis.cluster.local:6379
export REDIS_QUEUE_ADDR=redis.cluster.local:6379
export REDIS_QUEUE_KEY=spyder:production:queue

./bin/spyder \
  -ingest=https://distributed-ingest.com/v1/batch \
  -probe=dist-worker-$(hostname) \
  -continuous \
  -max_domains=100000 \
  -concurrency=256 \
  -metrics_addr=:9090 \
  -otel_endpoint=http://jaeger.monitoring.svc.cluster.local:4318 \
  -otel_service=spyder-production

Validation and Testing

Configuration Validation

Validate a configuration with minimal input before a full run:

bash
# Test with single domain
echo "example.com" > test.txt
./bin/spyder -domains=test.txt -concurrency=1

# Test metrics endpoint
curl http://localhost:9090/metrics

# Test mTLS configuration
openssl s_client -connect ingest.example.com:443 \
  -cert client.crt -key client.key -CAfile ca.crt

Performance Testing

bash
# Memory usage test
seq 1 1000 | sed 's/^/test/; s/$/.example.com/' > perf-test.txt
/usr/bin/time -v ./bin/spyder -domains=perf-test.txt -concurrency=64

# Throughput test
time ./bin/spyder -domains=large-list.txt -ingest="" -output_format=jsonl | wc -l

Environment Variable Equivalents

Some flags can be set via environment variables:

bash
# Redis configuration
export REDIS_ADDR=127.0.0.1:6379

# Queue configuration  
export REDIS_QUEUE_ADDR=127.0.0.1:6379
export REDIS_QUEUE_KEY=spyder:queue

# OpenTelemetry
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
export OTEL_EXPORTER_OTLP_INSECURE=true

Common Flag Combinations

Security Research

bash
-exclude_tlds=gov,mil,int,edu
-ua="SecurityResearch/1.0 (+https://university.edu/security)"
-concurrency=64

Infrastructure Mapping

bash
-exclude_tlds=gov,mil
-concurrency=512
-batch_max_edges=25000

Development/Testing

bash
-concurrency=16
-metrics_addr=127.0.0.1:9090
-otel_insecure=true

Troubleshooting Flags

Debug Single Domain

bash
echo "problem-domain.com" > debug.txt
./bin/spyder -domains=debug.txt -concurrency=1

Reduced Resource Usage

bash
./bin/spyder -domains=domains.txt \
  -concurrency=32 \
  -batch_max_edges=1000 \
  -batch_flush_sec=5

Maximum Reliability

bash
./bin/spyder -domains=domains.txt \
  -spool_dir=/persistent/storage/spool \
  -batch_flush_sec=1 \
  -mtls_cert=client.crt \
  -mtls_key=client.key