SPYDER Probe Pro Documentation
SPYDER (System for Probing and Yielding DNS-based Entity Relations) is a distributed, production-ready probe for mapping inter-domain relationships across the internet.
Quick Navigation
Getting Started
- Quick Start Guide - Get up and running quickly
- Installation Guide - Detailed installation instructions
- CLI Reference - Complete command-line options
- API Reference - Complete API documentation and examples
Architecture & Design
- System Overview - High-level architecture
- Data Model - Nodes and edges specification
- Discovery Pipeline - How SPYDER processes domains
Configuration
- Command Line Interface - All CLI flags and options
- Environment Variables - Runtime configuration
- Redis Configuration - Distributed operations setup
- Security Configuration - mTLS and hardening
Operations
- Single Node Deployment - Production single-server setup
- Monitoring & Observability - Prometheus metrics and monitoring
Development
- Building from Source - Build and compilation guide
Use Cases
- Security Research - Cybersecurity applications
Core Features
- Distributed Architecture: Scales from a single node to large clusters
- Policy Awareness: Respects robots.txt and excludes sensitive TLDs
- Production Ready: Structured logging, metrics, health checks, graceful shutdown
- Data Reliability: Batch emitter with retries and on-disk spooling
- Security: mTLS support for secure ingestion
- Observability: Prometheus metrics, OpenTelemetry tracing, structured logs
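The "retries and on-disk spooling" behaviour listed under Data Reliability can be pictured with a small sketch. This is not SPYDER's implementation: the function names `emit_batch` and `replay_spool`, the JSON file layout, and the backoff parameters are all hypothetical and only illustrate the general pattern of retrying a delivery and falling back to disk.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Callable


def emit_batch(
    batch: dict,
    send: Callable[[dict], None],
    spool_dir: Path,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> bool:
    """Try to deliver a batch; spool it to disk if every attempt fails.

    Returns True when delivered, False when spooled for a later replay.
    `send` is a hypothetical callable that raises on delivery failure.
    """
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except Exception:
            # Exponential backoff between attempts: base, 2*base, 4*base, ...
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))

    # All attempts failed: persist the batch so it survives a restart.
    spool_dir.mkdir(parents=True, exist_ok=True)
    path = spool_dir / f"{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(batch))
    return False


def replay_spool(send: Callable[[dict], None], spool_dir: Path) -> int:
    """Resend spooled batches, deleting each file once it is delivered."""
    delivered = 0
    for path in sorted(spool_dir.glob("*.json")):
        try:
            send(json.loads(path.read_text()))
        except Exception:
            continue  # leave the file in place for the next replay
        path.unlink()
        delivered += 1
    return delivered
```

Spooling only after the retry budget is exhausted keeps the common path free of disk writes, while a failed batch is still recoverable after a restart.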
Data Discovery
SPYDER discovers and maps relationships between:
- Domain Names - Web hosts and their apex domains
- IP Addresses - Resolved endpoints and hosting infrastructure
- TLS Certificates - Certificate metadata and SPKI fingerprints
Edge Types
- RESOLVES_TO: Domain → IP address (A/AAAA records)
- USES_NS: Domain → Nameserver (NS records)
- ALIAS_OF: Domain → CNAME target
- USES_MX: Domain → Mail exchanger (MX records)
- LINKS_TO: Domain → External domains (from HTML links)
- USES_CERT: Domain → TLS certificate (SPKI hash)
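One way to picture these edges in code is as small typed records. The sketch below is illustrative only: the `Edge` field names (`type`, `source`, `target`) and the `dedupe` helper are assumptions, not SPYDER's actual schema, which is specified in the Data Model page. The addresses come from the reserved documentation range 192.0.2.0/24.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(str, Enum):
    RESOLVES_TO = "RESOLVES_TO"  # domain -> IP address (A/AAAA records)
    USES_NS = "USES_NS"          # domain -> nameserver (NS records)
    ALIAS_OF = "ALIAS_OF"        # domain -> CNAME target
    USES_MX = "USES_MX"          # domain -> mail exchanger (MX records)
    LINKS_TO = "LINKS_TO"        # domain -> external domain (HTML links)
    USES_CERT = "USES_CERT"      # domain -> TLS certificate (SPKI hash)


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record; every edge type starts at a domain."""
    type: EdgeType
    source: str
    target: str


def dedupe(edges):
    """Drop repeated edges while keeping first-seen order."""
    return list(dict.fromkeys(edges))


edges = dedupe([
    Edge(EdgeType.RESOLVES_TO, "example.com", "192.0.2.10"),
    Edge(EdgeType.USES_NS, "example.com", "ns1.example.net"),
    Edge(EdgeType.RESOLVES_TO, "example.com", "192.0.2.10"),  # duplicate
])
```

Making the record frozen keeps it hashable, so duplicate observations of the same relationship collapse naturally.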
Quick Start
bash
# Basic usage
echo -e "example.com\ngoogle.com" > domains.txt
./bin/spyder -domains=domains.txt
# With metrics and Redis deduplication
REDIS_ADDR=127.0.0.1:6379 ./bin/spyder \
-domains=domains.txt \
-metrics_addr=:9090 \
-probe=my-probe-1
Production Deployment
bash
# Production configuration
./bin/spyder \
-domains=/opt/spyder/domains.txt \
-ingest=https://ingest.company.com/v1/batch \
-probe=prod-us-west-1 \
-concurrency=256 \
-metrics_addr=127.0.0.1:9090 \
-mtls_cert=/etc/spyder/client.crt \
-mtls_key=/etc/spyder/client.key
Development
bash
# Build from source
git clone https://github.com/gustycube/spyder-probe.git
cd spyder-probe
make build
# Run development server
make run
# Start documentation site
make docs
Community & Support
- Repository: github.com/gustycube/spyder-probe
- Issues: Report bugs and feature requests on GitHub
- Documentation: This documentation site
- License: MIT License
Architecture Overview
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Domains   │───▶│   Workers   │───▶│   Output    │
│   (Queue)   │    │   (Pool)    │    │  (Batches)  │
└─────────────┘    └─────────────┘    └─────────────┘
                          │
                   ┌─────────────┐
                   │ Components  │
                   │             │
                   │ • DNS       │
                   │ • HTTP      │
                   │ • TLS       │
                   │ • Extract   │
                   │ • Robots    │
                   │ • Rate Lim  │
                   │ • Dedup     │
                   └─────────────┘
SPYDER processes domains through a pipeline that performs DNS resolution, respects robots.txt policies, fetches HTTP content, analyzes TLS certificates, and extracts external links to map inter-domain relationships.
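The stage ordering described above can be sketched with standard-library pieces. This is a simplified illustration, not SPYDER's code: `probe`, `external_hosts`, `allowed_by_robots`, the injected `resolve` / `fetch_robots` / `fetch_page` callables, and the `spyder-sketch` user agent are all hypothetical. TLS certificate analysis is left out, and only `<a href>` links are considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.robotparser import RobotFileParser


class _LinkCollector(HTMLParser):
    """Collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_hosts(html: str, page_url: str) -> list[str]:
    """Return linked hosts that differ from the page's own host."""
    own = urlsplit(page_url).hostname
    parser = _LinkCollector()
    parser.feed(html)
    hosts = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL before comparing hosts.
        host = urlsplit(urljoin(page_url, href)).hostname
        if host and host != own and host not in hosts:
            hosts.append(host)
    return hosts


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-downloaded robots.txt content."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


def probe(domain, resolve, fetch_robots, fetch_page, user_agent="spyder-sketch"):
    """Run the stages in order and return (edge_type, source, target) tuples."""
    edges = [("RESOLVES_TO", domain, ip) for ip in resolve(domain)]
    url = f"https://{domain}/"
    # Content is fetched only when robots.txt permits it.
    if allowed_by_robots(fetch_robots(domain), user_agent, url):
        html = fetch_page(url)
        edges += [("LINKS_TO", domain, h) for h in external_hosts(html, url)]
    return edges
```

Passing the network operations in as callables keeps the sketch runnable offline and makes the policy check easy to exercise: DNS edges are always emitted, while link edges appear only when the robots.txt content allows the fetch.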
Status: Production Ready
Documentation Site: Powered by VitePress
Development: run npm install && npm run docs:dev to serve the documentation locally