OpenTelemetry Tracing
SPYDER supports distributed tracing via OpenTelemetry (OTEL). Traces capture the lifecycle of each domain crawl, from DNS resolution through TLS analysis and HTTP fetching, providing visibility into per-domain processing time and failure modes.
Configuration
Command-Line Flags
| Flag | Default | Description |
|---|---|---|
| -otel_endpoint | "" (disabled) | OTLP HTTP endpoint in host:port format |
| -otel_insecure | true | Use plain HTTP instead of HTTPS for OTLP |
| -otel_service | spyder-probe | Service name reported in traces |
Enable Tracing
# Send traces to a local Jaeger instance
./bin/spyder -domains=domains.txt \
-otel_endpoint=localhost:4318 \
-otel_insecure=true \
-otel_service=spyder-probe
# Send traces to a remote collector with TLS
./bin/spyder -domains=domains.txt \
-otel_endpoint=otel-collector.monitoring.internal:4318 \
-otel_insecure=false \
-otel_service=spyder-prod-us-west
Disable Tracing
Tracing is disabled by default. When -otel_endpoint is empty (the default), no trace data is collected or exported, and there is no performance overhead.
# Tracing disabled (default)
./bin/spyder -domains=domains.txt
# Explicitly disabled
./bin/spyder -domains=domains.txt -otel_endpoint=""
Configuration via YAML
# config.yaml
otel_endpoint: "jaeger.monitoring.internal:4318"
otel_insecure: false
otel_service: "spyder-production"
./bin/spyder -config=config.yaml -domains=domains.txt
OTLP HTTP Exporter
SPYDER uses the OTLP HTTP exporter (otlptracehttp) to send trace data. This exporter sends trace spans as Protocol Buffers over HTTP to port 4318 (the standard OTLP HTTP port).
How It Works
The telemetry subsystem initializes the exporter at startup:
// internal/telemetry/otel.go
func Init(ctx context.Context, endpoint, serviceName string, insecure bool) (func(context.Context) error, error) {
	if endpoint == "" {
		// Tracing disabled: return a no-op shutdown function.
		return func(context.Context) error { return nil }, nil
	}
	clientOpts := []otlptracehttp.Option{otlptracehttp.WithEndpoint(endpoint)}
	if insecure {
		clientOpts = append(clientOpts, otlptracehttp.WithInsecure())
	}
	exp, err := otlptracehttp.New(ctx, clientOpts...)
	if err != nil {
		return nil, err
	}
	// Attach service.name so backends can group traces by service.
	res, err := resource.New(ctx,
		resource.WithAttributes(semconv.ServiceNameKey.String(serviceName)),
	)
	if err != nil {
		return nil, err
	}
	tp := trace.NewTracerProvider(
		trace.WithBatcher(exp, trace.WithBatchTimeout(3*time.Second)),
		trace.WithResource(res),
	)
	otel.SetTracerProvider(tp)
	return tp.Shutdown, nil
}
Key details:
- Protocol: OTLP over HTTP (not gRPC)
- Port: 4318 is the standard OTLP HTTP receiver port
- Batching: Traces are batched with a 3-second flush timeout, reducing network overhead
- Resource attributes: Each trace includes the service.name attribute set from -otel_service
- Shutdown: On graceful shutdown (SIGINT/SIGTERM), pending spans are flushed before exit
Endpoint Format
The -otel_endpoint value should be host:port without a scheme prefix. The scheme is determined by -otel_insecure:
# Correct: host:port only
-otel_endpoint=jaeger:4318
-otel_endpoint=otel-collector.monitoring.svc.cluster.local:4318
# Incorrect: do not include http:// or https://
# -otel_endpoint=http://jaeger:4318 (wrong)
Trace Spans
CrawlOne Span
SPYDER creates one trace span per domain crawled. The CrawlOne span in the probe package wraps the entire processing pipeline for a single domain:
func (p *Probe) CrawlOne(ctx context.Context, host string) {
	tr := otel.Tracer("spyder/probe")
	ctx, span := tr.Start(ctx, "CrawlOne")
	defer span.End()
	// ... DNS resolution, HTTP fetch, TLS analysis, link extraction
}
Each CrawlOne span encompasses:
- DNS resolution: A, AAAA, CNAME, NS, and MX record lookups
- Robots.txt check: Fetch and evaluate robots.txt policy
- Per-host rate limiting: Wait for rate limiter clearance
- HTTP GET: Fetch the root page (with 15-second timeout)
- Link extraction: Parse HTML and extract external links
- TLS certificate fetch: Retrieve and analyze the TLS certificate
- Deduplication checks: Filter previously-seen nodes and edges
- Batch emission: Send discovered data to the output channel
Span Attributes
The CrawlOne span is created under the spyder/probe tracer with the operation name CrawlOne. The span's context propagates to all child operations, so any instrumented HTTP clients or DNS resolvers within the call tree will appear as child spans.
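Because the span's context is threaded through the pipeline, any stage that starts a span from it appears nested under CrawlOne in the trace view. A minimal sketch, assuming a hypothetical resolveDNS helper (not SPYDER's actual code):

```go
package probe

import (
	"context"

	"go.opentelemetry.io/otel"
)

// resolveDNS is a hypothetical child operation. Starting a span from the
// ctx handed down by CrawlOne makes it a child of the CrawlOne span.
func resolveDNS(ctx context.Context, host string) error {
	tr := otel.Tracer("spyder/probe")
	ctx, span := tr.Start(ctx, "resolveDNS")
	defer span.End()
	_ = ctx // ctx would be passed to the DNS lookups so they nest further
	return nil
}
```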
What Traces Reveal
A typical CrawlOne trace shows:
- Total duration: How long the entire domain crawl took
- DNS latency: Time spent resolving DNS records
- HTTP latency: Time for the HTTP GET request (up to 15-second timeout)
- TLS handshake time: Duration of TLS certificate retrieval
- Error information: Any failures are recorded on the span
Slow domains are immediately visible as long-duration spans in Jaeger or Zipkin.
Integration with Jaeger
Local Jaeger Setup
Run Jaeger all-in-one for development:
docker run -d --name jaeger \
-p 16686:16686 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
Then point SPYDER at it:
./bin/spyder -domains=domains.txt \
-otel_endpoint=localhost:4318 \
-otel_insecure=true \
-otel_service=spyder-dev
Open http://localhost:16686 to view traces.
Finding Slow Domains
In the Jaeger UI:
- Select service spyder-dev (or your -otel_service name)
- Set operation to CrawlOne
- Sort by duration (descending)
- Click on the longest spans to see where time was spent
Production Jaeger
For production, deploy Jaeger with persistent storage:
# jaeger-values.yaml (Helm)
collector:
  service:
    otlp:
      http:
        name: otlp-http
        port: 4318
storage:
  type: elasticsearch
  options:
    es:
      server-urls: https://elasticsearch.internal:9200
      index-prefix: jaeger
Integration with Zipkin
Zipkin can receive OTLP traces through an OpenTelemetry Collector that translates the protocol:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
exporters:
  zipkin:
    endpoint: "http://zipkin:9411/api/v2/spans"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [zipkin]
Run the collector:
docker run -d --name otel-collector \
-p 4318:4318 \
-v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
otel/opentelemetry-collector:latest
Then point SPYDER at the collector:
./bin/spyder -domains=domains.txt \
-otel_endpoint=localhost:4318 \
-otel_insecure=true
Trace Sampling and Performance Impact
Batching Behavior
SPYDER's trace provider uses WithBatcher with a 3-second batch timeout. This means spans are accumulated in memory and sent in batches, reducing the number of HTTP requests to the collector.
Performance Considerations
| Scenario | Overhead | Notes |
|---|---|---|
| Tracing disabled (-otel_endpoint="") | None | No spans created, no memory allocated |
| Tracing enabled, low concurrency (< 64) | Negligible | Few spans per second |
| Tracing enabled, high concurrency (256+) | Low (~1-2% CPU) | Batch exporter amortizes network cost |
| Tracing enabled, very high concurrency (1024+) | Moderate | Consider sampling |
When to Use Sampling
At very high concurrency, every domain crawl produces a span. If you are processing thousands of domains per second, the trace volume may overwhelm your collector. In this case, configure sampling at the collector level:
# otel-collector-config.yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 10 # Keep 10% of traces
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [jaeger]
This keeps 10% of traces, which is sufficient for performance analysis while reducing storage and network costs.
Head-Based vs. Tail-Based Sampling
For SPYDER workloads, tail-based sampling is more useful because it can retain traces for slow or errored domains:
processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow
        type: latency
        latency: {threshold_ms: 5000}
      - name: random
        type: probabilistic
        probabilistic: {sampling_percentage: 5}
This keeps all error traces, all traces over 5 seconds, and a 5% random sample of the rest.
Docker Compose Setup
Full Observability Stack
Run SPYDER with Jaeger and Prometheus for complete observability:
# docker-compose.yml
version: "3.8"
services:
  spyder:
    build: .
    command: >
      ./bin/spyder
      -domains=/data/domains.txt
      -concurrency=256
      -otel_endpoint=jaeger:4318
      -otel_insecure=true
      -otel_service=spyder-probe
      -metrics_addr=:9090
      -ingest=https://ingest.example.com/v1/batch
    volumes:
      - ./domains.txt:/data/domains.txt:ro
    depends_on:
      - jaeger
    networks:
      - spyder-net
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686" # Jaeger UI
      - "4318:4318"   # OTLP HTTP
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - spyder-net
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9091:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    networks:
      - spyder-net
networks:
  spyder-net:
    driver: bridge
Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: spyder
    static_configs:
      - targets: ["spyder:9090"]
    scrape_interval: 30s
Run the Stack
# Start all services
docker compose up -d
# View SPYDER logs
docker compose logs -f spyder
# Open Jaeger UI
open http://localhost:16686
# Open Prometheus
open http://localhost:9091
With OpenTelemetry Collector
For production-grade setups, add an OpenTelemetry Collector between SPYDER and your backends:
# docker-compose.yml (additional service)
otel-collector:
  image: otel/opentelemetry-collector:latest
  command: ["--config=/etc/otelcol/config.yaml"]
  volumes:
    - ./otel-collector-config.yaml:/etc/otelcol/config.yaml:ro
  ports:
    - "4318:4318"
  networks:
    - spyder-net
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
Update the SPYDER service to point at the collector instead of Jaeger directly:
spyder:
  command: >
    ./bin/spyder
    -domains=/data/domains.txt
    -otel_endpoint=otel-collector:4318
    -otel_insecure=true
Troubleshooting
Traces Not Appearing
- Check endpoint format: Use host:port without a scheme prefix
- Verify connectivity: curl -v http://jaeger:4318/v1/traces should return a response
- Check SPYDER logs: Look for otel init failed warnings in stderr
- Verify collector is running: docker compose logs jaeger or docker compose logs otel-collector
Missing Spans After Shutdown
SPYDER flushes pending spans on graceful shutdown (SIGINT/SIGTERM). If you kill the process with SIGKILL, buffered spans will be lost. Always use graceful shutdown:
# Correct: sends SIGTERM, allows flush
kill $(pgrep spyder)
# or
docker compose stop spyder
# Incorrect: SIGKILL loses buffered spans
kill -9 $(pgrep spyder)
High Collector Load
If the OTEL collector is overwhelmed:
- Increase the batch timeout (SPYDER hard-codes 3 seconds in its tracer provider, so this requires a code change)
- Add sampling at the collector level
- Scale collector horizontally
- Reduce SPYDER concurrency if trace volume is too high
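Collector-side, memory limiting, sampling, and batching can be combined in a single pipeline. A sketch with illustrative starting values (tune for your deployment; the otlp/jaeger exporter name matches the earlier collector config):

```yaml
# otel-collector-config.yaml (excerpt; values are illustrative)
processors:
  memory_limiter:          # must run first to shed load under pressure
    check_interval: 1s
    limit_mib: 1024
  probabilistic_sampler:
    sampling_percentage: 10
  batch:
    timeout: 10s
    send_batch_size: 2048
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/jaeger]
```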