Monitoring Stack
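HookProbe provides observability through an integrated monitoring stack: Prometheus scrapes metrics, Grafana renders dashboards, VictoriaMetrics and VictoriaLogs retain metrics and logs long term, and ClickHouse runs analytics on Nexus.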
Overview
| Component | Purpose | Port |
|---|---|---|
| Prometheus | Metrics collection | 9090 |
| Grafana | Dashboards | 3000 |
| VictoriaMetrics | Long-term storage | 8428 |
| VictoriaLogs | Log aggregation | 9428 |
| ClickHouse | Analytics (Nexus) | 8123 |
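To confirm that each component is listening on its port, a plain TCP connect is enough. The sketch below assumes a single-host install using the default ports from the table; `COMPONENT_PORTS`, `port_open` and `check_ports` are illustrative names, not part of HookProbe.

```python
import socket

# Default ports from the Overview table (assumes everything runs on one host).
COMPONENT_PORTS = {
    "Prometheus": 9090,
    "Grafana": 3000,
    "VictoriaMetrics": 8428,
    "VictoriaLogs": 9428,
    "ClickHouse": 8123,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out.
        return False


def check_ports(host: str = "127.0.0.1",
                ports: dict[str, int] = COMPONENT_PORTS) -> dict[str, bool]:
    """Map each component name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

A successful connect only shows that something is listening on the port; use the health endpoints to check that the service itself is working.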
Architecture
```
+---------------------------------------------------------------+
|                       MONITORING STACK                        |
+---------------------------------------------------------------+
|                                                               |
|  +----------------------------------------------------------+ |
|  |                         Grafana                          | |
|  |               (Dashboards & Visualization)               | |
|  +---------------------------+------------------------------+ |
|                              |                                |
|         +--------------------+-------------------+            |
|         |                    |                   |            |
|         v                    v                   v            |
|  +--------------+   +----------------+   +--------------+     |
|  |  Prometheus  |   |VictoriaMetrics |   | VictoriaLogs |     |
|  |  (Scraping)  |   |   (Storage)    |   |    (Logs)    |     |
|  +------+-------+   +----------------+   +--------------+     |
|         |                                                     |
|         | Scrape                                              |
|         v                                                     |
|  +----------------------------------------------------------+ |
|  |                      Metric Sources                      | |
|  |  +--------+  +--------+  +--------+  +--------+          | |
|  |  | Agent  |  | Aegis  |  | Napse  |  | dnsXai |          | |
|  |  | /8888  |  | /9201  |  | /9200  |  | /9203  |          | |
|  |  +--------+  +--------+  +--------+  +--------+          | |
|  +----------------------------------------------------------+ |
|                                                               |
+---------------------------------------------------------------+
```
Prometheus
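Prometheus serves an HTTP API on port 9090. Once the scrape jobs configured below are running, its `/api/v1/targets` endpoint reports whether each target could be scraped. A minimal sketch, assuming a local Prometheus on the default port; `fetch_targets` and `target_health` are illustrative helper names, not part of HookProbe:

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # default Prometheus port (see Overview)


def fetch_targets(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON body of Prometheus' /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def target_health(payload: dict) -> dict[str, list[str]]:
    """Group active targets by their job label, listing each target's health.

    Prometheus reports health as "up", "down" or "unknown".
    """
    health: dict[str, list[str]] = {}
    for target in payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "<no job>")
        health.setdefault(job, []).append(target.get("health", "unknown"))
    return health
```

`target_health(fetch_targets())` groups the results by job name (`hookprobe-agent`, `aegis`, and so on); any `"down"` entry means Prometheus could not scrape that target.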
Configuration
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'hookprobe-agent'
    static_configs:
      - targets: ['localhost:8888']

  - job_name: 'aegis'
    static_configs:
      - targets: ['localhost:9201']

  - job_name: 'napse'
    static_configs:
      - targets: ['localhost:9200']

  - job_name: 'xdp'
    static_configs:
      - targets: ['localhost:9202']
```
Key Metrics
| Metric | Type | Description |
|---|---|---|
| `qsecbit_score` | Gauge | Current security score |
| `qsecbit_component_*` | Gauge | Individual component scores |
| `aegis_packets_total` | Counter | Packets processed by Aegis XDP |
| `aegis_packets_dropped` | Counter | Packets dropped by Aegis XDP |
| `aegis_observations_emitted` | Counter | Observations sent to the ring buffer |
| `aegis_ringbuf_overflow_total` | Counter | Aegis ring buffer overflows (rate should stay at 0) |
| `napse_intents_total` | Counter | Napse intent classifications |
| `napse_flows_total` | Counter | Napse flow summaries |
| `napse_confidence_avg` | Gauge | Average intent confidence |
| `dns_queries_total` | Counter | DNS queries |
| `dns_blocks_total` | Counter | Blocked DNS queries |
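Each component exposes these metrics in the Prometheus text format on the `/metrics` path that Prometheus scrapes by default (for the agent, `http://localhost:8888/metrics`). For a quick look without Prometheus, a small parser is enough. This is a minimal sketch: it keeps only unlabeled `name value` samples and skips comments and labeled series, so it is no replacement for a full client library. `parse_simple_metrics` and `fetch_metrics` are illustrative names.

```python
import urllib.request


def parse_simple_metrics(text: str) -> dict[str, float]:
    """Parse unlabeled 'name value [timestamp]' samples from Prometheus text output.

    Comment lines (# HELP / # TYPE) and labeled series such as
    name{job="x"} 1 are skipped; an optional trailing timestamp is ignored.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])  # float() also accepts NaN and +Inf
    return samples


def fetch_metrics(url: str = "http://localhost:8888/metrics",
                  timeout: float = 5.0) -> dict[str, float]:
    """Download a /metrics endpoint and parse its unlabeled samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_simple_metrics(resp.read().decode("utf-8"))
```

Assuming the agent exposes `qsecbit_score` without labels, `fetch_metrics()["qsecbit_score"]` returns the current score.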
Query Examples
```promql
# Current QSecBit score
qsecbit_score

# Napse intents per minute
rate(napse_intents_total[1m]) * 60

# Aegis XDP drop percentage
rate(aegis_packets_dropped[5m]) / rate(aegis_packets_total[5m]) * 100

# DNS blocks per second, averaged over the last hour
rate(dns_blocks_total[1h])

# Ring buffer overflow rate (should be 0)
rate(aegis_ringbuf_overflow_total[5m])
```
Grafana Dashboards
Pre-built Dashboards
| Dashboard | Content |
|---|---|
| Overview | QSecBit score, intent summary, system health |
| Security | Napse intents, threat breakdown, kill chain timeline |
| Network | Aegis observations, flow analysis, bandwidth |
| DNS | Query stats, blocks, categories |
| System | CPU, memory, disk, containers |
Dashboard Panels
QSecBit Panel:
```
+---------------------------------------------+
|  QSecBit Score                              |
|                                             |
|  +-------------------------------------+    |
|  |           0.32 (GREEN)              |    |
|  |  |||||||.........                   |    |
|  +-------------------------------------+    |
|                                             |
|  Components:                                |
|  +-- Threats: 0.10  ||||......              |
|  +-- Mobile:  0.15  |||||.....              |
|  +-- IDS:     0.08  |||.......              |
|  +-- XDP:     0.12  ||||......              |
|  +-- dnsXai:  0.18  ||||||....              |
|                                             |
+---------------------------------------------+
```
Access Grafana
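Before logging in, you can confirm that Grafana is up through its unauthenticated `/api/health` endpoint, which returns a small JSON object whose `database` field reads `ok` when Grafana can reach its database. A minimal sketch, assuming Grafana on the default port 3000; `is_healthy` and `grafana_healthy` are illustrative helpers:

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"  # default Grafana port from the Overview table


def is_healthy(body: dict) -> bool:
    """Interpret a decoded /api/health body: a healthy Grafana reports "database": "ok"."""
    return body.get("database") == "ok"


def grafana_healthy(base_url: str = GRAFANA_URL, timeout: float = 5.0) -> bool:
    """Return True if Grafana answers /api/health and reports its database as ok."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP error statuses
        # (urllib.error.URLError and HTTPError subclass it); ValueError covers bad JSON.
        return False
```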
```
# Default credentials
URL:  http://localhost:3000
User: admin
Pass: admin  # Change on first login
```
VictoriaMetrics
Purpose
Long-term metric storage with high compression.
Configuration
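```yaml
# Retention: 1 year
retentionPeriod: 365d

# Storage path
storageDataPath: /var/lib/victoria-metrics
```

These settings correspond to VictoriaMetrics' `-retentionPeriod` and `-storageDataPath` command-line flags.
Features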
| Feature | Benefit |
|---|---|
| High Compression | Up to 10x less storage than Prometheus, depending on the data |
| Fast Queries | Optimized for time-series |
| PromQL Compatible | Use existing queries |
| Remote Write | Receive from Prometheus |
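Because VictoriaMetrics implements the Prometheus query API, the PromQL examples from the Prometheus section can be sent to port 8428 unchanged. The sketch below runs the Aegis drop-percentage query against `/api/v1/query` and extracts the numeric results. It assumes a local VictoriaMetrics that already receives the Aegis metrics (for example via remote write); `instant_query`, `vector_values` and `DROP_PCT` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

VM_URL = "http://localhost:8428"  # VictoriaMetrics default port (see Overview)
DROP_PCT = "rate(aegis_packets_dropped[5m]) / rate(aegis_packets_total[5m]) * 100"


def instant_query(query: str, base_url: str = VM_URL, timeout: float = 10.0) -> dict:
    """Run an instant query through the Prometheus-compatible /api/v1/query endpoint."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def vector_values(payload: dict) -> list[float]:
    """Extract sample values from an instant-vector result.

    Each result entry carries "value": [unix_timestamp, "value_as_string"].
    """
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [float(item["value"][1]) for item in payload["data"]["result"]]
```

`vector_values(instant_query(DROP_PCT))` returns one percentage per matching series. Passing `base_url="http://localhost:9090"` sends the same query to Prometheus instead.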
VictoriaLogs
Purpose
Log aggregation and search.
Log Sources
| Source | Format |
|---|---|
| Agent logs | JSON |
| Napse intents | Napse Intent JSON |
| Napse flows | Napse Flow JSON |
| Aegis stats | Aegis Observation JSON |
| System logs | Syslog |
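The LogsQL queries below can also be run over HTTP: VictoriaLogs serves `/select/logsql/query` on port 9428 and streams matching entries back as one JSON object per line. A minimal sketch, assuming a local instance; `parse_json_lines` and `logsql_query` are illustrative helpers, and the default `limit` is an arbitrary cap.

```python
import json
import urllib.parse
import urllib.request

VLOGS_URL = "http://localhost:9428"  # VictoriaLogs default port (see Overview)


def parse_json_lines(body: str) -> list[dict]:
    """Decode a newline-delimited JSON response, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def logsql_query(query: str, limit: int = 100, base_url: str = VLOGS_URL,
                 timeout: float = 10.0) -> list[dict]:
    """Run a LogsQL query and return the matching log entries as dicts."""
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    with urllib.request.urlopen(f"{base_url}/select/logsql/query?{params}",
                                timeout=timeout) as resp:
        return parse_json_lines(resp.read().decode("utf-8"))
```

For example, `logsql_query('_time:5m _stream:{job="napse"}', limit=20)` returns up to 20 Napse log entries from the last five minutes, each with VictoriaLogs' `_time`, `_stream` and `_msg` fields plus any parsed fields.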
Query Examples
```logsql
# Find high-confidence intent classifications
_stream:{job="napse"} | unpack_json | filter confidence:>0.8

# DNS blocks in the last hour
_time:1h _stream:{job="dnsxai"} | unpack_json | filter decision:BLOCKED

# Search for a specific domain
_stream:{job="dnsxai"} | unpack_json | filter domain:"suspicious.com"

# C2 intents from Napse
_stream:{job="napse"} | unpack_json | filter intent_class:c2
```
ClickHouse (Nexus)
Purpose
High-performance analytical queries over large historical datasets, such as QSecBit score history and DNS query logs.
Tables
```sql
-- QSecBit history
CREATE TABLE qsecbit_history (
    timestamp DateTime,
    score     Float32,
    threats   Float32,
    mobile    Float32,
    ids       Float32,
    xdp       Float32,
    dnsxai    Float32
) ENGINE = MergeTree()
ORDER BY timestamp;

-- DNS queries
CREATE TABLE dns_queries (
    timestamp  DateTime,
    domain     String,
    query_type String,
    decision   String,
    confidence Float32,
    category   String
) ENGINE = MergeTree()
ORDER BY timestamp;
```
Query Examples
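The queries below can be sent as the request body to ClickHouse's HTTP interface on port 8123. Appending `FORMAT JSONEachRow` makes ClickHouse return one JSON object per row, which is easy to consume from scripts. A minimal sketch, assuming a local server that accepts the `default` user without a password (adjust for your Nexus deployment); `with_json_format` and `clickhouse_rows` are illustrative helpers.

```python
import json
import urllib.request

CLICKHOUSE_URL = "http://localhost:8123"  # ClickHouse HTTP port from the Overview table


def with_json_format(sql: str) -> str:
    """Drop a trailing semicolon and ask ClickHouse for one JSON object per row.

    Expects a single SELECT that does not end in a -- comment.
    """
    return sql.strip().rstrip(";").rstrip() + " FORMAT JSONEachRow"


def clickhouse_rows(sql: str, base_url: str = CLICKHOUSE_URL,
                    timeout: float = 30.0) -> list[dict]:
    """POST a SELECT to the HTTP interface and decode the JSONEachRow response."""
    request = urllib.request.Request(
        base_url + "/", data=with_json_format(sql).encode("utf-8"), method="POST"
    )
    # No credentials are sent, so ClickHouse uses its `default` user (assumption).
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return [json.loads(line) for line in body.splitlines() if line.strip()]
```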
```sql
-- QSecBit trend (hourly)
SELECT
    toStartOfHour(timestamp) AS hour,
    avg(score) AS avg_score,
    max(score) AS max_score
FROM qsecbit_history
WHERE timestamp > now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;

-- Top blocked domains
SELECT
    domain,
    count() AS blocks
FROM dns_queries
WHERE decision = 'BLOCKED'
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY domain
ORDER BY blocks DESC
LIMIT 10;
```
Alerting
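The alert rules below treat a QSecBit score above 0.45 as AMBER and above 0.70 as RED. When post-processing exported scores, for example rows from `qsecbit_history`, the same thresholds can be applied in code. A minimal sketch, assuming that scores at or below 0.45 count as GREEN (consistent with the 0.32 (GREEN) panel example); `qsecbit_zone` and `zone_counts` are illustrative names, not a HookProbe API, and the rules' `for:` durations are not modeled.

```python
from collections import Counter

# Thresholds taken from the QSecBitAmber and QSecBitRed alert expressions.
AMBER_THRESHOLD = 0.45
RED_THRESHOLD = 0.70


def qsecbit_zone(score: float) -> str:
    """Classify a score the way the alert expressions do (ignoring their `for:` delays)."""
    if score > RED_THRESHOLD:
        return "RED"
    if score > AMBER_THRESHOLD:
        return "AMBER"
    return "GREEN"  # assumption: at or below the AMBER threshold counts as GREEN


def zone_counts(scores: list[float]) -> dict[str, int]:
    """Count how many scores fall into each zone."""
    return dict(Counter(qsecbit_zone(s) for s in scores))
```

Applied to the `max_score` column of the hourly trend query, `zone_counts` shows how many of the last 24 hours peaked in each zone.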
Alert Rules
```yaml
groups:
  - name: hookprobe
    rules:
      - alert: QSecBitAmber
        expr: qsecbit_score > 0.45
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "QSecBit entered AMBER zone"

      - alert: QSecBitRed
        expr: qsecbit_score > 0.70
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "QSecBit entered RED zone"

      - alert: HighIntentRate
        # rate() is per second: fires above 10 classifications/s sustained for 2 minutes
        expr: rate(napse_intents_total[5m]) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Napse intent classification rate"

      - alert: AegisRingbufOverflow
        expr: rate(aegis_ringbuf_overflow_total[5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Aegis ring buffer overflow detected - Napse may be falling behind"
```
Notification Channels
| Channel | Configuration |
|---|---|
| Email | SMTP settings |
| Slack | Webhook URL |
| PagerDuty | Integration key |
| Webhook | Custom URL |
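For the Webhook channel, the receiving service only has to accept a JSON POST. A minimal receiver built on Python's standard library is sketched below. It assumes the sender uses the Prometheus Alertmanager webhook payload (a top-level `alerts` list whose entries carry `status`, `labels` and `annotations`); the port 9099 and all function and class names are hypothetical choices for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(payload: dict) -> list[str]:
    """Format an Alertmanager-style webhook body as one line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append(
            f"[{alert.get('status', '?')}] {labels.get('alertname', '?')} "
            f"({labels.get('severity', 'none')}): {summary}"
        )
    return lines


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accept POSTed alert batches and print one summary line per alert."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            # Reject bodies that are not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        for line in summarize_alerts(payload):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 9099) -> None:
    """Block forever serving the receiver; 9099 is an arbitrary example port."""
    HTTPServer((host, port), AlertWebhookHandler).serve_forever()
```

With `serve()` running and the Webhook channel pointed at it, a firing `QSecBitRed` alert prints `[firing] QSecBitRed (critical): QSecBit entered RED zone`.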
Health Endpoints
Agent Health
```bash
curl http://localhost:8888/health
```

Response:
```json
{
  "status": "healthy",
  "components": {
    "agent": "running",
    "aegis": "loaded",
    "napse": "running",
    "xdp": "attached"
  },
  "uptime_seconds": 86400
}
```
Metrics Endpoint
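```bash
curl http://localhost:8888/metrics
```

This returns the agent's metrics in Prometheus format; it is the same endpoint the `hookprobe-agent` scrape job collects.
CLI Commands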
```bash
# View current metrics
hookprobe-ctl metrics

# Export metrics in Prometheus format
hookprobe-ctl metrics --format prometheus

# View a specific metric
hookprobe-ctl metrics --filter qsecbit
```
Storage Requirements
| Component | Storage/Day | Retention | Total |
|---|---|---|---|
| Prometheus | 100MB | 15 days | 1.5GB |
| VictoriaMetrics | 50MB | 365 days | 18GB |
| VictoriaLogs | 200MB | 30 days | 6GB |
| ClickHouse | 500MB | 365 days | 180GB |
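Each Total is Storage/Day multiplied by Retention, in decimal units (1GB = 1000MB). The sketch below re-derives the figures and can be reused to size a different retention policy; the daily rates are the estimates from the table and will vary with traffic volume. `COMPONENTS`, `total_gb` and `storage_plan` are illustrative names.

```python
# Estimated daily ingest (MB/day) and retention (days) from the table above.
COMPONENTS = {
    "Prometheus": (100, 15),
    "VictoriaMetrics": (50, 365),
    "VictoriaLogs": (200, 30),
    "ClickHouse": (500, 365),
}


def total_gb(mb_per_day: float, retention_days: int) -> float:
    """Steady-state footprint in GB (decimal) once the retention window is full."""
    return mb_per_day * retention_days / 1000


def storage_plan(components: dict[str, tuple[float, int]] = COMPONENTS) -> dict[str, float]:
    """Map each component to its full-retention footprint in GB."""
    return {name: total_gb(rate, days) for name, (rate, days) in components.items()}
```

`sum(storage_plan().values())` comes to 208.25GB, slightly above the sum of the Total column (205.5GB) because the table rounds the VictoriaMetrics (18.25GB) and ClickHouse (182.5GB) figures down.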
Next Steps
- Debugging - Troubleshooting with metrics
- Configuration - Customize monitoring
- QSecBit - Understanding the score