Monitoring Stack

HookProbe provides comprehensive observability through integrated monitoring tools.

Overview

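| Component       | Purpose           | Port |
|-----------------|-------------------|------|
| Prometheus      | Metrics collection | 9090 |
| Grafana         | Dashboards        | 3000 |
| VictoriaMetrics | Long-term storage | 8428 |
| VictoriaLogs    | Log aggregation   | 9428 |
| ClickHouse      | Analytics (Nexus) | 8123 |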

Architecture

+---------------------------------------------------------------+
|                       MONITORING STACK                        |
+---------------------------------------------------------------+
|                                                               |
|  +---------------------------------------------------------+  |
|  |                         Grafana                         |  |
|  |              (Dashboards & Visualization)               |  |
|  +----------------------------+----------------------------+  |
|                               |                               |
|           +-------------------+-------------------+           |
|           |                   |                   |           |
|           v                   v                   v           |
|    +--------------+   +----------------+   +--------------+   |
|    |  Prometheus  |   |VictoriaMetrics |   | VictoriaLogs |   |
|    |  (Scraping)  |   |   (Storage)    |   |    (Logs)    |   |
|    +------+-------+   +----------------+   +--------------+   |
|           |                                                   |
|           | Scrape                                            |
|           v                                                   |
|  +---------------------------------------------------------+  |
|  |                     Metric Sources                      |  |
|  |    +--------+   +--------+   +--------+   +--------+    |  |
|  |    | Agent  |   | Aegis  |   | Napse  |   | dnsXai |    |  |
|  |    | /8888  |   | /9201  |   | /9200  |   | /9203  |    |  |
|  |    +--------+   +--------+   +--------+   +--------+    |  |
|  +---------------------------------------------------------+  |
|                                                               |
+---------------------------------------------------------------+

Prometheus

Configuration

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'hookprobe-agent'
    static_configs:
      - targets: ['localhost:8888']
  - job_name: 'aegis'
    static_configs:
      - targets: ['localhost:9201']
  - job_name: 'napse'
    static_configs:
      - targets: ['localhost:9200']
  - job_name: 'xdp'
    static_configs:
      - targets: ['localhost:9202']

Key Metrics

| Metric                     | Type    | Description                         |
|----------------------------|---------|-------------------------------------|
| qsecbit_score              | Gauge   | Current security score              |
| qsecbit_component_*        | Gauge   | Individual components               |
| aegis_packets_total        | Counter | Packets processed by Aegis XDP      |
| aegis_packets_dropped      | Counter | Packets dropped by Aegis XDP        |
| aegis_observations_emitted | Counter | Observations sent to ring buffer    |
| napse_intents_total        | Counter | Napse intent classifications        |
| napse_flows_total          | Counter | Napse flow summaries                |
| napse_confidence_avg       | Gauge   | Average intent confidence           |
| dns_queries_total          | Counter | DNS queries                         |
| dns_blocks_total           | Counter | Blocked queries                     |

Query Examples

# Current QSecBit score
qsecbit_score

# Napse intent classifications per minute
rate(napse_intents_total[1m]) * 60

# Aegis XDP drop percentage over the last 5 minutes
rate(aegis_packets_dropped[5m]) / rate(aegis_packets_total[5m]) * 100

# DNS blocks per second, averaged over the last hour
rate(dns_blocks_total[1h])

# Ring buffer overflow rate (should be 0)
rate(aegis_ringbuf_overflow_total[5m])
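
The same expressions can also be evaluated from a script through Prometheus's HTTP API (`/api/v1/query`). The following is a minimal sketch, assuming Prometheus listens on localhost:9090 as listed in the Overview; the helper names `build_query_url`, `first_value`, and `query` are hypothetical and not part of HookProbe.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port from the Overview table


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def first_value(response):
    """Return the first sample of a vector result as a float, or None if empty."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    result = response["data"]["result"]
    if not result:
        return None
    # Each vector sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def query(expr):
    """Run an instant query against a live Prometheus (requires network access)."""
    with urlopen(build_query_url(expr), timeout=5) as resp:
        return first_value(json.load(resp))
```

Calling `query("qsecbit_score")` would return the current score as a float, or `None` when no series matches. The sketch only handles vector results; scalar and matrix results have a different shape.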

Grafana Dashboards

Pre-built Dashboards

| Dashboard | Content                                             |
|-----------|-----------------------------------------------------|
| Overview  | QSecBit score, intent summary, system health        |
| Security  | Napse intents, threat breakdown, kill chain timeline |
| Network   | Aegis observations, flow analysis, bandwidth        |
| DNS       | Query stats, blocks, categories                     |
| System    | CPU, memory, disk, containers                       |

Dashboard Panels

QSecBit Panel:

+---------------------------------------------+
|                QSecBit Score                |
|                                             |
|   +-------------------------------------+   |
|   |            0.32 (GREEN)             |   |
|   |          |||||||.........           |   |
|   +-------------------------------------+   |
|                                             |
|   Components:                               |
|   +-- Threats:  0.10  ||||......            |
|   +-- Mobile:   0.15  |||||.....            |
|   +-- IDS:      0.08  |||.......            |
|   +-- XDP:      0.12  ||||......            |
|   +-- dnsXai:   0.18  ||||||....            |
|                                             |
+---------------------------------------------+

Access Grafana

# Default credentials
URL: http://localhost:3000
User: admin
Pass: admin # Change on first login

VictoriaMetrics

Purpose

Long-term metric storage with high compression.

Configuration

# Retention: 1 year
retentionPeriod: 365d
# Storage path
storageDataPath: /var/lib/victoria-metrics

Features

| Feature           | Benefit                         |
|-------------------|---------------------------------|
| High Compression  | 10x less storage than Prometheus |
| Fast Queries      | Optimized for time-series       |
| PromQL Compatible | Use existing queries            |
| Remote Write      | Receive from Prometheus         |

VictoriaLogs

Purpose

Log aggregation and search.

Log Sources

| Source        | Format                 |
|---------------|------------------------|
| Agent logs    | JSON                   |
| Napse intents | Napse Intent JSON      |
| Napse flows   | Napse Flow JSON        |
| Aegis stats   | Aegis Observation JSON |
| System logs   | Syslog                 |

Query Examples

# Find high-confidence intent classifications
_stream:{job="napse"} | unpack_json | filter confidence:>0.8

# DNS blocks in the last hour
_stream:{job="dnsxai"} _time:1h | unpack_json | filter decision:BLOCKED

# Search for a specific domain
_stream:{job="dnsxai"} | unpack_json | filter domain:"suspicious.com"

# C2 intents from Napse
_stream:{job="napse"} | unpack_json | filter intent_class:c2

ClickHouse (Nexus)

Purpose

High-performance analytics for large datasets.

Tables

-- QSecBit history
CREATE TABLE qsecbit_history (
    timestamp DateTime,
    score     Float32,
    threats   Float32,
    mobile    Float32,
    ids       Float32,
    xdp       Float32,
    dnsxai    Float32
) ENGINE = MergeTree()
ORDER BY timestamp;

-- DNS queries
CREATE TABLE dns_queries (
    timestamp  DateTime,
    domain     String,
    query_type String,
    decision   String,
    confidence Float32,
    category   String
) ENGINE = MergeTree()
ORDER BY timestamp;

Query Examples

-- QSecBit trend (hourly)
SELECT
    toStartOfHour(timestamp) AS hour,
    avg(score) AS avg_score,
    max(score) AS max_score
FROM qsecbit_history
WHERE timestamp > now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;

-- Top blocked domains
SELECT
    domain,
    count() AS blocks
FROM dns_queries
WHERE decision = 'BLOCKED'
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY domain
ORDER BY blocks DESC
LIMIT 10;
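
Queries like these can be sent to ClickHouse's HTTP interface, which listens on port 8123 in the Overview. The sketch below is one possible way to do that from Python, assuming an unauthenticated local instance and the `dns_queries` table defined above; `run_query` and `parse_json_each_row` are hypothetical helper names.

```python
import json
from urllib.request import Request, urlopen

CLICKHOUSE_URL = "http://localhost:8123/"  # assumption: no authentication configured

TOP_BLOCKED_SQL = """
SELECT domain, count() AS blocks
FROM dns_queries
WHERE decision = 'BLOCKED'
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY domain
ORDER BY blocks DESC
LIMIT 10
FORMAT JSONEachRow
"""


def parse_json_each_row(body):
    """Parse a JSONEachRow response: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def run_query(sql, url=CLICKHOUSE_URL):
    """POST a query to a live ClickHouse server (requires network access)."""
    request = Request(url, data=sql.encode("utf-8"), method="POST")
    with urlopen(request, timeout=10) as resp:
        return parse_json_each_row(resp.read().decode("utf-8"))
```

Depending on server settings, 64-bit integers such as `count()` may arrive as quoted strings, so wrapping the value in `int(row["blocks"])` is the safer way to read it.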

Alerting

Alert Rules

# /etc/prometheus/alerts.yml
groups:
  - name: hookprobe
    rules:
      - alert: QSecBitAmber
        expr: qsecbit_score > 0.45
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "QSecBit entered AMBER zone"
      - alert: QSecBitRed
        expr: qsecbit_score > 0.70
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "QSecBit entered RED zone"
      - alert: HighIntentRate
        expr: rate(napse_intents_total[5m]) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High Napse intent classification rate"
      - alert: AegisRingbufOverflow
        expr: rate(aegis_ringbuf_overflow_total[5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Aegis ring buffer overflow detected - Napse may be falling behind"
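
The two QSecBit rules imply a three-zone scale. As a rough illustration, the thresholds can be mirrored in a small Python helper; it assumes that every score at or below 0.45 counts as GREEN (consistent with the 0.32 example in the dashboard panel) and, like the alert expressions, uses strict comparisons. The function name `qsecbit_zone` is hypothetical.

```python
AMBER_THRESHOLD = 0.45  # from the QSecBitAmber rule: qsecbit_score > 0.45
RED_THRESHOLD = 0.70    # from the QSecBitRed rule: qsecbit_score > 0.70


def qsecbit_zone(score):
    """Map a QSecBit score to a zone name using the alert thresholds."""
    if score > RED_THRESHOLD:
        return "RED"
    if score > AMBER_THRESHOLD:
        return "AMBER"
    # Assumption: anything not above the AMBER threshold is GREEN.
    return "GREEN"
```

Note that the `QSecBitAmber` expression has no upper bound, so a score in the RED zone satisfies both rules; the helper reports only the most severe zone.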

Notification Channels

| Channel   | Configuration   |
|-----------|-----------------|
| Email     | SMTP settings   |
| Slack     | Webhook URL     |
| PagerDuty | Integration key |
| Webhook   | Custom URL      |

Health Endpoints

Agent Health

curl http://localhost:8888/health

Response:

{
  "status": "healthy",
  "components": {
    "agent": "running",
    "aegis": "loaded",
    "napse": "running",
    "xdp": "attached"
  },
  "uptime_seconds": 86400
}
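
A monitoring script might check this response rather than read it by eye. The sketch below assumes that the component states in the sample response are the healthy ones; the source does not list other possible states, so `EXPECTED_STATES` and the function name `unhealthy_components` are illustrative only.

```python
import json

# Assumption: the states shown in the sample response are the healthy ones.
EXPECTED_STATES = {
    "agent": "running",
    "aegis": "loaded",
    "napse": "running",
    "xdp": "attached",
}


def unhealthy_components(payload):
    """Return {name: observed_state} for everything that deviates from expectations."""
    report = json.loads(payload)
    components = report.get("components", {})
    problems = {
        name: components.get(name, "missing")
        for name, expected in EXPECTED_STATES.items()
        if components.get(name) != expected
    }
    if report.get("status") != "healthy":
        problems["status"] = report.get("status", "missing")
    return problems
```

An empty dictionary means the agent reported the same states as the sample; any other result names the components to investigate.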

Metrics Endpoint

curl http://localhost:8888/metrics
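
The endpoint is scraped by Prometheus, so its output should be in the Prometheus text exposition format. A minimal parser for that format might look like the sketch below; it assumes samples carry no trailing timestamp and keeps any label set as part of the series key. The function name `parse_metrics` is hypothetical.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into a {series: value} dictionary.

    Assumes each sample line is '<series> <value>' with no trailing timestamp.
    """
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Split from the right so label values containing spaces stay intact.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def drop_percentage(samples):
    """Lifetime drop percentage computed from the raw Aegis counters."""
    total = samples.get("aegis_packets_total", 0.0)
    if total == 0:
        return 0.0
    return samples.get("aegis_packets_dropped", 0.0) / total * 100
```

Because counters are cumulative, `drop_percentage` gives the ratio since the exporter started, not the 5-minute rate that the PromQL example computes.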

CLI Commands

# View current metrics
hookprobe-ctl metrics
# Export metrics
hookprobe-ctl metrics --format prometheus
# View specific metric
hookprobe-ctl metrics --filter qsecbit

Storage Requirements

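| Component       | Storage/Day | Retention | Total  |
|-----------------|-------------|-----------|--------|
| Prometheus      | 100 MB      | 15 days   | 1.5 GB |
| VictoriaMetrics | 50 MB       | 365 days  | 18 GB  |
| VictoriaLogs    | 200 MB      | 30 days   | 6 GB   |
| ClickHouse      | 500 MB      | 365 days  | 180 GB |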
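
Each total is the daily volume multiplied by the retention period. The short Python sketch below reproduces that arithmetic from the figures in the table, assuming decimal units (1 GB = 1000 MB); the names `STORAGE_PLAN` and `total_gb` are illustrative.

```python
# (MB per day, retention in days), taken from the table above.
STORAGE_PLAN = {
    "Prometheus": (100, 15),
    "VictoriaMetrics": (50, 365),
    "VictoriaLogs": (200, 30),
    "ClickHouse": (500, 365),
}


def total_gb(mb_per_day, retention_days):
    """Steady-state storage in GB, assuming 1 GB = 1000 MB."""
    return mb_per_day * retention_days / 1000


def plan_totals(plan=STORAGE_PLAN):
    """Return per-component totals in GB plus the combined figure."""
    totals = {name: total_gb(mb, days) for name, (mb, days) in plan.items()}
    totals["combined"] = sum(totals.values())
    return totals
```

The exact products for VictoriaMetrics and ClickHouse are 18.25 GB and 182.5 GB, which the table rounds to 18 GB and 180 GB. Substituting measured daily volumes gives a sizing estimate for a specific deployment.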

Next Steps