New MQTT adaption
37
Dockerfile
Normal file
@ -0,0 +1,37 @@
|
|||||||
|
FROM python:3.11-slim
|
||||||
|
|
||||||
|
# Set metadata
|
||||||
|
LABEL maintainer="mail@hendrikschutter.com"
|
||||||
|
LABEL description="Prometheus exporter for VEGAPULS Air sensors via The Things Network"
|
||||||
|
LABEL version="2.0"
|
||||||
|
|
||||||
|
# Create app directory
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
# Install dependencies
|
||||||
|
COPY requirements.txt .
|
||||||
|
RUN pip install --no-cache-dir -r requirements.txt
|
||||||
|
|
||||||
|
# Copy application files
|
||||||
|
COPY ttn-vegapuls-exporter.py .
|
||||||
|
COPY config.py .
|
||||||
|
|
||||||
|
# Create non-root user
|
||||||
|
RUN useradd -r -u 1000 -g users exporter && \
|
||||||
|
chown -R exporter:users /app
|
||||||
|
|
||||||
|
# Switch to non-root user
|
||||||
|
USER exporter
|
||||||
|
|
||||||
|
# Expose metrics port
|
||||||
|
EXPOSE 9106
|
||||||
|
|
||||||
|
# Health check
|
||||||
|
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
|
||||||
|
CMD python -c 'import urllib.request; urllib.request.urlopen("http://localhost:9106/health")' || exit 1
|
||||||
|
|
||||||
|
# Set environment variables
|
||||||
|
ENV PYTHONUNBUFFERED=1
|
||||||
|
|
||||||
|
# Run the exporter
|
||||||
|
CMD ["python", "ttn-vegapuls-exporter.py"]
|
||||||
241
README.md
@ -1,15 +1,232 @@
|
|||||||
# The Things Network Exporter for VEGAPULS Air
|
# TTN VEGAPULS Air Prometheus Exporter
|
||||||
|
|
||||||
Export metrics of a VEGAPULS Air connected via TTN as a prometheus service.
|
A robust Prometheus exporter for VEGAPULS Air sensors connected via The Things Network (TTN). This exporter provides reliable monitoring with automatic reconnection, uplink caching, and timeout detection.
|
||||||
|
|
||||||
## Install ##
|
## Features
|
||||||
|
|
||||||
- `zypper install python311-paho-mqtt`
|
- **Uplink Caching**: Stores historical data with timestamps for each device
|
||||||
- `mkdir /opt/ttn-vegapulsair-exporter/`
|
- **Timeout Detection**: Automatically detects offline sensors (configurable, default 19 hours)
|
||||||
- `cd /opt/ttn-vegapulsair-exporter/`
|
- **Better Error Handling**: Comprehensive logging and error recovery
|
||||||
- import `ttn-vegapulsair-exporter.py` and `config.py`
|
- **Multiple Device Support**: Automatically handles multiple sensors
|
||||||
- Set the constants in `config.py`
|
|
||||||
- `chmod +x /opt/ttn-vegapulsair-exporter/ttn-vegapulsair-exporter.py`
|
## Metrics Exported
|
||||||
- `chown -R prometheus /opt/ttn-vegapulsair-exporter/`
|
|
||||||
- `nano /etc/systemd/system/ttn-vegapulsair-exporter.service`
|
### Exporter Metrics
|
||||||
- `systemctl daemon-reload && systemctl enable --now ttn-vegapulsair-exporter.service`
|
- `vegapulsair_exporter_uptime_seconds` - Exporter uptime in seconds
|
||||||
|
- `vegapulsair_exporter_requests_total` - Total number of metrics requests
|
||||||
|
- `vegapulsair_devices_total` - Total number of known devices
|
||||||
|
- `vegapulsair_devices_online` - Number of currently online devices
|
||||||
|
|
||||||
|
### Per-Device Metrics
|
||||||
|
All device metrics include a `device_id` label:
|
||||||
|
|
||||||
|
#### Status Metrics
|
||||||
|
- `vegapulsair_device_online{device_id="..."}` - Device online status (1=online, 0=offline)
|
||||||
|
- `vegapulsair_last_uplink_seconds_ago{device_id="..."}` - Seconds since last uplink
|
||||||
|
|
||||||
|
#### Sensor Measurements
|
||||||
|
- `vegapulsair_distance_mm{device_id="..."}` - Distance measurement in millimeters
|
||||||
|
- `vegapulsair_temperature_celsius{device_id="..."}` - Temperature in Celsius
|
||||||
|
- `vegapulsair_inclination_degrees{device_id="..."}` - Inclination in degrees
|
||||||
|
- `vegapulsair_linear_percent{device_id="..."}` - Linear percentage
|
||||||
|
- `vegapulsair_percent{device_id="..."}` - Percentage value
|
||||||
|
- `vegapulsair_scaled_value{device_id="..."}` - Scaled measurement value
|
||||||
|
- `vegapulsair_battery_percent{device_id="..."}` - Remaining battery percentage
|
||||||
|
|
||||||
|
#### LoRaWAN Metadata
|
||||||
|
- `vegapulsair_rssi_dbm{device_id="..."}` - RSSI in dBm
|
||||||
|
- `vegapulsair_channel_rssi_dbm{device_id="..."}` - Channel RSSI in dBm
|
||||||
|
- `vegapulsair_snr_db{device_id="..."}` - Signal-to-Noise Ratio in dB
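To illustrate how the per-device metrics above look on the wire, here is a minimal sketch that renders samples in the Prometheus text exposition format. The helper name `format_metric` and the device ID `tank-01` are hypothetical and not taken from the exporter; only the `vegapulsair_` prefix and the `device_id` label come from this README.

```python
def format_metric(name: str, value, device_id: str, prefix: str = "vegapulsair_") -> str:
    """Render one sample line in the Prometheus text exposition format."""
    # Label values must escape backslash, double quote and newline.
    escaped = (
        device_id.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'{prefix}{name}{{device_id="{escaped}"}} {value}'


# Hypothetical device and readings, for illustration only.
lines = [
    format_metric("device_online", 1, "tank-01"),
    format_metric("distance_mm", 1523, "tank-01"),
    format_metric("battery_percent", 87, "tank-01"),
]
print("\n".join(lines))
```

Each printed line is one sample that Prometheus would ingest on a scrape of `/metrics`.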
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
- Python 3.7 or higher
|
||||||
|
- `paho-mqtt` library
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
### Option 1: Manual Installation
|
||||||
|
|
||||||
|
1. **Install Python dependencies:**
|
||||||
|
```bash
|
||||||
|
pip install paho-mqtt --break-system-packages
|
||||||
|
# Or use a virtual environment:
|
||||||
|
python3 -m venv venv
|
||||||
|
source venv/bin/activate
|
||||||
|
pip install paho-mqtt
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Create installation directory:**
|
||||||
|
```bash
|
||||||
|
sudo mkdir -p /opt/ttn-vegapuls-exporter
|
||||||
|
cd /opt/ttn-vegapuls-exporter
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Copy files:**
|
||||||
|
```bash
|
||||||
|
sudo cp ttn-vegapuls-exporter.py /opt/ttn-vegapuls-exporter/
|
||||||
|
sudo cp config.py /opt/ttn-vegapuls-exporter/
|
||||||
|
sudo chmod +x /opt/ttn-vegapuls-exporter/ttn-vegapuls-exporter.py
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Configure the exporter:**
|
||||||
|
```bash
|
||||||
|
sudo nano /opt/ttn-vegapuls-exporter/config.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Set the following required parameters:
|
||||||
|
- `ttn_user`: Your TTN application ID (format: `your-app-id@ttn`)
|
||||||
|
- `ttn_key`: Your TTN API key (get from TTN Console)
|
||||||
|
- `ttn_region`: Your TTN region (EU1, NAM1, AU1, etc.)
|
||||||
|
|
||||||
|
5. **Set permissions:**
|
||||||
|
```bash
|
||||||
|
sudo useradd -r -s /bin/false prometheus # If user doesn't exist
|
||||||
|
sudo chown -R prometheus:prometheus /opt/ttn-vegapuls-exporter
|
||||||
|
```
|
||||||
|
|
||||||
|
6. **Install systemd service:**
|
||||||
|
```bash
|
||||||
|
sudo cp ttn-vegapuls-exporter.service /etc/systemd/system/
|
||||||
|
sudo systemctl daemon-reload
|
||||||
|
sudo systemctl enable ttn-vegapuls-exporter.service
|
||||||
|
sudo systemctl start ttn-vegapuls-exporter.service
|
||||||
|
```
|
||||||
|
|
||||||
|
7. **Check status:**
|
||||||
|
```bash
|
||||||
|
sudo systemctl status ttn-vegapuls-exporter.service
|
||||||
|
sudo journalctl -u ttn-vegapuls-exporter.service -f
|
||||||
|
```
|
||||||
|
|
||||||
|
### Option 2: Docker Installation
|
||||||
|
See `docker-compose.yml`.
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Edit `config.py` to customize the exporter:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# HTTP Server configuration
|
||||||
|
hostName = "0.0.0.0" # Listen address
|
||||||
|
serverPort = 9106 # Port for metrics endpoint
|
||||||
|
|
||||||
|
# TTN Configuration
|
||||||
|
ttn_user = "your-app@ttn"
|
||||||
|
ttn_key = "NNSXS...." # From TTN Console
|
||||||
|
ttn_region = "EU1"
|
||||||
|
|
||||||
|
# Timeout configuration
|
||||||
|
sensor_timeout_hours = 19 # Mark sensor offline after N hours
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
log_level = "INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
|
||||||
|
```
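The effect of `sensor_timeout_hours` can be sketched as a simple age comparison: a device counts as online while its last uplink is younger than the timeout. The function name `is_online` is an assumption made for this example; the comparison mirrors the cache's timeout check in this commit.

```python
from datetime import datetime, timedelta


def is_online(last_uplink: datetime, now: datetime, timeout_hours: float = 19) -> bool:
    """Return True while the last uplink is younger than the timeout."""
    return (now - last_uplink) < timedelta(hours=timeout_hours)


# Fixed timestamps so the example is reproducible.
now = datetime(2024, 1, 2, 12, 0, 0)
print(is_online(now - timedelta(hours=18), now))  # True: still inside the window
print(is_online(now - timedelta(hours=20), now))  # False: past the 19 hour default
```

The threshold should be longer than the sensor's uplink interval so that a single delayed or lost uplink does not flip the device to offline.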
|
||||||
|
|
||||||
|
### Getting TTN Credentials
|
||||||
|
|
||||||
|
1. Log in to [TTN Console](https://console.cloud.thethings.network/)
|
||||||
|
2. Select your application
|
||||||
|
3. Go to **Integrations** → **MQTT**
|
||||||
|
4. Copy the following:
|
||||||
|
- **Username**: Your application ID (format: `your-app-id@ttn`)
|
||||||
|
- **Password**: Generate an API key with "Read application traffic" permission
|
||||||
|
- **Region**: Your cluster region (visible in the URL, e.g., `eu1`)
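The three values above map onto the MQTT connection roughly as follows. This is a sketch, not the exporter's actual code path: the helper name `mqtt_endpoint` is hypothetical, while the host pattern, port 8883 and topic layout are the ones used elsewhere in this commit.

```python
from typing import Tuple


def mqtt_endpoint(ttn_user: str, ttn_region: str) -> Tuple[str, int, str]:
    """Derive broker host, TLS port and uplink topic from the TTN settings."""
    host = f"{ttn_region.lower()}.cloud.thethings.network"
    port = 8883  # MQTT over TLS
    topic = f"v3/{ttn_user}/devices/+/up"  # '+' matches every device ID
    return host, port, topic


host, port, topic = mqtt_endpoint("your-app-id@ttn", "EU1")
print(host, port, topic)
```

The API key is not part of the endpoint; it is passed as the MQTT password together with `ttn_user` as the username.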
|
||||||
|
|
||||||
|
## Prometheus Configuration
|
||||||
|
|
||||||
|
Add to your `prometheus.yml`:
|
||||||
|
|
||||||
|
```yaml
scrape_configs:
  - job_name: 'vegapuls-air'
    static_configs:
      - targets: ['localhost:9106']
    scrape_interval: 60s
    scrape_timeout: 10s
```
|
||||||
|
|
||||||
|
### Example Prometheus Alerts
|
||||||
|
|
||||||
|
See `prometheus-alerts.yml`.
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### No Metrics Appearing
|
||||||
|
|
||||||
|
1. **Check MQTT connection:**
|
||||||
|
```bash
|
||||||
|
sudo journalctl -u ttn-vegapuls-exporter.service | grep MQTT
|
||||||
|
```
|
||||||
|
|
||||||
|
You should see: `Successfully connected to TTN MQTT broker`
|
||||||
|
|
||||||
|
2. **Verify TTN credentials:**
|
||||||
|
- Ensure `ttn_user` format is correct: `your-app-id@ttn`
|
||||||
|
- Verify API key has "Read application traffic" permission
|
||||||
|
- Check region matches your TTN cluster
|
||||||
|
|
||||||
|
3. **Test metrics endpoint:**
|
||||||
|
```bash
|
||||||
|
curl http://localhost:9106/metrics
|
||||||
|
```
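If `curl` is not available, the same check can be done from Python. The sketch below assumes the exporter listens on `localhost:9106` as configured above; `parse_metrics` and `fetch_metrics` are hypothetical helper names, and the parser is deliberately simplified (it assumes one `name value` pair per line without trailing timestamps).

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse 'name{labels} value' lines into a dict, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url: str = "http://localhost:9106/metrics", timeout: float = 10.0) -> dict:
    """Fetch the metrics endpoint and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running exporter should return at least the exporter-level metrics; an empty result or a connection error points to the service not running or listening on a different address.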
|
||||||
|
|
||||||
|
### MQTT Disconnections
|
||||||
|
|
||||||
|
The exporter now handles disconnections automatically with exponential backoff. Check logs:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo journalctl -u ttn-vegapuls-exporter.service -f
|
||||||
|
```
|
||||||
|
|
||||||
|
If disconnections persist:
|
||||||
|
- Check network connectivity to TTN
|
||||||
|
- Verify firewall allows outbound port 8883
|
||||||
|
- Ensure system time is correct (TLS certificates)
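For orientation, exponential backoff between reconnection attempts can be sketched as below. The start value of 5 seconds and the cap of 300 seconds correspond to `mqtt_reconnect_delay` and `mqtt_reconnect_max_delay` in `config.py`; the doubling factor and the generator `backoff_delays` are assumptions for this example, and the exporter's actual schedule may differ.

```python
from itertools import islice


def backoff_delays(initial: float = 5.0, maximum: float = 300.0, factor: float = 2.0):
    """Yield reconnect delays that grow by `factor` until they reach `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


# First eight delays with the defaults above.
print(list(islice(backoff_delays(), 8)))
# [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

Capping the delay keeps the exporter retrying at a steady rate during long outages instead of backing off indefinitely.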
|
||||||
|
|
||||||
|
### Devices Not Appearing
|
||||||
|
|
||||||
|
1. **Verify devices are sending uplinks:**
|
||||||
|
- Check TTN Console → Applications → Your App → Live Data
|
||||||
|
- Ensure devices are joined and transmitting
|
||||||
|
|
||||||
|
2. **Check user ID:**
|
||||||
|
- `ttn_user` must match your TTN application ID exactly, including the `@ttn` suffix
|
||||||
|
|
||||||
|
3. **Verify payload decoder:**
|
||||||
|
- Devices must have decoded payload in TTN
|
||||||
|
- Check TTN Payload Formatter is configured
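The payload-decoder requirement follows from how an uplink message is read: without a `decoded_payload` field there is nothing to export. The sketch below shows the lookup on a hand-written sample message; `extract_uplink` is a hypothetical helper and the payload field `Distance` is made up for illustration, while the keys `end_device_ids`, `uplink_message`, `decoded_payload` and `rx_metadata` are the ones the exporter reads.

```python
import json


def extract_uplink(raw: bytes):
    """Return (device_id, decoded_payload, rx_metadata) or None if unusable."""
    message = json.loads(raw.decode("utf-8"))
    device_id = message.get("end_device_ids", {}).get("device_id", "unknown")
    uplink = message.get("uplink_message")
    if uplink is None or "decoded_payload" not in uplink:
        # Either not an uplink, or no payload formatter is configured in TTN.
        return None
    return device_id, uplink["decoded_payload"], uplink.get("rx_metadata", [])


# Hand-written sample message; the field "Distance" is hypothetical.
sample = json.dumps(
    {
        "end_device_ids": {"device_id": "tank-01"},
        "uplink_message": {"decoded_payload": {"Distance": 1.5}},
    }
).encode("utf-8")
print(extract_uplink(sample))
```

A message that yields `None` here corresponds to the "No decoded payload" warning in the exporter log.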
|
||||||
|
|
||||||
|
### Debug Mode
|
||||||
|
|
||||||
|
Enable debug logging in `config.py`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
log_level = "DEBUG"
|
||||||
|
```
|
||||||
|
|
||||||
|
This will show:
|
||||||
|
- All MQTT messages received
|
||||||
|
- Cache updates
|
||||||
|
- Device status changes
|
||||||
|
- Detailed error information
|
||||||
|
|
||||||
|
### Data Flow
|
||||||
|
|
||||||
|
```
VEGAPULS Air Sensor
        ↓
LoRaWAN Gateway
        ↓
The Things Network
        ↓
MQTT Broker (TLS)
        ↓
Exporter (caches data)
        ↓
Prometheus (scrapes metrics)
```
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
See [LICENSE](LICENSE) file for details.
|
||||||
Binary file not shown.
35
config.py
@ -1,12 +1,39 @@
|
|||||||
#!/usr/bin/env python3
|
#!/usr/bin/env python3
|
||||||
# -*- coding: utf-8 -*-
|
# -*- coding: utf-8 -*-
|
||||||
""" Author: Hendrik Schutter, mail@hendrikschutter.com
|
"""
|
||||||
|
Configuration for TTN VEGAPULS Air Prometheus Exporter
|
||||||
|
Author: Hendrik Schutter, mail@hendrikschutter.com
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
# HTTP Server configuration
|
||||||
hostName = "127.0.0.1"
|
hostName = "127.0.0.1"
|
||||||
serverPort = 9106
|
serverPort = 9106
|
||||||
exporter_prefix = "vegapulsair_"
|
exporter_prefix = "vegapulsair_"
|
||||||
|
|
||||||
ttn_user = "appid@ttn"
|
# TTN MQTT Configuration
|
||||||
ttn_key = "THE APP API KEY FROM TTN CONSOLE"
|
# Get your credentials from TTN Console -> Applications -> Your App -> Integrations -> MQTT
|
||||||
ttn_region = "EU1"
|
ttn_user = "appid@ttn" # Your application ID
|
||||||
|
ttn_key = "THE APP API KEY FROM TTN CONSOLE" # Your API key
|
||||||
|
ttn_region = "EU1" # TTN region: EU1, NAM1, AU1, etc.
|
||||||
|
|
||||||
|
# Integration method: "mqtt" or "http"
|
||||||
|
# - mqtt: Subscribe to TTN MQTT broker (recommended for real-time updates)
|
||||||
|
# - http: Use HTTP Integration webhook (requires TTN webhook configuration)
|
||||||
|
integration_method = "mqtt"
|
||||||
|
|
||||||
|
# Timeout configuration
|
||||||
|
# Time in hours after which a sensor is considered offline if no uplink is received
|
||||||
|
sensor_timeout_hours = 19
|
||||||
|
|
||||||
|
# MQTT specific settings
|
||||||
|
mqtt_keepalive = 60 # MQTT keepalive interval in seconds
|
||||||
|
mqtt_reconnect_delay = 5 # Delay between reconnection attempts in seconds
|
||||||
|
mqtt_reconnect_max_delay = 300 # Maximum delay between reconnection attempts
|
||||||
|
|
||||||
|
# Logging configuration
|
||||||
|
log_level = "INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
|
||||||
|
log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
|
||||||
|
|
||||||
|
# Cache configuration
|
||||||
|
cache_cleanup_interval = 3600 # Cleanup old cache entries every hour
|
||||||
|
max_cache_age_hours = 72 # Remove cache entries older than 72 hours
|
||||||
|
|||||||
65
docker-compose.yml
Normal file
@ -0,0 +1,65 @@
|
|||||||
|
version: '3.8'
|
||||||
|
|
||||||
|
services:
|
||||||
|
ttn-vegapuls-exporter:
|
||||||
|
image: python:3.11-slim
|
||||||
|
container_name: ttn-vegapuls-exporter
|
||||||
|
restart: unless-stopped
|
||||||
|
|
||||||
|
# Install dependencies and run exporter
|
||||||
|
entrypoint: |
|
||||||
|
sh -c "pip install --no-cache-dir paho-mqtt && python ttn-vegapuls-exporter.py"
|
||||||
|
|
||||||
|
working_dir: /app
|
||||||
|
|
||||||
|
# Expose metrics port
|
||||||
|
ports:
|
||||||
|
- "9106:9106"
|
||||||
|
|
||||||
|
# Mount application files (read-only)
|
||||||
|
volumes:
|
||||||
|
- ./ttn-vegapuls-exporter.py:/app/ttn-vegapuls-exporter.py:ro
|
||||||
|
- ./config.py:/app/config.py:ro
|
||||||
|
|
||||||
|
# Environment variables
|
||||||
|
environment:
|
||||||
|
- PYTHONUNBUFFERED=1
|
||||||
|
|
||||||
|
# Health check
|
||||||
|
healthcheck:
|
||||||
|
test: ["CMD-SHELL", "python -c 'import urllib.request; urllib.request.urlopen(\"http://localhost:9106/health\")' || exit 1"]
|
||||||
|
interval: 30s
|
||||||
|
timeout: 10s
|
||||||
|
retries: 3
|
||||||
|
start_period: 10s
|
||||||
|
|
||||||
|
# Resource limits
|
||||||
|
deploy:
|
||||||
|
resources:
|
||||||
|
limits:
|
||||||
|
memory: 256M
|
||||||
|
cpus: '0.05'
|
||||||
|
reservations:
|
||||||
|
memory: 64M
|
||||||
|
|
||||||
|
# Logging configuration
|
||||||
|
logging:
|
||||||
|
driver: "json-file"
|
||||||
|
options:
|
||||||
|
max-size: "10m"
|
||||||
|
max-file: "3"
|
||||||
|
|
||||||
|
# Network configuration
|
||||||
|
networks:
|
||||||
|
- monitoring
|
||||||
|
|
||||||
|
# Security options
|
||||||
|
security_opt:
|
||||||
|
- no-new-privileges:true
|
||||||
|
|
||||||
|
# Run as non-root user
|
||||||
|
user: "1000:1000"
|
||||||
|
|
||||||
|
networks:
|
||||||
|
monitoring:
|
||||||
|
driver: bridge
|
||||||
204
prometheus-alerts.yml
Normal file
@ -0,0 +1,204 @@
|
|||||||
|
# Prometheus Alert Rules for VEGAPULS Air Sensors
|
||||||
|
#
|
||||||
|
# Installation:
|
||||||
|
# 1. Copy this file to /etc/prometheus/rules/vegapuls-alerts.yml
|
||||||
|
# 2. Add to prometheus.yml:
|
||||||
|
# rule_files:
|
||||||
|
# - /etc/prometheus/rules/vegapuls-alerts.yml
|
||||||
|
# 3. Reload Prometheus: systemctl reload prometheus
|
||||||
|
|
||||||
|
groups:
|
||||||
|
- name: ttn_vegapuls_air_alerts
|
||||||
|
interval: 60s
|
||||||
|
rules:
|
||||||
|
# === Exporter Health ===
|
||||||
|
|
||||||
|
- alert: VEGAPULSExporterDown
|
||||||
|
expr: up{job="vegapuls-air"} == 0
|
||||||
|
for: 5m
|
||||||
|
labels:
|
||||||
|
severity: critical
|
||||||
|
component: exporter
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS Air exporter is down"
|
||||||
|
description: "The VEGAPULS Air Prometheus exporter has been down for more than 5 minutes. Check the service status."
|
||||||
|
runbook: "Check systemctl status ttn-vegapuls-exporter and journalctl -u ttn-vegapuls-exporter"
|
||||||
|
|
||||||
|
# === Device Online Status ===
|
||||||
|
|
||||||
|
- alert: VEGAPULSSensorOffline
|
||||||
|
expr: vegapulsair_device_online == 0
|
||||||
|
for: 10m
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
component: sensor
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} is offline"
|
||||||
|
description: "Sensor {{ $labels.device_id }} has not sent an uplink for more than 19 hours and is considered offline."
|
||||||
|
runbook: "Check sensor battery, LoRaWAN coverage, and TTN Console for error messages"
|
||||||
|
|
||||||
|
- alert: VEGAPULSSensorMissing
|
||||||
|
expr: |
|
||||||
|
vegapulsair_last_uplink_seconds_ago > 86400
|
||||||
|
for: 30m
|
||||||
|
labels:
|
||||||
|
severity: critical
|
||||||
|
component: sensor
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} missing for over 24h"
|
||||||
|
description: "Sensor {{ $labels.device_id }} has not transmitted for over 24 hours. Last uplink: {{ $value | humanizeDuration }} ago."
|
||||||
|
runbook: "Physical inspection required. Check sensor power and installation."
|
||||||
|
|
||||||
|
# === Battery Monitoring ===
|
||||||
|
|
||||||
|
- alert: VEGAPULSBatteryCritical
|
||||||
|
expr: vegapulsair_battery_percent < 10
|
||||||
|
for: 1h
|
||||||
|
labels:
|
||||||
|
severity: critical
|
||||||
|
component: battery
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} battery critically low"
|
||||||
|
description: "Battery level at {{ $value }}%. Sensor will stop functioning soon. Immediate replacement required."
|
||||||
|
runbook: "Schedule urgent battery replacement"
|
||||||
|
|
||||||
|
- alert: VEGAPULSBatteryLow
|
||||||
|
expr: vegapulsair_battery_percent < 20
|
||||||
|
for: 6h
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
component: battery
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} battery low"
|
||||||
|
description: "Battery level at {{ $value }}%. Plan battery replacement soon."
|
||||||
|
runbook: "Schedule battery replacement within 2-4 weeks"
|
||||||
|
|
||||||
|
- alert: VEGAPULSBatteryWarning
|
||||||
|
expr: vegapulsair_battery_percent < 30
|
||||||
|
for: 12h
|
||||||
|
labels:
|
||||||
|
severity: info
|
||||||
|
component: battery
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} battery below 30%"
|
||||||
|
description: "Battery level at {{ $value }}%. Monitor and plan replacement."
|
||||||
|
runbook: "Add to maintenance schedule for next quarter"
|
||||||
|
|
||||||
|
# === Signal Quality ===
|
||||||
|
|
||||||
|
- alert: VEGAPULSWeakSignal
|
||||||
|
expr: vegapulsair_rssi_dbm < -120
|
||||||
|
for: 1h
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
component: network
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} has weak signal"
|
||||||
|
description: "RSSI is {{ $value }} dBm (very weak). May indicate coverage issues or antenna problems."
|
||||||
|
runbook: "Check gateway coverage, sensor placement, and antenna connection"
|
||||||
|
|
||||||
|
- alert: VEGAPULSPoorSNR
|
||||||
|
expr: vegapulsair_snr_db < -15
|
||||||
|
for: 1h
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
component: network
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} has poor SNR"
|
||||||
|
description: "Signal-to-Noise Ratio is {{ $value }} dB. Signal quality is degraded."
|
||||||
|
runbook: "Check for interference, gateway issues, or repositioning sensor"
|
||||||
|
|
||||||
|
# === Temperature Monitoring ===
|
||||||
|
|
||||||
|
- alert: VEGAPULSTemperatureExtreme
|
||||||
|
expr: |
|
||||||
|
vegapulsair_temperature_celsius > 60 or
|
||||||
|
vegapulsair_temperature_celsius < -20
|
||||||
|
for: 30m
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
component: environment
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS sensor {{ $labels.device_id }} extreme temperature"
|
||||||
|
description: "Temperature is {{ $value }}°C, outside normal operating range."
|
||||||
|
runbook: "Check sensor location and environmental conditions"
|
||||||
|
|
||||||
|
# === Data Quality ===
|
||||||
|
|
||||||
|
- alert: VEGAPULSNoDataReceived
|
||||||
|
expr: |
|
||||||
|
rate(vegapulsair_exporter_requests_total[5m]) > 0 and
|
||||||
|
vegapulsair_devices_total == 0
|
||||||
|
for: 15m
|
||||||
|
labels:
|
||||||
|
severity: warning
|
||||||
|
component: integration
|
||||||
|
annotations:
|
||||||
|
summary: "VEGAPULS exporter receiving no device data"
|
||||||
|
description: "Exporter is running and being scraped, but no device data is available. Check MQTT connection and TTN configuration."
|
||||||
|
runbook: "Check exporter logs, TTN Console live data, and MQTT credentials"
|
||||||
|
|
||||||
|
- alert: VEGAPULSAllDevicesOffline
|
||||||
|
expr: |
|
||||||
|
vegapulsair_devices_total > 0 and
|
||||||
|
vegapulsair_devices_online == 0
|
||||||
|
for: 30m
|
||||||
|
labels:
|
||||||
|
severity: critical
|
||||||
|
component: system
|
||||||
|
annotations:
|
||||||
|
summary: "All VEGAPULS sensors are offline"
|
||||||
|
description: "{{ $value }} devices are registered but none are online. System-wide issue suspected."
|
||||||
|
runbook: "Check TTN gateway status, network connectivity, and power supply"
|
||||||
|
|
||||||
|
# === Performance Monitoring ===
|
||||||
|
|
||||||
|
- alert: VEGAPULSHighScrapeRate
|
||||||
|
expr: rate(vegapulsair_exporter_requests_total[5m]) > 2
|
||||||
|
for: 10m
|
||||||
|
labels:
|
||||||
|
severity: info
|
||||||
|
component: performance
|
||||||
|
annotations:
|
||||||
|
summary: "High scrape rate on VEGAPULS exporter"
|
||||||
|
description: "Prometheus is scraping at {{ $value }} requests/second. Consider increasing scrape_interval."
|
||||||
|
runbook: "Review Prometheus configuration and adjust scrape_interval if needed"
|
||||||
|
|
||||||
|
# === Recording Rules for Easier Querying ===
|
||||||
|
|
||||||
|
- name: vegapuls_air_recording_rules
|
||||||
|
interval: 60s
|
||||||
|
rules:
|
||||||
|
# Battery drain rate (percent per day)
|
||||||
|
- record: vegapulsair_battery_drain_rate_percent_per_day
|
||||||
|
expr: |
|
||||||
|
deriv(vegapulsair_battery_percent[7d]) * -86400
|
||||||
|
|
||||||
|
# Average signal strength per device (7 day)
|
||||||
|
- record: vegapulsair_rssi_avg_7d
|
||||||
|
expr: |
|
||||||
|
avg_over_time(vegapulsair_rssi_dbm[7d])
|
||||||
|
|
||||||
|
# Uplink frequency (uplinks per day), assuming evenly spaced uplinks:
# the mean age of the last uplink is then about half the uplink interval
- record: vegapulsair_uplink_frequency_per_day
expr: |
86400 / (2 * avg_over_time(vegapulsair_last_uplink_seconds_ago[7d]))
|
||||||
|
|
||||||
|
# Device availability percentage (24h)
|
||||||
|
- record: vegapulsair_device_availability_percent_24h
|
||||||
|
expr: |
|
||||||
|
avg_over_time(vegapulsair_device_online[24h]) * 100
|
||||||
|
|
||||||
|
# === Usage Examples ===
|
||||||
|
#
|
||||||
|
# Query battery drain rate:
|
||||||
|
# vegapulsair_battery_drain_rate_percent_per_day
|
||||||
|
#
|
||||||
|
# Query devices with availability < 95%:
|
||||||
|
# vegapulsair_device_availability_percent_24h < 95
|
||||||
|
#
|
||||||
|
# Query average RSSI over 7 days:
|
||||||
|
# vegapulsair_rssi_avg_7d
|
||||||
|
#
|
||||||
|
# Query uplink frequency:
|
||||||
|
# vegapulsair_uplink_frequency_per_day
|
||||||
4
requirements.txt
Normal file
@ -0,0 +1,4 @@
|
|||||||
|
# TTN VEGAPULS Air Exporter - Python Dependencies
|
||||||
|
|
||||||
|
# MQTT client for connecting to The Things Network
|
||||||
|
paho-mqtt>=2.0.0,<3.0.0
|
||||||
@ -1,286 +1,595 @@
|
|||||||
#!/usr/bin/env python3
|
#!/usr/bin/env python3
|
||||||
# -*- coding: utf-8 -*-
|
# -*- coding: utf-8 -*-
|
||||||
""" Author: Hendrik Schutter, mail@hendrikschutter.com
|
"""
|
||||||
|
TTN VEGAPULS Air Prometheus Exporter
|
||||||
|
Exports metrics from VEGAPULS Air sensors connected via The Things Network
|
||||||
|
|
||||||
|
Author: Hendrik Schutter, mail@hendrikschutter.com
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from http.server import BaseHTTPRequestHandler, HTTPServer
|
|
||||||
import paho.mqtt.client as mqtt
|
|
||||||
from datetime import datetime, timedelta
|
|
||||||
import threading
|
|
||||||
import time
|
|
||||||
import json
|
|
||||||
import sys
|
import sys
|
||||||
import config
|
import json
|
||||||
|
import time
|
||||||
|
import threading
|
||||||
import logging
|
import logging
|
||||||
import ssl
|
import ssl
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from http.server import BaseHTTPRequestHandler, HTTPServer
|
||||||
|
from typing import Dict, Optional, Any
|
||||||
|
|
||||||
|
import paho.mqtt.client as mqtt
|
||||||
|
|
||||||
|
import config
|
||||||
|
|
||||||
|
|
||||||
scrape_healthy = True
|
class SensorDataCache:
|
||||||
startTime = datetime.now()
|
"""Thread-safe cache for sensor uplink data with timeout tracking"""
|
||||||
lastMqttReception = datetime.now()
|
|
||||||
node_metrics = list()
|
|
||||||
mutex = threading.Lock()
|
|
||||||
request_count = 0
|
|
||||||
|
|
||||||
mqtt_client = None
|
def __init__(self, timeout_hours: int = 19):
|
||||||
mqtt_connected = False
|
self._data: Dict[str, Dict[str, Any]] = {}
|
||||||
mqtt_lock = threading.Lock()
|
self._lock = threading.RLock()
|
||||||
|
self.timeout_hours = timeout_hours
|
||||||
|
|
||||||
def monitor_timeout():
|
def update(
|
||||||
global scrape_healthy
|
self, device_id: str, payload: Dict, metadata: list, timestamp: datetime
|
||||||
global lastMqttReception
|
):
|
||||||
global mqtt_connected
|
"""
|
||||||
|
Update cached data for a device
|
||||||
|
|
||||||
while True:
|
Args:
|
||||||
time_since_last_reception = datetime.now() - lastMqttReception
|
device_id: Unique device identifier
|
||||||
if time_since_last_reception > timedelta(hours=config.ttn_timeout):
|
payload: Decoded payload from TTN
|
||||||
with mutex:
|
metadata: RX metadata from TTN
|
||||||
scrape_healthy = False
|
timestamp: Timestamp of the uplink
|
||||||
mqtt_connected = False
|
"""
|
||||||
time.sleep(60) # Check timeout every minute
|
with self._lock:
|
||||||
|
self._data[device_id] = {
|
||||||
|
"payload": payload,
|
||||||
|
"metadata": metadata,
|
||||||
|
"timestamp": timestamp,
|
||||||
|
"is_online": True,
|
||||||
|
}
|
||||||
|
logging.info(f"Updated cache for device {device_id}")
|
||||||
|
|
||||||
def reconnect_mqtt():
|
def get_all_devices(self) -> Dict[str, Dict[str, Any]]:
|
||||||
global mqtt_client
|
"""
|
||||||
global mqtt_connected
|
Get all cached device data
|
||||||
|
|
||||||
while True:
|
Returns:
|
||||||
if not mqtt_connected:
|
Dictionary of device data
|
||||||
with mqtt_lock:
|
"""
|
||||||
try:
|
with self._lock:
|
||||||
if mqtt_client is None:
|
return dict(self._data)
|
||||||
print("MQTT client is None, creating a new client...")
|
|
||||||
mqtt_client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
|
|
||||||
mqtt_client.on_connect = on_connect
|
|
||||||
mqtt_client.on_message = on_message
|
|
||||||
mqtt_client.on_disconnect = on_disconnect
|
|
||||||
mqtt_client.username_pw_set(config.ttn_user, config.ttn_key)
|
|
||||||
mqtt_client.tls_set()
|
|
||||||
|
|
||||||
print("Attempting to reconnect to MQTT broker...")
|
def check_timeouts(self):
|
||||||
mqtt_client.connect(
|
"""Check all devices for timeout and mark offline ones"""
|
||||||
config.ttn_region.lower() + ".cloud.thethings.network", 8883, 60
|
with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
timeout_threshold = timedelta(hours=self.timeout_hours)
|
||||||
|
|
||||||
|
for device_id, data in self._data.items():
|
||||||
|
time_since_update = now - data["timestamp"]
|
||||||
|
was_online = data["is_online"]
|
||||||
|
data["is_online"] = time_since_update < timeout_threshold
|
||||||
|
|
||||||
|
if was_online and not data["is_online"]:
|
||||||
|
logging.warning(
|
||||||
|
f"Device {device_id} marked as OFFLINE "
|
||||||
|
f"(no uplink for {time_since_update.total_seconds()/3600:.1f} hours)"
|
||||||
)
|
)
|
||||||
except Exception as e:
|
elif not was_online and data["is_online"]:
|
||||||
print(f"MQTT reconnect failed: {e}")
|
logging.info(f"Device {device_id} is back ONLINE")
|
||||||
time.sleep(60) # Retry every 10 seconds
|
|
||||||
|
def cleanup_old_entries(self, max_age_hours: int = 72):
|
||||||
|
"""Remove entries older than max_age_hours"""
|
||||||
|
with self._lock:
|
||||||
|
now = datetime.now()
|
||||||
|
max_age = timedelta(hours=max_age_hours)
|
||||||
|
|
||||||
|
devices_to_remove = [
|
||||||
|
device_id
|
||||||
|
for device_id, data in self._data.items()
|
||||||
|
if now - data["timestamp"] > max_age
|
||||||
|
]
|
||||||
|
|
||||||
|
for device_id in devices_to_remove:
|
||||||
|
del self._data[device_id]
|
||||||
|
logging.info(f"Removed stale cache entry for device {device_id}")
|
||||||
|
|
||||||
|
|
||||||
class RequestHandler(BaseHTTPRequestHandler):
|
class TTNMQTTClient:
|
||||||
def log_message(self, format, *args):
|
"""Manages MQTT connection to TTN with automatic reconnection"""
|
||||||
pass
|
|
||||||
|
|
||||||
def get_metrics(self):
|
def __init__(self, cache: SensorDataCache, config_module):
|
||||||
global request_count
|
self.cache = cache
|
||||||
global node_metrics
|
self.config = config_module
|
||||||
global mutex
|
self.client: Optional[mqtt.Client] = None
|
||||||
mutex.acquire()
|
self.connected = False
|
||||||
self.send_response(200)
|
self._lock = threading.Lock()
|
||||||
self.send_header("Content-type", "text/html")
|
self._should_run = True
|
||||||
self.end_headers()
|
|
||||||
self.wfile.write(
|
|
||||||
bytes(
|
|
||||||
config.exporter_prefix
|
|
||||||
+ "exporter_duration_seconds_sum "
|
|
||||||
+ str(int((datetime.now() - startTime).total_seconds()))
|
|
||||||
+ "\n",
|
|
||||||
"utf-8",
|
|
||||||
)
|
|
||||||
)
|
|
||||||
self.wfile.write(
|
|
||||||
bytes(
|
|
||||||
config.exporter_prefix
|
|
||||||
+ "exporter_request_count "
|
|
||||||
+ str(request_count)
|
|
||||||
+ "\n",
|
|
||||||
"utf-8",
|
|
||||||
)
|
|
||||||
)
|
|
||||||
self.wfile.write(
|
|
||||||
bytes(
|
|
||||||
config.exporter_prefix
|
|
||||||
+ "exporter_scrape_healthy "
|
|
||||||
+ str(int(scrape_healthy))
|
|
||||||
+ "\n",
|
|
||||||
"utf-8",
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
for metric in node_metrics:
|
# Setup logging
|
||||||
self.wfile.write(bytes(config.exporter_prefix + metric + "\n", "utf-8"))
|
self.logger = logging.getLogger("TTNMQTTClient")
|
||||||
|
|
||||||
mutex.release()
|
def _on_connect(self, client, userdata, flags, reason_code, properties):
|
||||||
|
"""Callback when connected to MQTT broker"""
|
||||||
|
if reason_code == 0:
|
||||||
|
self.logger.info("Successfully connected to TTN MQTT broker")
|
||||||
|
self.connected = True
|
||||||
|
|
||||||
def do_GET(self):
|
# Subscribe to uplink messages
|
||||||
global request_count
|
topic = f"v3/{self.config.ttn_user}/devices/+/up"
|
||||||
request_count += 1
|
client.subscribe(topic, qos=1)
|
||||||
if self.path.startswith("/metrics"):
|
self.logger.info(f"Subscribed to topic: {topic}")
|
||||||
self.get_metrics()
|
|
||||||
else:
|
else:
|
||||||
self.send_response(200)
|
self.logger.error(
|
||||||
self.send_header("Content-type", "text/html")
|
f"Failed to connect to MQTT broker. Reason code: {reason_code}"
|
||||||
self.end_headers()
|
|
||||||
self.wfile.write(bytes("<html>", "utf-8"))
|
|
||||||
self.wfile.write(
|
|
||||||
bytes("<head><title>VEGAPULS Air exporter</title></head>", "utf-8")
|
|
||||||
)
|
)
|
||||||
self.wfile.write(bytes("<body>", "utf-8"))
|
self.connected = False
|
||||||
self.wfile.write(
|
|
||||||
bytes(
|
def _on_disconnect(self, client, userdata, flags, reason_code, properties):
|
||||||
"<h1>ttn-vegapulsair exporter based on data from LoRaWAN TTN node.</h1>",
|
"""Callback when disconnected from MQTT broker"""
|
||||||
"utf-8",
|
self.logger.warning(
|
||||||
|
f"Disconnected from MQTT broker. Reason code: {reason_code}"
|
||||||
|
)
|
||||||
|
self.connected = False
|
||||||
|
|
||||||
|
def _on_message(self, client, userdata, msg):
|
||||||
|
"""Callback when a message is received"""
|
||||||
|
self.logger.debug(f"Uplink message received! {msg.topic}")
|
||||||
|
try:
|
||||||
|
# Parse the JSON payload
|
||||||
|
message_data = json.loads(msg.payload.decode("utf-8"))
|
||||||
|
|
||||||
|
# Extract device information
|
||||||
|
device_id = message_data.get("end_device_ids", {}).get(
|
||||||
|
"device_id", "unknown"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Check if this is an uplink message with decoded payload
|
||||||
|
if "uplink_message" not in message_data:
|
||||||
|
self.logger.debug(f"Ignoring non-uplink message from {device_id}")
|
||||||
|
return
|
||||||
|
|
||||||
|
uplink = message_data["uplink_message"]
|
||||||
|
|
||||||
|
if "decoded_payload" not in uplink:
|
||||||
|
self.logger.warning(f"No decoded payload for device {device_id}")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Update cache with new data
|
||||||
|
self.cache.update(
|
||||||
|
device_id=device_id,
|
||||||
|
payload=uplink["decoded_payload"],
|
||||||
|
metadata=uplink.get("rx_metadata", []),
|
||||||
|
timestamp=datetime.now(),
|
||||||
|
)
|
||||||
|
|
||||||
|
self.logger.debug(f"Processed uplink from device: {device_id}")
|
||||||
|
|
||||||
|
except json.JSONDecodeError as e:
|
||||||
|
self.logger.error(f"Failed to parse MQTT message: {e}")
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Error processing MQTT message: {e}", exc_info=True)
|
||||||
|
|
||||||
|
def _create_client(self):
|
||||||
|
"""Create and configure MQTT client"""
|
||||||
|
client = mqtt.Client(
|
||||||
|
client_id=f"vegapuls-exporter-{int(time.time())}",
|
||||||
|
callback_api_version=mqtt.CallbackAPIVersion.VERSION2,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Set callbacks
|
||||||
|
client.on_connect = self._on_connect
|
||||||
|
client.on_disconnect = self._on_disconnect
|
||||||
|
client.on_message = self._on_message
|
||||||
|
|
||||||
|
# Set credentials
|
||||||
|
client.username_pw_set(self.config.ttn_user, self.config.ttn_key)
|
||||||
|
|
||||||
|
# Configure TLS
|
||||||
|
client.tls_set(cert_reqs=ssl.CERT_REQUIRED, tls_version=ssl.PROTOCOL_TLS_CLIENT)
|
||||||
|
client.tls_insecure_set(False)
|
||||||
|
|
||||||
|
return client
|
||||||
|
|
||||||
|
def connect(self):
|
||||||
|
"""Connect to TTN MQTT broker"""
|
||||||
|
with self._lock:
|
||||||
|
try:
|
||||||
|
if self.client is None:
|
||||||
|
self.client = self._create_client()
|
||||||
|
|
||||||
|
broker_url = f"{self.config.ttn_region.lower()}.cloud.thethings.network"
|
||||||
|
self.logger.info(f"Connecting to MQTT broker: {broker_url}")
|
||||||
|
|
||||||
|
self.client.connect(
|
||||||
|
broker_url, port=8883, keepalive=self.config.mqtt_keepalive
|
||||||
|
)
|
||||||
|
|
||||||
|
# Start the network loop in a separate thread
|
||||||
|
self.client.loop_start()
|
||||||
|
|
||||||
|
return True
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Failed to connect to MQTT broker: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
def disconnect(self):
|
||||||
|
"""Disconnect from MQTT broker"""
|
||||||
|
with self._lock:
|
||||||
|
if self.client:
|
||||||
|
self.client.loop_stop()
|
||||||
|
self.client.disconnect()
|
||||||
|
self.connected = False
|
||||||
|
self.logger.info("Disconnected from MQTT broker")
|
||||||
|
|
||||||
|
def run_with_reconnect(self):
|
||||||
|
"""Main loop with automatic reconnection"""
|
||||||
|
reconnect_delay = self.config.mqtt_reconnect_delay
|
||||||
|
|
||||||
|
while self._should_run:
|
||||||
|
if not self.connected:
|
||||||
|
self.logger.info("Attempting to connect to MQTT broker...")
|
||||||
|
|
||||||
|
if self.connect():
|
||||||
|
# Reset reconnect delay on successful connection
|
||||||
|
reconnect_delay = self.config.mqtt_reconnect_delay
|
||||||
|
else:
|
||||||
|
# Exponential backoff for reconnection
|
||||||
|
self.logger.warning(
|
||||||
|
f"Reconnection failed. Retrying in {reconnect_delay}s..."
|
||||||
|
)
|
||||||
|
time.sleep(reconnect_delay)
|
||||||
|
reconnect_delay = min(
|
||||||
|
reconnect_delay * 2, self.config.mqtt_reconnect_max_delay
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Wait a bit before checking connection again
|
||||||
|
time.sleep(10)
|
||||||
|
|
||||||
|
def stop(self):
|
||||||
|
"""Stop the MQTT client"""
|
||||||
|
self._should_run = False
|
||||||
|
self.disconnect()
|
||||||
|
|
||||||
|
|
||||||
|
class MetricsServer:
|
||||||
|
"""HTTP server for Prometheus metrics endpoint"""
|
||||||
|
|
||||||
|
def __init__(self, cache: SensorDataCache, config_module):
|
||||||
|
self.cache = cache
|
||||||
|
self.config = config_module
|
||||||
|
self.start_time = datetime.now()
|
||||||
|
self.request_count = 0
|
||||||
|
self._lock = threading.Lock()
|
||||||
|
|
||||||
|
def _format_metric(
|
||||||
|
self, name: str, value: Any, labels: Dict[str, str] = None
|
||||||
|
) -> str:
|
||||||
|
"""
|
||||||
|
Format a Prometheus metric
|
||||||
|
|
||||||
|
Args:
|
||||||
|
name: Metric name
|
||||||
|
value: Metric value
|
||||||
|
labels: Optional labels dictionary
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Formatted metric string
|
||||||
|
"""
|
||||||
|
metric_name = f"{self.config.exporter_prefix}{name}"
|
||||||
|
|
||||||
|
if labels:
|
||||||
|
label_str = ",".join([f'{k}="{v}"' for k, v in labels.items()])
|
||||||
|
return f"{metric_name}{{{label_str}}} {value}"
|
||||||
|
else:
|
||||||
|
return f"{metric_name} {value}"
|
||||||
|
|
||||||
|
def _generate_metrics(self) -> str:
|
||||||
|
"""Generate all Prometheus metrics"""
|
||||||
|
metrics = []
|
||||||
|
|
||||||
|
# Exporter meta metrics
|
||||||
|
uptime = int((datetime.now() - self.start_time).total_seconds())
|
||||||
|
metrics.append(self._format_metric("exporter_uptime_seconds", uptime))
|
||||||
|
metrics.append(
|
||||||
|
self._format_metric("exporter_requests_total", self.request_count)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Get all device data
|
||||||
|
devices = self.cache.get_all_devices()
|
||||||
|
|
||||||
|
# Overall health metric
|
||||||
|
online_devices = sum(1 for d in devices.values() if d["is_online"])
|
||||||
|
total_devices = len(devices)
|
||||||
|
metrics.append(self._format_metric("devices_total", total_devices))
|
||||||
|
metrics.append(self._format_metric("devices_online", online_devices))
|
||||||
|
|
||||||
|
# Per-device metrics
|
||||||
|
for device_id, data in devices.items():
|
||||||
|
labels = {"device_id": device_id}
|
||||||
|
|
||||||
|
# Device online status (1 = online, 0 = offline/timeout)
|
||||||
|
metrics.append(
|
||||||
|
self._format_metric("device_online", int(data["is_online"]), labels)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Time since last uplink in seconds
|
||||||
|
time_since_uplink = (datetime.now() - data["timestamp"]).total_seconds()
|
||||||
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"last_uplink_seconds_ago", int(time_since_uplink), labels
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
self.wfile.write(bytes('<p><a href="/metrics">Metrics</a></p>', "utf-8"))
|
|
||||||
self.wfile.write(bytes("</body>", "utf-8"))
|
|
||||||
self.wfile.write(bytes("</html>", "utf-8"))
|
|
||||||
|
|
||||||
def update_metrics(payload, metadata):
|
payload = data["payload"]
|
||||||
global node_metrics
|
metadata = data["metadata"]
|
||||||
global mutex
|
|
||||||
global scrape_healthy
|
|
||||||
global lastMqttReception
|
|
||||||
|
|
||||||
mutex.acquire()
|
# Sensor measurements
|
||||||
node_metrics.clear()
|
if "Distance" in payload:
|
||||||
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"distance_mm", float(payload["Distance"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "Distance" in payload:
|
if "Temperature" in payload:
|
||||||
node_metrics.append("distance " + str(float(payload["Distance"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"temperature_celsius", int(payload["Temperature"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "Inclination_degree" in payload:
|
if "Inclination_degree" in payload:
|
||||||
node_metrics.append("inclination_degree " + str(int(payload["Inclination_degree"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"inclination_degrees",
|
||||||
|
int(payload["Inclination_degree"]),
|
||||||
|
labels,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "MvLinProcent" in payload:
|
if "MvLinProcent" in payload:
|
||||||
node_metrics.append("linprocent " + str(int(payload["MvLinProcent"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"linear_percent", int(payload["MvLinProcent"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "MvProcent" in payload:
|
if "MvProcent" in payload:
|
||||||
node_metrics.append("procent " + str(int(payload["MvProcent"])))
|
metrics.append(
|
||||||
|
self._format_metric("percent", int(payload["MvProcent"]), labels)
|
||||||
|
)
|
||||||
|
|
||||||
if "MvScaled" in payload:
|
if "MvScaled" in payload:
|
||||||
node_metrics.append("scaled " + str(float(payload["MvScaled"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"scaled_value", float(payload["MvScaled"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "MvScaledUnit" in payload:
|
if "MvScaledUnit" in payload:
|
||||||
node_metrics.append("scaled_unit " + str(int(payload["MvScaledUnit"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"scaled_unit", int(payload["MvScaledUnit"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "PacketIdentifier" in payload:
|
if "PacketIdentifier" in payload:
|
||||||
node_metrics.append("packet_identifier " + str(int(payload["PacketIdentifier"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"packet_identifier", int(payload["PacketIdentifier"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "RemainingPower" in payload:
|
if "RemainingPower" in payload:
|
||||||
node_metrics.append("remaining_power " + str(int(payload["RemainingPower"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"battery_percent", int(payload["RemainingPower"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "Temperature" in payload:
|
if "Unit" in payload:
|
||||||
node_metrics.append("temperature " + str(int(payload["Temperature"])))
|
metrics.append(
|
||||||
|
self._format_metric("unit", int(payload["Unit"]), labels)
|
||||||
|
)
|
||||||
|
|
||||||
if "Unit" in payload:
|
if "UnitTemperature" in payload:
|
||||||
node_metrics.append("unit " + str(int(payload["Unit"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"temperature_unit", int(payload["UnitTemperature"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "UnitTemperature" in payload:
|
# LoRaWAN metadata
|
||||||
node_metrics.append("temperature_unit " + str(int(payload["UnitTemperature"])))
|
if metadata and len(metadata) > 0:
|
||||||
|
first_gateway = metadata[0]
|
||||||
|
|
||||||
if "rssi" in metadata[0]:
|
if "rssi" in first_gateway:
|
||||||
node_metrics.append("rssi " + str(int(metadata[0]["rssi"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"rssi_dbm", int(first_gateway["rssi"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "channel_rssi" in metadata[0]:
|
if "channel_rssi" in first_gateway:
|
||||||
node_metrics.append("channel_rssi " + str(int(metadata[0]["channel_rssi"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"channel_rssi_dbm",
|
||||||
|
int(first_gateway["channel_rssi"]),
|
||||||
|
labels,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
if "snr" in metadata[0]:
|
if "snr" in first_gateway:
|
||||||
node_metrics.append("snr " + str(float(metadata[0]["snr"])))
|
metrics.append(
|
||||||
|
self._format_metric(
|
||||||
|
"snr_db", float(first_gateway["snr"]), labels
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
scrape_healthy = True
|
return "\n".join(metrics) + "\n"
|
||||||
lastMqttReception = datetime.now()
|
|
||||||
mutex.release()
|
|
||||||
|
|
||||||
def on_connect(client, userdata, flags, reason_code, properties):
|
def create_handler(self):
|
||||||
global mqtt_connected
|
"""Create HTTP request handler"""
|
||||||
if reason_code == 0:
|
server_instance = self
|
||||||
print("\nConnected to MQTT: reason_code = " + str(reason_code))
|
|
||||||
mqtt_connected = True
|
|
||||||
elif reason_code > 0:
|
|
||||||
print("\nNot connected to MQTT: reason_code = " + str(reason_code))
|
|
||||||
mqtt_connected = False
|
|
||||||
|
|
||||||
def on_disconnect(client, userdata, flags, reason_code, tmp):
|
class RequestHandler(BaseHTTPRequestHandler):
|
||||||
global mqtt_connected
|
def log_message(self, format, *args):
|
||||||
print(f"Disconnected from MQTT: reason_code = {reason_code}")
|
"""Suppress default logging"""
|
||||||
mqtt_connected = False
|
pass
|
||||||
|
|
||||||
def on_message(mqttc, obj, msg):
|
def do_GET(self):
|
||||||
print("on_message")
|
with server_instance._lock:
|
||||||
global scrape_healthy
|
server_instance.request_count += 1
|
||||||
|
|
||||||
try:
|
if self.path == "/metrics":
|
||||||
parsedJSON = json.loads(msg.payload)
|
self.send_response(200)
|
||||||
print(parsedJSON)
|
self.send_header("Content-Type", "text/plain; charset=utf-8")
|
||||||
uplink_message = parsedJSON["uplink_message"]
|
self.end_headers()
|
||||||
update_metrics(uplink_message["decoded_payload"], uplink_message["rx_metadata"])
|
|
||||||
except Exception as e:
|
|
||||||
with mutex:
|
|
||||||
scrape_healthy = False
|
|
||||||
print(f"Unable to parse uplink: {e}")
|
|
||||||
|
|
||||||
def poll_mqtt(mqtt_client):
|
metrics = server_instance._generate_metrics()
|
||||||
# Start the network loop
|
self.wfile.write(metrics.encode("utf-8"))
|
||||||
mqtt_client.loop_forever()
|
|
||||||
|
|
||||||
def configure_mqtt_client():
|
elif self.path == "/" or self.path == "/health":
|
||||||
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
|
self.send_response(200)
|
||||||
client.on_connect = on_connect
|
self.send_header("Content-Type", "text/html; charset=utf-8")
|
||||||
client.on_message = on_message
|
self.end_headers()
|
||||||
client.on_disconnect = on_disconnect
|
|
||||||
|
|
||||||
# Set credentials
|
html = """
|
||||||
client.username_pw_set(config.ttn_user, config.ttn_key)
|
<html>
|
||||||
|
<head><title>VEGAPULS Air Exporter</title></head>
|
||||||
|
<body>
|
||||||
|
<h1>TTN VEGAPULS Air Prometheus Exporter</h1>
|
||||||
|
<p>Exporter for VEGAPULS Air sensors connected via The Things Network</p>
|
||||||
|
<p><a href="/metrics">Metrics</a></p>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
"""
|
||||||
|
self.wfile.write(html.encode("utf-8"))
|
||||||
|
|
||||||
# Set up TLS/SSL
|
else:
|
||||||
client.tls_set(
|
self.send_response(404)
|
||||||
cert_reqs=ssl.CERT_REQUIRED,
|
self.end_headers()
|
||||||
tls_version=ssl.PROTOCOL_TLSv1_2, # Enforce TLS 1.2
|
|
||||||
|
return RequestHandler
|
||||||
|
|
||||||
|
|
||||||
|
class TimeoutMonitor:
|
||||||
|
"""Background thread to monitor device timeouts"""
|
||||||
|
|
||||||
|
def __init__(self, cache: SensorDataCache, config_module):
|
||||||
|
self.cache = cache
|
||||||
|
self.config = config_module
|
||||||
|
self._should_run = True
|
||||||
|
self.logger = logging.getLogger("TimeoutMonitor")
|
||||||
|
|
||||||
|
def run(self):
|
||||||
|
"""Main monitoring loop"""
|
||||||
|
while self._should_run:
|
||||||
|
try:
|
||||||
|
self.cache.check_timeouts()
|
||||||
|
|
||||||
|
# Also cleanup old entries periodically
|
||||||
|
if hasattr(self.config, "max_cache_age_hours"):
|
||||||
|
self.cache.cleanup_old_entries(self.config.max_cache_age_hours)
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.error(f"Error in timeout monitoring: {e}", exc_info=True)
|
||||||
|
|
||||||
|
# Check every minute
|
||||||
|
time.sleep(60)
|
||||||
|
|
||||||
|
def stop(self):
|
||||||
|
"""Stop the monitor"""
|
||||||
|
self._should_run = False
|
||||||
|
|
||||||
|
|
||||||
|
def setup_logging(config_module):
|
||||||
|
"""Configure logging"""
|
||||||
|
log_level = getattr(logging, config_module.log_level.upper(), logging.INFO)
|
||||||
|
log_format = getattr(
|
||||||
|
config_module,
|
||||||
|
"log_format",
|
||||||
|
"%(asctime)s - %(name)s - %(levelname)s - %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
logging.basicConfig(
|
||||||
|
level=log_level, format=log_format, handlers=[logging.StreamHandler(sys.stdout)]
|
||||||
)
|
)
|
||||||
client.tls_insecure_set(False) # Enforce strict certificate validation
|
|
||||||
|
|
||||||
return client
|
|
||||||
|
|
||||||
def main():
|
def main():
|
||||||
global mqtt_client
|
"""Main application entry point"""
|
||||||
|
# Setup logging
|
||||||
|
setup_logging(config)
|
||||||
|
logger = logging.getLogger("Main")
|
||||||
|
|
||||||
# Start timeout monitoring thread
|
logger.info("=" * 60)
|
||||||
timeout_thread = threading.Thread(target=monitor_timeout, daemon=True)
|
logger.info("TTN VEGAPULS Air Prometheus Exporter")
|
||||||
timeout_thread.start()
|
logger.info("=" * 60)
|
||||||
|
logger.info(f"Integration Method: {config.integration_method}")
|
||||||
|
logger.info(f"Sensor Timeout: {config.sensor_timeout_hours} hours")
|
||||||
|
logger.info(f"HTTP Server: {config.hostName}:{config.serverPort}")
|
||||||
|
logger.info("=" * 60)
|
||||||
|
|
||||||
# Start MQTT reconnect thread
|
# Create sensor data cache
|
||||||
reconnect_thread = threading.Thread(target=reconnect_mqtt, daemon=True)
|
cache = SensorDataCache(timeout_hours=config.sensor_timeout_hours)
|
||||||
reconnect_thread.start()
|
|
||||||
|
|
||||||
while True:
|
# Start timeout monitor
|
||||||
mqtt_client = configure_mqtt_client()
|
timeout_monitor = TimeoutMonitor(cache, config)
|
||||||
try:
|
monitor_thread = threading.Thread(
|
||||||
# Connect to TTN broker
|
target=timeout_monitor.run, daemon=True, name="TimeoutMonitor"
|
||||||
broker_url = f"{config.ttn_region.lower()}.cloud.thethings.network"
|
)
|
||||||
mqtt_client.connect(broker_url, 8883, 60)
|
monitor_thread.start()
|
||||||
|
logger.info("Started timeout monitor")
|
||||||
|
|
||||||
# Subscribe to all topics
|
# Start MQTT client if configured
|
||||||
mqtt_client.subscribe("#", 1)
|
mqtt_client = None
|
||||||
logging.info(f"Subscribed to all topics.")
|
mqtt_thread = None
|
||||||
|
if config.integration_method.lower() == "mqtt":
|
||||||
|
mqtt_client = TTNMQTTClient(cache, config)
|
||||||
|
mqtt_thread = threading.Thread(
|
||||||
|
target=mqtt_client.run_with_reconnect, daemon=True, name="MQTTClient"
|
||||||
|
)
|
||||||
|
mqtt_thread.start()
|
||||||
|
logger.info("Started MQTT client")
|
||||||
|
else:
|
||||||
|
logger.warning(f"Unsupported integration method: {config.integration_method}")
|
||||||
|
logger.warning("Only 'mqtt' is currently supported")
|
||||||
|
|
||||||
poll_mqtt_thread = threading.Thread(target=poll_mqtt, args=((mqtt_client,)))
|
# Start HTTP server
|
||||||
poll_mqtt_thread.start()
|
metrics_server = MetricsServer(cache, config)
|
||||||
except Exception as e:
|
handler = metrics_server.create_handler()
|
||||||
logging.error(f"Error occurred: {e}")
|
|
||||||
mqtt_client.loop_stop()
|
|
||||||
|
|
||||||
webServer = HTTPServer((config.hostName, config.serverPort), RequestHandler)
|
try:
|
||||||
print("Server started http://%s:%s" % (config.hostName, config.serverPort))
|
http_server = HTTPServer((config.hostName, config.serverPort), handler)
|
||||||
|
logger.info(
|
||||||
|
f"HTTP server started at http://{config.hostName}:{config.serverPort}"
|
||||||
|
)
|
||||||
|
logger.info("Press Ctrl+C to stop")
|
||||||
|
|
||||||
try:
|
http_server.serve_forever()
|
||||||
webServer.serve_forever()
|
|
||||||
except KeyboardInterrupt:
|
except KeyboardInterrupt:
|
||||||
sys.exit(-1)
|
logger.info("Shutdown requested by user")
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Fatal error: {e}", exc_info=True)
|
||||||
|
finally:
|
||||||
|
# Cleanup
|
||||||
|
logger.info("Shutting down...")
|
||||||
|
|
||||||
|
if mqtt_client:
|
||||||
|
mqtt_client.stop()
|
||||||
|
|
||||||
|
timeout_monitor.stop()
|
||||||
|
|
||||||
|
logger.info("Shutdown complete")
|
||||||
|
sys.exit(0)
|
||||||
|
|
||||||
webServer.server_close()
|
|
||||||
print("Server stopped.")
|
|
||||||
poll_mqtt_thread.join()
|
|
||||||
except Exception as e:
|
|
||||||
print(e)
|
|
||||||
time.sleep(60)
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
main()
|
main()
|
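The message callback in this file reads `end_device_ids.device_id`, `uplink_message.decoded_payload`, and `uplink_message.rx_metadata` from each JSON message. The sketch below applies the same checks to a hand-written sample. `parse_uplink` is a hypothetical helper, and the sample values are made up for illustration rather than taken from real TTN output:

```python
import json
from typing import Any, Dict, Optional


def parse_uplink(raw: bytes) -> Optional[Dict[str, Any]]:
    """Return device id, payload and metadata, or None if not usable."""
    message = json.loads(raw.decode("utf-8"))
    device_id = message.get("end_device_ids", {}).get("device_id", "unknown")
    uplink = message.get("uplink_message")
    if uplink is None or "decoded_payload" not in uplink:
        return None  # not an uplink, or no decoded payload present
    return {
        "device_id": device_id,
        "payload": uplink["decoded_payload"],
        "metadata": uplink.get("rx_metadata", []),
    }


# Made-up sample message using the field names read by the callback.
sample = json.dumps({
    "end_device_ids": {"device_id": "tank-1"},
    "uplink_message": {
        "decoded_payload": {"Distance": 1.25, "Temperature": 21},
        "rx_metadata": [{"rssi": -97, "snr": 7.5}],
    },
}).encode("utf-8")

result = parse_uplink(sample)
```

Keeping the parsing in a pure function like this would make it possible to test the field handling without an MQTT connection.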
||||||
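The reconnection loop in this file doubles its delay after each failed attempt and caps it at a maximum. A small sketch of that schedule in isolation follows. `backoff_delays` is a hypothetical helper, and the numbers are illustrative, not the project's configured `mqtt_reconnect_delay` or `mqtt_reconnect_max_delay`:

```python
from typing import Iterator


def backoff_delays(initial: float, maximum: float) -> Iterator[float]:
    """Yield delays that double after each failure, capped at maximum."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, maximum)


# Illustrative values only: first five delays for initial=5, maximum=60.
schedule = backoff_delays(5, 60)
first_five = [next(schedule) for _ in range(5)]
```

A fresh generator would be created after each successful connection, which corresponds to the delay reset in the loop.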
|
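The `_format_metric` helper in this file interpolates label values directly into the exposition line. A minimal sketch of the same formatting with label-value escaping, assuming the Prometheus text exposition rules for backslash, double quote, and newline. The names `escape_label_value` and `format_metric` and the `prefix` parameter are hypothetical stand-ins, not part of the commit:

```python
from typing import Any, Dict, Optional


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline for a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_metric(prefix: str, name: str, value: Any,
                  labels: Optional[Dict[str, str]] = None) -> str:
    """Build one exposition line such as prefix_name{key="value"} 1."""
    metric_name = f"{prefix}{name}"
    if labels:
        label_str = ",".join(
            f'{key}="{escape_label_value(str(val))}"'
            for key, val in labels.items()
        )
        return f"{metric_name}{{{label_str}}} {value}"
    return f"{metric_name} {value}"
```

Device IDs registered in TTN are unlikely to contain these characters, so this is a defensive measure rather than a fix for an observed problem.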
|||||||
@ -1,16 +1,45 @@
|
|||||||
[Unit]
|
[Unit]
|
||||||
Description=TTN Exporter for VEGAPULS Air
|
Description=TTN VEGAPULS Air Prometheus Exporter
|
||||||
After=syslog.target
|
Documentation=https://git.mosad.xyz/localhorst/TTN-VEGAPULS-Air-exporter
|
||||||
After=network.target
|
After=network-online.target
|
||||||
|
Wants=network-online.target
|
||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Restart=on-failure
|
|
||||||
RestartSec=2s
|
|
||||||
Type=simple
|
Type=simple
|
||||||
User=prometheus
|
User=prometheus
|
||||||
Group=prometheus
|
Group=prometheus
|
||||||
|
|
||||||
|
# Working directory
|
||||||
WorkingDirectory=/opt/ttn-vegapulsair-exporter/
|
WorkingDirectory=/opt/ttn-vegapulsair-exporter/
|
||||||
|
|
||||||
|
# Execution
|
||||||
ExecStart=/usr/bin/python3 /opt/ttn-vegapulsair-exporter/ttn-vegapulsair-exporter.py
|
ExecStart=/usr/bin/python3 /opt/ttn-vegapulsair-exporter/ttn-vegapulsair-exporter.py
|
||||||
|
|
||||||
|
# Restart configuration
|
||||||
|
Restart=always
|
||||||
|
RestartSec=10
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
StandardOutput=journal
|
||||||
|
StandardError=journal
|
||||||
|
SyslogIdentifier=ttn-vegapuls-exporter
|
||||||
|
|
||||||
|
# Security settings
|
||||||
|
NoNewPrivileges=true
|
||||||
|
PrivateTmp=true
|
||||||
|
ProtectSystem=strict
|
||||||
|
ProtectHome=true
|
||||||
|
ReadWritePaths=/opt/ttn-vegapulsair-exporter/
|
||||||
|
ProtectKernelTunables=true
|
||||||
|
ProtectKernelModules=true
|
||||||
|
ProtectControlGroups=true
|
||||||
|
|
||||||
|
# Resource limits
|
||||||
|
MemoryMax=256M
|
||||||
|
CPUQuota=5%
|
||||||
|
|
||||||
|
# Environment
|
||||||
|
Environment="PYTHONUNBUFFERED=1"
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||