New MQTT adaption

2026-01-04 11:07:38 +01:00
parent b28da4739f
commit 16cc9c1cb4
9 changed files with 1135 additions and 243 deletions

37
Dockerfile Normal file

@ -0,0 +1,37 @@
FROM python:3.11-slim
# Set metadata
LABEL maintainer="mail@hendrikschutter.com"
LABEL description="Prometheus exporter for VEGAPULS Air sensors via The Things Network"
LABEL version="2.0"
# Create app directory
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY ttn-vegapuls-exporter.py .
COPY config.py .
# Create non-root user
RUN useradd -r -u 1000 -g users exporter && \
chown -R exporter:users /app
# Switch to non-root user
USER exporter
# Expose metrics port
EXPOSE 9106
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD python -c 'import urllib.request; urllib.request.urlopen("http://localhost:9106/health")' || exit 1
# Set environment variables
ENV PYTHONUNBUFFERED=1
# Run the exporter
CMD ["python", "ttn-vegapuls-exporter.py"]

241
README.md

@ -1,15 +1,232 @@
-# The Things Network Exporter for VEGAPULS Air
-Export metrics of a VEGAPULS Air connected via TTN as a prometheus service.
-## Install ##
-- `zypper install python311-paho-mqtt`
-- `mkdir /opt/ttn-vegapulsair-exporter/`
-- `cd /opt/ttn-vegapulsair-exporter/`
-- import `ttn-vegapulsair-exporter.py` and `config.py`
-- Set the constants in `config.py`
-- `chmod +x /opt/ttn-vegapulsair-exporter/ttn-vegapulsair-exporter.py`
-- `chown -R prometheus /opt/ttn-vegapulsair-exporter/`
-- `nano /etc/systemd/system/ttn-vegapulsair-exporter.service`
-- `systemctl daemon-reload && systemctl enable --now ttn-vegapulsair-exporter.service`
# TTN VEGAPULS Air Prometheus Exporter
A robust Prometheus exporter for VEGAPULS Air sensors connected via The Things Network (TTN). This exporter provides reliable monitoring with automatic reconnection, uplink caching, and timeout detection.
## Features
- **Uplink Caching**: Stores historical data with timestamps for each device
- **Timeout Detection**: Automatically detects offline sensors (configurable, default 19 hours)
- **Better Error Handling**: Comprehensive logging and error recovery
- **Multiple Device Support**: Automatically handles multiple sensors
## Metrics Exported
### Exporter Metrics
- `vegapulsair_exporter_uptime_seconds` - Exporter uptime in seconds
- `vegapulsair_exporter_requests_total` - Total number of metrics requests
- `vegapulsair_devices_total` - Total number of known devices
- `vegapulsair_devices_online` - Number of currently online devices
### Per-Device Metrics
All device metrics include a `device_id` label:
#### Status Metrics
- `vegapulsair_device_online{device_id="..."}` - Device online status (1=online, 0=offline)
- `vegapulsair_last_uplink_seconds_ago{device_id="..."}` - Seconds since last uplink
#### Sensor Measurements
- `vegapulsair_distance_mm{device_id="..."}` - Distance measurement in millimeters
- `vegapulsair_temperature_celsius{device_id="..."}` - Temperature in Celsius
- `vegapulsair_inclination_degrees{device_id="..."}` - Inclination in degrees
- `vegapulsair_linear_percent{device_id="..."}` - Linear percentage
- `vegapulsair_percent{device_id="..."}` - Percentage value
- `vegapulsair_scaled_value{device_id="..."}` - Scaled measurement value
- `vegapulsair_battery_percent{device_id="..."}` - Remaining battery percentage
#### LoRaWAN Metadata
- `vegapulsair_rssi_dbm{device_id="..."}` - RSSI in dBm
- `vegapulsair_channel_rssi_dbm{device_id="..."}` - Channel RSSI in dBm
- `vegapulsair_snr_db{device_id="..."}` - Signal-to-Noise Ratio in dB
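A scrape returns one line per metric in the Prometheus text format. The sketch below shows how such a line could be assembled. The names `format_metric` and `escape_label_value` and the device ID `tank-01` are illustrative assumptions, and the escaping of label values is an added precaution rather than confirmed exporter behaviour.

```python
from typing import Dict, Optional


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format expects."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_metric(
    name: str,
    value: float,
    labels: Optional[Dict[str, str]] = None,
    prefix: str = "vegapulsair_",  # matches exporter_prefix in config.py
) -> str:
    """Return one exposition line: prefix + name, optional labels, then the value."""
    full_name = f"{prefix}{name}"
    if not labels:
        return f"{full_name} {value}"
    label_str = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
    )
    return f"{full_name}{{{label_str}}} {value}"


if __name__ == "__main__":
    # "tank-01" is a made-up device ID used only for illustration.
    print(format_metric("distance_mm", 1234.0, {"device_id": "tank-01"}))
    print(format_metric("devices_total", 2))
```

Keeping the prefix as a parameter means the same helper works if `exporter_prefix` is changed in `config.py`.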
## Requirements
- Python 3.7 or higher
- `paho-mqtt` library
## Installation
### Option 1: Manual Installation
1. **Install Python dependencies:**
```bash
pip install paho-mqtt --break-system-packages
# Or use a virtual environment:
python3 -m venv venv
source venv/bin/activate
pip install paho-mqtt
```
2. **Create installation directory:**
```bash
sudo mkdir -p /opt/ttn-vegapuls-exporter
cd /opt/ttn-vegapuls-exporter
```
3. **Copy files:**
```bash
sudo cp ttn-vegapuls-exporter.py /opt/ttn-vegapuls-exporter/
sudo cp config.py /opt/ttn-vegapuls-exporter/
sudo chmod +x /opt/ttn-vegapuls-exporter/ttn-vegapuls-exporter.py
```
4. **Configure the exporter:**
```bash
sudo nano /opt/ttn-vegapuls-exporter/config.py
```
Set the following required parameters:
- `ttn_user`: Your TTN application ID (format: `your-app-id@ttn`)
- `ttn_key`: Your TTN API key (get from TTN Console)
- `ttn_region`: Your TTN region (EU1, NAM1, AU1, etc.)
5. **Set permissions:**
```bash
sudo useradd -r -s /bin/false prometheus # If user doesn't exist
sudo chown -R prometheus:prometheus /opt/ttn-vegapuls-exporter
```
6. **Install systemd service:**
```bash
sudo cp ttn-vegapuls-exporter.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable ttn-vegapuls-exporter.service
sudo systemctl start ttn-vegapuls-exporter.service
```
7. **Check status:**
```bash
sudo systemctl status ttn-vegapuls-exporter.service
sudo journalctl -u ttn-vegapuls-exporter.service -f
```
### Option 2: Docker Installation
See `docker-compose.yml`.
## Configuration
Edit `config.py` to customize the exporter:
```python
# HTTP Server configuration
hostName = "0.0.0.0" # Listen address
serverPort = 9106 # Port for metrics endpoint
# TTN Configuration
ttn_user = "your-app@ttn"
ttn_key = "NNSXS...." # From TTN Console
ttn_region = "EU1"
# Timeout configuration
sensor_timeout_hours = 19 # Mark sensor offline after N hours
# Logging
log_level = "INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
```
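`sensor_timeout_hours` decides when a device counts as offline. A minimal sketch of that comparison, modelled on the timeout check in the exporter's cache; the function name `is_online` and the sample timestamps are illustrative:

```python
from datetime import datetime, timedelta


def is_online(last_uplink: datetime, now: datetime, timeout_hours: float = 19) -> bool:
    """A device is online while its last uplink is younger than the timeout."""
    return now - last_uplink < timedelta(hours=timeout_hours)


if __name__ == "__main__":
    now = datetime(2026, 1, 4, 12, 0, 0)  # arbitrary reference time
    print(is_online(now - timedelta(hours=18), now))  # inside the 19 h window
    print(is_online(now - timedelta(hours=20), now))  # outside the 19 h window
```

Choose a timeout comfortably longer than the sensor's transmit interval, so a single missed uplink does not flip the device to offline.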
### Getting TTN Credentials
1. Log in to [TTN Console](https://console.cloud.thethings.network/)
2. Select your application
3. Go to **Integrations** → **MQTT**
4. Copy the following:
- **Username**: Your application ID (format: `your-app-id@ttn`)
- **Password**: Generate an API key with "Read application traffic" permission
- **Region**: Your cluster region (visible in the URL, e.g., `eu1`)
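These three values determine where the exporter connects and what it subscribes to. The sketch below derives the broker host and uplink topic the same way the exporter code in this commit does; the helper names `broker_host` and `uplink_topic` and the sample application ID are illustrative.

```python
def broker_host(ttn_region: str) -> str:
    """TTN cluster host name; the region is lower-cased, e.g. EU1 -> eu1."""
    return f"{ttn_region.lower()}.cloud.thethings.network"


def uplink_topic(ttn_user: str) -> str:
    """Topic matching uplinks of every device ("+") in the application."""
    return f"v3/{ttn_user}/devices/+/up"


if __name__ == "__main__":
    # "your-app-id@ttn" is a placeholder, not a real application.
    print(broker_host("EU1"))               # connect here on TLS port 8883
    print(uplink_topic("your-app-id@ttn"))  # subscribe to this topic
```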
## Prometheus Configuration
Add to your `prometheus.yml`:
```yaml
scrape_configs:
- job_name: 'vegapuls-air'
static_configs:
- targets: ['localhost:9106']
scrape_interval: 60s
scrape_timeout: 10s
```
### Example Prometheus Alerts
See `prometheus-alerts.yml`.
## Troubleshooting
### No Metrics Appearing
1. **Check MQTT connection:**
```bash
sudo journalctl -u ttn-vegapuls-exporter.service | grep MQTT
```
You should see: `Successfully connected to TTN MQTT broker`
2. **Verify TTN credentials:**
- Ensure `ttn_user` format is correct: `your-app-id@ttn`
- Verify API key has "Read application traffic" permission
- Check region matches your TTN cluster
3. **Test metrics endpoint:**
```bash
curl http://localhost:9106/metrics
```
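The same check can be scripted. This is a rough sketch, assuming the exporter listens on `localhost:9106` and emits plain `name value` lines without trailing timestamps; `parse_metrics` and `fetch_metrics` are hypothetical helper names.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its value; skip comments and blanks."""
    series: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field (no timestamps assumed).
        name, _, value = line.rpartition(" ")
        series[name] = float(value)
    return series


def fetch_metrics(
    url: str = "http://localhost:9106/metrics", timeout: float = 10.0
) -> Dict[str, float]:
    """Fetch the metrics endpoint and return the parsed series."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

If `fetch_metrics()` returns the exporter metrics but no per-device series, the HTTP side is working and the problem lies with MQTT or the payload decoder.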
### MQTT Disconnections
The exporter now handles disconnections automatically with exponential backoff. Check logs:
```bash
sudo journalctl -u ttn-vegapuls-exporter.service -f
```
If disconnections persist:
- Check network connectivity to TTN
- Verify firewall allows outbound port 8883
- Ensure system time is correct (TLS certificates)
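To judge whether the retry spacing in the logs looks right, it helps to know the expected delays. A small sketch of the doubling-with-cap schedule, using the defaults from `config.py` (`mqtt_reconnect_delay = 5`, `mqtt_reconnect_max_delay = 300`); the generator name `backoff_delays` is illustrative.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(initial: int = 5, maximum: int = 300) -> Iterator[int]:
    """Yield retry delays in seconds: double after each failure, capped at maximum."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, maximum)


if __name__ == "__main__":
    # First eight delays with the default settings.
    print(list(islice(backoff_delays(), 8)))  # [5, 10, 20, 40, 80, 160, 300, 300]
```

After a successful connection the exporter resets the delay to its initial value, so the schedule starts over on the next outage.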
### Devices Not Appearing
1. **Verify devices are sending uplinks:**
- Check TTN Console → Applications → Your App → Live Data
- Ensure devices are joined and transmitting
2. **Check user ID:**
- `ttn_user` must match your TTN application ID exactly
3. **Verify payload decoder:**
- Devices must have decoded payload in TTN
- Check TTN Payload Formatter is configured
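The payload decoder matters because the exporter only caches uplinks that carry a `decoded_payload`. The sketch below mirrors that filtering on a raw MQTT message body; `extract_uplink` is an illustrative name and the sample values are made up.

```python
import json
from typing import Any, Dict, Optional, Tuple


def extract_uplink(raw: bytes) -> Optional[Tuple[str, Dict[str, Any], list]]:
    """Return (device_id, decoded_payload, rx_metadata), or None if unusable."""
    message = json.loads(raw.decode("utf-8"))
    device_id = message.get("end_device_ids", {}).get("device_id", "unknown")
    uplink = message.get("uplink_message")
    # Messages without an uplink or without a decoded payload are ignored.
    if uplink is None or "decoded_payload" not in uplink:
        return None
    return device_id, uplink["decoded_payload"], uplink.get("rx_metadata", [])


if __name__ == "__main__":
    # Hypothetical message, reduced to the fields used above.
    sample = json.dumps(
        {
            "end_device_ids": {"device_id": "tank-01"},
            "uplink_message": {
                "decoded_payload": {"Distance": 1.5},
                "rx_metadata": [{"rssi": -100}],
            },
        }
    ).encode("utf-8")
    print(extract_uplink(sample))
```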
### Debug Mode
Enable debug logging in `config.py`:
```python
log_level = "DEBUG"
```
This will show:
- All MQTT messages received
- Cache updates
- Device status changes
- Detailed error information
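How the exporter applies `log_level` is not shown in this README, so the following is only a sketch of one plausible approach with the standard `logging` module. `configure_logging` is a hypothetical helper, and the default format string is copied from `log_format` in `config.py`.

```python
import logging


def configure_logging(
    log_level: str = "INFO",
    log_format: str = "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
) -> int:
    """Translate a level name such as "DEBUG" into a logging level and apply it."""
    level = getattr(logging, log_level.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown log level: {log_level!r}")
    logging.basicConfig(level=level, format=log_format)
    # basicConfig is a no-op if handlers already exist, so set the level explicitly.
    logging.getLogger().setLevel(level)
    return level


if __name__ == "__main__":
    configure_logging("DEBUG")
    logging.getLogger("TTNMQTTClient").debug("debug logging is active")
```

Rejecting unknown level names early makes a typo in `config.py` fail at startup instead of silently falling back to a different verbosity.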
### Data Flow
```
VEGAPULS Air Sensor
LoRaWAN Gateway
The Things Network
MQTT Broker (TLS)
Exporter (caches data)
Prometheus (scrapes metrics)
```
## License
See [LICENSE](LICENSE) file for details.

Binary file not shown.


@ -1,12 +1,39 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
-""" Author: Hendrik Schutter, mail@hendrikschutter.com
"""
Configuration for TTN VEGAPULS Air Prometheus Exporter
Author: Hendrik Schutter, mail@hendrikschutter.com
"""
# HTTP Server configuration
hostName = "127.0.0.1"
serverPort = 9106
exporter_prefix = "vegapulsair_"
-ttn_user = "appid@ttn"
-ttn_key = "THE APP API KEY FROM TTN CONSOLE"
-ttn_region = "EU1"
# TTN MQTT Configuration
# Get your credentials from TTN Console -> Applications -> Your App -> Integrations -> MQTT
ttn_user = "appid@ttn" # Your application ID
ttn_key = "THE APP API KEY FROM TTN CONSOLE" # Your API key
ttn_region = "EU1" # TTN region: EU1, NAM1, AU1, etc.
# Integration method: "mqtt" or "http"
# - mqtt: Subscribe to TTN MQTT broker (recommended for real-time updates)
# - http: Use HTTP Integration webhook (requires TTN webhook configuration)
integration_method = "mqtt"
# Timeout configuration
# Time in hours after which a sensor is considered offline if no uplink is received
sensor_timeout_hours = 19
# MQTT specific settings
mqtt_keepalive = 60 # MQTT keepalive interval in seconds
mqtt_reconnect_delay = 5 # Delay between reconnection attempts in seconds
mqtt_reconnect_max_delay = 300 # Maximum delay between reconnection attempts
# Logging configuration
log_level = "INFO" # DEBUG, INFO, WARNING, ERROR, CRITICAL
log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
# Cache configuration
cache_cleanup_interval = 3600 # Cleanup old cache entries every hour
max_cache_age_hours = 72 # Remove cache entries older than 72 hours

65
docker-compose.yml Normal file

@ -0,0 +1,65 @@
version: '3.8'
services:
ttn-vegapuls-exporter:
image: python:3.11-slim
container_name: ttn-vegapuls-exporter
restart: unless-stopped
# Install dependencies and run exporter
entrypoint: |
sh -c "pip install --no-cache-dir paho-mqtt && python ttn-vegapuls-exporter.py"
working_dir: /app
# Expose metrics port
ports:
- "9106:9106"
# Mount application files (read-only)
volumes:
- ./ttn-vegapuls-exporter.py:/app/ttn-vegapuls-exporter.py:ro
- ./config.py:/app/config.py:ro
# Environment variables
environment:
- PYTHONUNBUFFERED=1
# Health check
healthcheck:
test: ["CMD-SHELL", "python -c 'import urllib.request; urllib.request.urlopen(\"http://localhost:9106/health\")' || exit 1"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
# Resource limits
deploy:
resources:
limits:
memory: 256M
cpus: '0.05'
reservations:
memory: 64M
# Logging configuration
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
# Network configuration
networks:
- monitoring
# Security options
security_opt:
- no-new-privileges:true
# Run as non-root user
user: "1000:1000"
networks:
monitoring:
driver: bridge

204
prometheus-alerts.yml Normal file

@ -0,0 +1,204 @@
# Prometheus Alert Rules for VEGAPULS Air Sensors
#
# Installation:
# 1. Copy this file to /etc/prometheus/rules/vegapuls-alerts.yml
# 2. Add to prometheus.yml:
# rule_files:
# - /etc/prometheus/rules/vegapuls-alerts.yml
# 3. Reload Prometheus: systemctl reload prometheus
groups:
- name: ttn_vegapuls_air_alerts
interval: 60s
rules:
# === Exporter Health ===
- alert: VEGAPULSExporterDown
expr: up{job="vegapuls-air"} == 0
for: 5m
labels:
severity: critical
component: exporter
annotations:
summary: "VEGAPULS Air exporter is down"
description: "The VEGAPULS Air Prometheus exporter has been down for more than 5 minutes. Check the service status."
runbook: "Check systemctl status ttn-vegapuls-exporter and journalctl -u ttn-vegapuls-exporter"
# === Device Online Status ===
- alert: VEGAPULSSensorOffline
expr: vegapulsair_device_online == 0
for: 10m
labels:
severity: warning
component: sensor
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} is offline"
description: "Sensor {{ $labels.device_id }} has not sent an uplink for more than 19 hours and is considered offline."
runbook: "Check sensor battery, LoRaWAN coverage, and TTN Console for error messages"
- alert: VEGAPULSSensorMissing
expr: |
vegapulsair_last_uplink_seconds_ago > 86400
for: 30m
labels:
severity: critical
component: sensor
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} missing for over 24h"
description: "Sensor {{ $labels.device_id }} has not transmitted for over 24 hours. Last uplink: {{ $value | humanizeDuration }} ago."
runbook: "Physical inspection required. Check sensor power and installation."
# === Battery Monitoring ===
- alert: VEGAPULSBatteryCritical
expr: vegapulsair_battery_percent < 10
for: 1h
labels:
severity: critical
component: battery
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} battery critically low"
description: "Battery level at {{ $value }}%. Sensor will stop functioning soon. Immediate replacement required."
runbook: "Schedule urgent battery replacement"
- alert: VEGAPULSBatteryLow
expr: vegapulsair_battery_percent < 20
for: 6h
labels:
severity: warning
component: battery
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} battery low"
description: "Battery level at {{ $value }}%. Plan battery replacement soon."
runbook: "Schedule battery replacement within 2-4 weeks"
- alert: VEGAPULSBatteryWarning
expr: vegapulsair_battery_percent < 30
for: 12h
labels:
severity: info
component: battery
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} battery below 30%"
description: "Battery level at {{ $value }}%. Monitor and plan replacement."
runbook: "Add to maintenance schedule for next quarter"
# === Signal Quality ===
- alert: VEGAPULSWeakSignal
expr: vegapulsair_rssi_dbm < -120
for: 1h
labels:
severity: warning
component: network
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} has weak signal"
description: "RSSI is {{ $value }} dBm (very weak). May indicate coverage issues or antenna problems."
runbook: "Check gateway coverage, sensor placement, and antenna connection"
- alert: VEGAPULSPoorSNR
expr: vegapulsair_snr_db < -15
for: 1h
labels:
severity: warning
component: network
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} has poor SNR"
description: "Signal-to-Noise Ratio is {{ $value }} dB. Signal quality is degraded."
runbook: "Check for interference, gateway issues, or repositioning sensor"
# === Temperature Monitoring ===
- alert: VEGAPULSTemperatureExtreme
expr: |
vegapulsair_temperature_celsius > 60 or
vegapulsair_temperature_celsius < -20
for: 30m
labels:
severity: warning
component: environment
annotations:
summary: "VEGAPULS sensor {{ $labels.device_id }} extreme temperature"
description: "Temperature is {{ $value }}°C, outside normal operating range."
runbook: "Check sensor location and environmental conditions"
# === Data Quality ===
- alert: VEGAPULSNoDataReceived
expr: |
rate(vegapulsair_exporter_requests_total[5m]) > 0 and
vegapulsair_devices_total == 0
for: 15m
labels:
severity: warning
component: integration
annotations:
summary: "VEGAPULS exporter receiving no device data"
description: "Exporter is running and being scraped, but no device data is available. Check MQTT connection and TTN configuration."
runbook: "Check exporter logs, TTN Console live data, and MQTT credentials"
- alert: VEGAPULSAllDevicesOffline
expr: |
vegapulsair_devices_total > 0 and
vegapulsair_devices_online == 0
for: 30m
labels:
severity: critical
component: system
annotations:
summary: "All VEGAPULS sensors are offline"
description: "{{ $value }} devices are registered but none are online. System-wide issue suspected."
runbook: "Check TTN gateway status, network connectivity, and power supply"
# === Performance Monitoring ===
- alert: VEGAPULSHighScrapeRate
expr: rate(vegapulsair_exporter_requests_total[5m]) > 2
for: 10m
labels:
severity: info
component: performance
annotations:
summary: "High scrape rate on VEGAPULS exporter"
description: "Prometheus is scraping at {{ $value }} requests/second. Consider increasing scrape_interval."
runbook: "Review Prometheus configuration and adjust scrape_interval if needed"
# === Recording Rules for Easier Querying ===
- name: vegapuls_air_recording_rules
interval: 60s
rules:
# Battery drain rate (percent per day)
- record: vegapulsair_battery_drain_rate_percent_per_day
expr: |
deriv(vegapulsair_battery_percent[7d]) * -86400
# Average signal strength per device (7 day)
- record: vegapulsair_rssi_avg_7d
expr: |
avg_over_time(vegapulsair_rssi_dbm[7d])
# Uplink frequency (uplinks per day)
- record: vegapulsair_uplink_frequency_per_day
expr: |
86400 / avg_over_time(vegapulsair_last_uplink_seconds_ago[7d])
# Device availability percentage (24h)
- record: vegapulsair_device_availability_percent_24h
expr: |
avg_over_time(vegapulsair_device_online[24h]) * 100
# === Usage Examples ===
#
# Query battery drain rate:
# vegapulsair_battery_drain_rate_percent_per_day
#
# Query devices with availability < 95%:
# vegapulsair_device_availability_percent_24h < 95
#
# Query average RSSI over 7 days:
# vegapulsair_rssi_avg_7d
#
# Query uplink frequency:
# vegapulsair_uplink_frequency_per_day

4
requirements.txt Normal file

@ -0,0 +1,4 @@
# TTN VEGAPULS Air Exporter - Python Dependencies
# MQTT client for connecting to The Things Network
paho-mqtt>=2.0.0,<3.0.0


@ -1,286 +1,595 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
-""" Author: Hendrik Schutter, mail@hendrikschutter.com
"""
TTN VEGAPULS Air Prometheus Exporter
Exports metrics from VEGAPULS Air sensors connected via The Things Network
Author: Hendrik Schutter, mail@hendrikschutter.com
"""
-from http.server import BaseHTTPRequestHandler, HTTPServer
-import paho.mqtt.client as mqtt
-from datetime import datetime, timedelta
-import threading
-import time
-import json
import sys
-import config
import json
import time
import threading
import logging
import ssl
from datetime import datetime, timedelta
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Optional, Any
import paho.mqtt.client as mqtt
import config
scrape_healthy = True class SensorDataCache:
startTime = datetime.now() """Thread-safe cache for sensor uplink data with timeout tracking"""
lastMqttReception = datetime.now()
node_metrics = list()
mutex = threading.Lock()
request_count = 0
mqtt_client = None def __init__(self, timeout_hours: int = 19):
mqtt_connected = False self._data: Dict[str, Dict[str, Any]] = {}
mqtt_lock = threading.Lock() self._lock = threading.RLock()
self.timeout_hours = timeout_hours
def monitor_timeout(): def update(
global scrape_healthy self, device_id: str, payload: Dict, metadata: list, timestamp: datetime
global lastMqttReception ):
global mqtt_connected """
Update cached data for a device
while True: Args:
time_since_last_reception = datetime.now() - lastMqttReception device_id: Unique device identifier
if time_since_last_reception > timedelta(hours=config.ttn_timeout): payload: Decoded payload from TTN
with mutex: metadata: RX metadata from TTN
scrape_healthy = False timestamp: Timestamp of the uplink
mqtt_connected = False """
time.sleep(60) # Check timeout every minute with self._lock:
self._data[device_id] = {
"payload": payload,
"metadata": metadata,
"timestamp": timestamp,
"is_online": True,
}
logging.info(f"Updated cache for device {device_id}")
def reconnect_mqtt(): def get_all_devices(self) -> Dict[str, Dict[str, Any]]:
global mqtt_client """
global mqtt_connected Get all cached device data
while True: Returns:
if not mqtt_connected: Dictionary of device data
with mqtt_lock: """
try: with self._lock:
if mqtt_client is None: return dict(self._data)
print("MQTT client is None, creating a new client...")
mqtt_client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
mqtt_client.on_connect = on_connect
mqtt_client.on_message = on_message
mqtt_client.on_disconnect = on_disconnect
mqtt_client.username_pw_set(config.ttn_user, config.ttn_key)
mqtt_client.tls_set()
print("Attempting to reconnect to MQTT broker...") def check_timeouts(self):
mqtt_client.connect( """Check all devices for timeout and mark offline ones"""
config.ttn_region.lower() + ".cloud.thethings.network", 8883, 60 with self._lock:
now = datetime.now()
timeout_threshold = timedelta(hours=self.timeout_hours)
for device_id, data in self._data.items():
time_since_update = now - data["timestamp"]
was_online = data["is_online"]
data["is_online"] = time_since_update < timeout_threshold
if was_online and not data["is_online"]:
logging.warning(
f"Device {device_id} marked as OFFLINE "
f"(no uplink for {time_since_update.total_seconds()/3600:.1f} hours)"
) )
except Exception as e: elif not was_online and data["is_online"]:
print(f"MQTT reconnect failed: {e}") logging.info(f"Device {device_id} is back ONLINE")
time.sleep(60) # Retry every 10 seconds
def cleanup_old_entries(self, max_age_hours: int = 72):
"""Remove entries older than max_age_hours"""
with self._lock:
now = datetime.now()
max_age = timedelta(hours=max_age_hours)
devices_to_remove = [
device_id
for device_id, data in self._data.items()
if now - data["timestamp"] > max_age
]
for device_id in devices_to_remove:
del self._data[device_id]
logging.info(f"Removed stale cache entry for device {device_id}")
class RequestHandler(BaseHTTPRequestHandler): class TTNMQTTClient:
def log_message(self, format, *args): """Manages MQTT connection to TTN with automatic reconnection"""
pass
def get_metrics(self): def __init__(self, cache: SensorDataCache, config_module):
global request_count self.cache = cache
global node_metrics self.config = config_module
global mutex self.client: Optional[mqtt.Client] = None
mutex.acquire() self.connected = False
self.send_response(200) self._lock = threading.Lock()
self.send_header("Content-type", "text/html") self._should_run = True
self.end_headers()
self.wfile.write(
bytes(
config.exporter_prefix
+ "exporter_duration_seconds_sum "
+ str(int((datetime.now() - startTime).total_seconds()))
+ "\n",
"utf-8",
)
)
self.wfile.write(
bytes(
config.exporter_prefix
+ "exporter_request_count "
+ str(request_count)
+ "\n",
"utf-8",
)
)
self.wfile.write(
bytes(
config.exporter_prefix
+ "exporter_scrape_healthy "
+ str(int(scrape_healthy))
+ "\n",
"utf-8",
)
)
for metric in node_metrics: # Setup logging
self.wfile.write(bytes(config.exporter_prefix + metric + "\n", "utf-8")) self.logger = logging.getLogger("TTNMQTTClient")
mutex.release() def _on_connect(self, client, userdata, flags, reason_code, properties):
"""Callback when connected to MQTT broker"""
if reason_code == 0:
self.logger.info("Successfully connected to TTN MQTT broker")
self.connected = True
def do_GET(self): # Subscribe to uplink messages
global request_count topic = f"v3/{self.config.ttn_user}/devices/+/up"
request_count += 1 client.subscribe(topic, qos=1)
if self.path.startswith("/metrics"): self.logger.info(f"Subscribed to topic: {topic}")
self.get_metrics()
else: else:
self.send_response(200) self.logger.error(
self.send_header("Content-type", "text/html") f"Failed to connect to MQTT broker. Reason code: {reason_code}"
self.end_headers()
self.wfile.write(bytes("<html>", "utf-8"))
self.wfile.write(
bytes("<head><title>VEGAPULS Air exporter</title></head>", "utf-8")
) )
self.wfile.write(bytes("<body>", "utf-8")) self.connected = False
self.wfile.write(
bytes( def _on_disconnect(self, client, userdata, flags, reason_code, properties):
"<h1>ttn-vegapulsair exporter based on data from LoRaWAN TTN node.</h1>", """Callback when disconnected from MQTT broker"""
"utf-8", self.logger.warning(
f"Disconnected from MQTT broker. Reason code: {reason_code}"
)
self.connected = False
def _on_message(self, client, userdata, msg):
"""Callback when a message is received"""
self.logger.debug(f"Uplink message received! {msg.topic}")
try:
# Parse the JSON payload
message_data = json.loads(msg.payload.decode("utf-8"))
# Extract device information
device_id = message_data.get("end_device_ids", {}).get(
"device_id", "unknown"
)
# Check if this is an uplink message with decoded payload
if "uplink_message" not in message_data:
self.logger.debug(f"Ignoring non-uplink message from {device_id}")
return
uplink = message_data["uplink_message"]
if "decoded_payload" not in uplink:
self.logger.warning(f"No decoded payload for device {device_id}")
return
# Update cache with new data
self.cache.update(
device_id=device_id,
payload=uplink["decoded_payload"],
metadata=uplink.get("rx_metadata", []),
timestamp=datetime.now(),
)
self.logger.debug(f"Processed uplink from device: {device_id}")
except json.JSONDecodeError as e:
self.logger.error(f"Failed to parse MQTT message: {e}")
except Exception as e:
self.logger.error(f"Error processing MQTT message: {e}", exc_info=True)
def _create_client(self):
"""Create and configure MQTT client"""
client = mqtt.Client(
client_id=f"vegapuls-exporter-{int(time.time())}",
callback_api_version=mqtt.CallbackAPIVersion.VERSION2,
)
# Set callbacks
client.on_connect = self._on_connect
client.on_disconnect = self._on_disconnect
client.on_message = self._on_message
# Set credentials
client.username_pw_set(self.config.ttn_user, self.config.ttn_key)
# Configure TLS
client.tls_set(cert_reqs=ssl.CERT_REQUIRED, tls_version=ssl.PROTOCOL_TLS_CLIENT)
client.tls_insecure_set(False)
return client
def connect(self):
"""Connect to TTN MQTT broker"""
with self._lock:
try:
if self.client is None:
self.client = self._create_client()
broker_url = f"{self.config.ttn_region.lower()}.cloud.thethings.network"
self.logger.info(f"Connecting to MQTT broker: {broker_url}")
self.client.connect(
broker_url, port=8883, keepalive=self.config.mqtt_keepalive
)
# Start the network loop in a separate thread
self.client.loop_start()
return True
except Exception as e:
self.logger.error(f"Failed to connect to MQTT broker: {e}")
return False
def disconnect(self):
"""Disconnect from MQTT broker"""
with self._lock:
if self.client:
self.client.loop_stop()
self.client.disconnect()
self.connected = False
self.logger.info("Disconnected from MQTT broker")
def run_with_reconnect(self):
"""Main loop with automatic reconnection"""
reconnect_delay = self.config.mqtt_reconnect_delay
while self._should_run:
if not self.connected:
self.logger.info("Attempting to connect to MQTT broker...")
if self.connect():
# Reset reconnect delay on successful connection
reconnect_delay = self.config.mqtt_reconnect_delay
else:
# Exponential backoff for reconnection
self.logger.warning(
f"Reconnection failed. Retrying in {reconnect_delay}s..."
)
time.sleep(reconnect_delay)
reconnect_delay = min(
reconnect_delay * 2, self.config.mqtt_reconnect_max_delay
)
continue
# Wait a bit before checking connection again
time.sleep(10)
def stop(self):
"""Stop the MQTT client"""
self._should_run = False
self.disconnect()
class MetricsServer:
"""HTTP server for Prometheus metrics endpoint"""
def __init__(self, cache: SensorDataCache, config_module):
self.cache = cache
self.config = config_module
self.start_time = datetime.now()
self.request_count = 0
self._lock = threading.Lock()
def _format_metric(
self, name: str, value: Any, labels: Optional[Dict[str, str]] = None
) -> str:
"""
Format a Prometheus metric
Args:
name: Metric name
value: Metric value
labels: Optional labels dictionary
Returns:
Formatted metric string
"""
metric_name = f"{self.config.exporter_prefix}{name}"
if labels:
label_str = ",".join([f'{k}="{v}"' for k, v in labels.items()])
return f"{metric_name}{{{label_str}}} {value}"
else:
return f"{metric_name} {value}"
def _generate_metrics(self) -> str:
"""Generate all Prometheus metrics"""
metrics = []
# Exporter meta metrics
uptime = int((datetime.now() - self.start_time).total_seconds())
metrics.append(self._format_metric("exporter_uptime_seconds", uptime))
metrics.append(
self._format_metric("exporter_requests_total", self.request_count)
)
# Get all device data
devices = self.cache.get_all_devices()
# Overall health metric
online_devices = sum(1 for d in devices.values() if d["is_online"])
total_devices = len(devices)
metrics.append(self._format_metric("devices_total", total_devices))
metrics.append(self._format_metric("devices_online", online_devices))
# Per-device metrics
for device_id, data in devices.items():
labels = {"device_id": device_id}
# Device online status (1 = online, 0 = offline/timeout)
metrics.append(
self._format_metric("device_online", int(data["is_online"]), labels)
)
# Time since last uplink in seconds
time_since_uplink = (datetime.now() - data["timestamp"]).total_seconds()
metrics.append(
self._format_metric(
"last_uplink_seconds_ago", int(time_since_uplink), labels
) )
) )
self.wfile.write(bytes('<p><a href="/metrics">Metrics</a></p>', "utf-8"))
self.wfile.write(bytes("</body>", "utf-8"))
self.wfile.write(bytes("</html>", "utf-8"))
def update_metrics(payload, metadata): payload = data["payload"]
global node_metrics metadata = data["metadata"]
global mutex
global scrape_healthy
global lastMqttReception
mutex.acquire() # Sensor measurements
node_metrics.clear() if "Distance" in payload:
metrics.append(
self._format_metric(
"distance_mm", float(payload["Distance"]), labels
)
)
if "Distance" in payload: if "Temperature" in payload:
node_metrics.append("distance " + str(float(payload["Distance"]))) metrics.append(
self._format_metric(
"temperature_celsius", int(payload["Temperature"]), labels
)
)
if "Inclination_degree" in payload: if "Inclination_degree" in payload:
node_metrics.append("inclination_degree " + str(int(payload["Inclination_degree"]))) metrics.append(
self._format_metric(
"inclination_degrees",
int(payload["Inclination_degree"]),
labels,
)
)
if "MvLinProcent" in payload: if "MvLinProcent" in payload:
node_metrics.append("linprocent " + str(int(payload["MvLinProcent"]))) metrics.append(
self._format_metric(
"linear_percent", int(payload["MvLinProcent"]), labels
)
)
if "MvProcent" in payload: if "MvProcent" in payload:
node_metrics.append("procent " + str(int(payload["MvProcent"]))) metrics.append(
self._format_metric("percent", int(payload["MvProcent"]), labels)
)
if "MvScaled" in payload: if "MvScaled" in payload:
node_metrics.append("scaled " + str(float(payload["MvScaled"]))) metrics.append(
self._format_metric(
"scaled_value", float(payload["MvScaled"]), labels
)
)
if "MvScaledUnit" in payload: if "MvScaledUnit" in payload:
node_metrics.append("scaled_unit " + str(int(payload["MvScaledUnit"]))) metrics.append(
self._format_metric(
"scaled_unit", int(payload["MvScaledUnit"]), labels
)
)
if "PacketIdentifier" in payload: if "PacketIdentifier" in payload:
node_metrics.append("packet_identifier " + str(int(payload["PacketIdentifier"]))) metrics.append(
self._format_metric(
"packet_identifier", int(payload["PacketIdentifier"]), labels
)
)
if "RemainingPower" in payload: if "RemainingPower" in payload:
node_metrics.append("remaining_power " + str(int(payload["RemainingPower"]))) metrics.append(
self._format_metric(
"battery_percent", int(payload["RemainingPower"]), labels
)
)
if "Temperature" in payload: if "Unit" in payload:
node_metrics.append("temperature " + str(int(payload["Temperature"]))) metrics.append(
self._format_metric("unit", int(payload["Unit"]), labels)
)
if "Unit" in payload: if "UnitTemperature" in payload:
node_metrics.append("unit " + str(int(payload["Unit"]))) metrics.append(
self._format_metric(
"temperature_unit", int(payload["UnitTemperature"]), labels
)
)
if "UnitTemperature" in payload: # LoRaWAN metadata
node_metrics.append("temperature_unit " + str(int(payload["UnitTemperature"]))) if metadata and len(metadata) > 0:
first_gateway = metadata[0]
if "rssi" in metadata[0]: if "rssi" in first_gateway:
node_metrics.append("rssi " + str(int(metadata[0]["rssi"]))) metrics.append(
self._format_metric(
"rssi_dbm", int(first_gateway["rssi"]), labels
)
)
if "channel_rssi" in metadata[0]: if "channel_rssi" in first_gateway:
node_metrics.append("channel_rssi " + str(int(metadata[0]["channel_rssi"]))) metrics.append(
self._format_metric(
"channel_rssi_dbm",
int(first_gateway["channel_rssi"]),
labels,
)
)
if "snr" in metadata[0]: if "snr" in first_gateway:
node_metrics.append("snr " + str(float(metadata[0]["snr"]))) metrics.append(
self._format_metric(
"snr_db", float(first_gateway["snr"]), labels
)
)
scrape_healthy = True return "\n".join(metrics) + "\n"
lastMqttReception = datetime.now()
mutex.release()
def on_connect(client, userdata, flags, reason_code, properties): def create_handler(self):
global mqtt_connected """Create HTTP request handler"""
if reason_code == 0: server_instance = self
print("\nConnected to MQTT: reason_code = " + str(reason_code))
mqtt_connected = True
elif reason_code > 0:
print("\nNot connected to MQTT: reason_code = " + str(reason_code))
mqtt_connected = False
def on_disconnect(client, userdata, flags, reason_code, tmp): class RequestHandler(BaseHTTPRequestHandler):
global mqtt_connected def log_message(self, format, *args):
print(f"Disconnected from MQTT: reason_code = {reason_code}") """Suppress default logging"""
mqtt_connected = False pass
def on_message(mqttc, obj, msg): def do_GET(self):
print("on_message") with server_instance._lock:
global scrape_healthy server_instance.request_count += 1
try: if self.path == "/metrics":
parsedJSON = json.loads(msg.payload) self.send_response(200)
print(parsedJSON) self.send_header("Content-Type", "text/plain; charset=utf-8")
uplink_message = parsedJSON["uplink_message"] self.end_headers()
update_metrics(uplink_message["decoded_payload"], uplink_message["rx_metadata"])
except Exception as e:
with mutex:
scrape_healthy = False
print(f"Unable to parse uplink: {e}")
def poll_mqtt(mqtt_client): metrics = server_instance._generate_metrics()
# Start the network loop self.wfile.write(metrics.encode("utf-8"))
mqtt_client.loop_forever()
def configure_mqtt_client(): elif self.path == "/" or self.path == "/health":
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2) self.send_response(200)
client.on_connect = on_connect self.send_header("Content-Type", "text/html; charset=utf-8")
client.on_message = on_message self.end_headers()
client.on_disconnect = on_disconnect
# Set credentials html = """
client.username_pw_set(config.ttn_user, config.ttn_key) <html>
<head><title>VEGAPULS Air Exporter</title></head>
<body>
<h1>TTN VEGAPULS Air Prometheus Exporter</h1>
<p>Exporter for VEGAPULS Air sensors connected via The Things Network</p>
<p><a href="/metrics">Metrics</a></p>
</body>
</html>
"""
self.wfile.write(html.encode("utf-8"))
# Set up TLS/SSL else:
client.tls_set( self.send_response(404)
cert_reqs=ssl.CERT_REQUIRED, self.end_headers()
tls_version=ssl.PROTOCOL_TLSv1_2, # Enforce TLS 1.2
return RequestHandler
class TimeoutMonitor:
"""Background thread to monitor device timeouts"""
def __init__(self, cache: SensorDataCache, config_module):
self.cache = cache
self.config = config_module
self._should_run = True
self.logger = logging.getLogger("TimeoutMonitor")
def run(self):
"""Main monitoring loop"""
while self._should_run:
try:
self.cache.check_timeouts()
# Also cleanup old entries periodically
if hasattr(self.config, "max_cache_age_hours"):
self.cache.cleanup_old_entries(self.config.max_cache_age_hours)
except Exception as e:
self.logger.error(f"Error in timeout monitoring: {e}", exc_info=True)
# Check every minute
time.sleep(60)
def stop(self):
"""Stop the monitor"""
self._should_run = False
def setup_logging(config_module):
"""Configure logging"""
log_level = getattr(logging, config_module.log_level.upper(), logging.INFO)
log_format = getattr(
config_module,
"log_format",
"%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logging.basicConfig(
level=log_level, format=log_format, handlers=[logging.StreamHandler(sys.stdout)]
) )
client.tls_insecure_set(False) # Enforce strict certificate validation
return client
def main(): def main():
global mqtt_client """Main application entry point"""
# Setup logging
setup_logging(config)
logger = logging.getLogger("Main")
# Start timeout monitoring thread logger.info("=" * 60)
timeout_thread = threading.Thread(target=monitor_timeout, daemon=True) logger.info("TTN VEGAPULS Air Prometheus Exporter")
timeout_thread.start() logger.info("=" * 60)
logger.info(f"Integration Method: {config.integration_method}")
logger.info(f"Sensor Timeout: {config.sensor_timeout_hours} hours")
logger.info(f"HTTP Server: {config.hostName}:{config.serverPort}")
logger.info("=" * 60)
# Start MQTT reconnect thread # Create sensor data cache
reconnect_thread = threading.Thread(target=reconnect_mqtt, daemon=True) cache = SensorDataCache(timeout_hours=config.sensor_timeout_hours)
reconnect_thread.start()
while True: # Start timeout monitor
mqtt_client = configure_mqtt_client() timeout_monitor = TimeoutMonitor(cache, config)
try: monitor_thread = threading.Thread(
# Connect to TTN broker target=timeout_monitor.run, daemon=True, name="TimeoutMonitor"
broker_url = f"{config.ttn_region.lower()}.cloud.thethings.network" )
mqtt_client.connect(broker_url, 8883, 60) monitor_thread.start()
logger.info("Started timeout monitor")
# Subscribe to all topics
mqtt_client.subscribe("#", 1)
logging.info(f"Subscribed to all topics.")
poll_mqtt_thread = threading.Thread(target=poll_mqtt, args=((mqtt_client,))) # Start MQTT client if configured
poll_mqtt_thread.start() mqtt_client = None
except Exception as e: mqtt_thread = None
logging.error(f"Error occurred: {e}") if config.integration_method.lower() == "mqtt":
mqtt_client.loop_stop() mqtt_client = TTNMQTTClient(cache, config)
mqtt_thread = threading.Thread(
target=mqtt_client.run_with_reconnect, daemon=True, name="MQTTClient"
)
mqtt_thread.start()
logger.info("Started MQTT client")
else:
logger.warning(f"Unsupported integration method: {config.integration_method}")
logger.warning("Only 'mqtt' is currently supported")
webServer = HTTPServer((config.hostName, config.serverPort), RequestHandler) # Start HTTP server
print("Server started http://%s:%s" % (config.hostName, config.serverPort)) metrics_server = MetricsServer(cache, config)
handler = metrics_server.create_handler()
try: try:
webServer.serve_forever() http_server = HTTPServer((config.hostName, config.serverPort), handler)
except KeyboardInterrupt: logger.info(
sys.exit(-1) f"HTTP server started at http://{config.hostName}:{config.serverPort}"
)
logger.info("Press Ctrl+C to stop")
http_server.serve_forever()
except KeyboardInterrupt:
logger.info("\nShutdown requested by user")
except Exception as e:
logger.error(f"Fatal error: {e}", exc_info=True)
finally:
# Cleanup
logger.info("Shutting down...")
if mqtt_client:
mqtt_client.stop()
timeout_monitor.stop()
logger.info("Shutdown complete")
sys.exit(0)
webServer.server_close()
print("Server stopped.")
poll_mqtt_thread.join()
except Exception as e:
print(e)
time.sleep(60)
if __name__ == "__main__": if __name__ == "__main__":
main() main()
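
Review note: `_format_metric` is called throughout the metrics code in this diff, but its body is not part of the lines shown here. A minimal sketch of what such a helper might look like, assuming the Prometheus text exposition format. The function name `format_metric`, the `vegapulsair_` prefix, and the `device_id` label are illustrative assumptions, not taken from the commit.

```python
def escape_label_value(value) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return (
        str(value)
        .replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def format_metric(name, value, labels=None, prefix="vegapulsair_"):
    """Return one exposition line (hypothetical stand-in for _format_metric)."""
    if labels:
        # Sort labels so the output is stable between scrapes
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        return f"{prefix}{name}{{{label_str}}} {value}"
    return f"{prefix}{name} {value}"


if __name__ == "__main__":
    print(format_metric("rssi_dbm", -97, {"device_id": "sensor-1"}))
    print(format_metric("snr_db", 7.5))
```

Escaping label values matters here because device IDs come from TTN uplinks; an unescaped quote or newline would corrupt the whole `/metrics` response, not only the affected line.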


@ -1,16 +1,45 @@
[Unit] [Unit]
Description=TTN Exporter for VEGAPULS Air Description=TTN VEGAPULS Air Prometheus Exporter
After=syslog.target Documentation=https://git.mosad.xyz/localhorst/TTN-VEGAPULS-Air-exporter
After=network.target After=network-online.target
Wants=network-online.target
[Service] [Service]
Restart=on-failure
RestartSec=2s
Type=simple Type=simple
User=prometheus User=prometheus
Group=prometheus Group=prometheus
# Working directory
WorkingDirectory=/opt/ttn-vegapulsair-exporter/ WorkingDirectory=/opt/ttn-vegapulsair-exporter/
# Execution
ExecStart=/usr/bin/python3 /opt/ttn-vegapulsair-exporter/ttn-vegapulsair-exporter.py ExecStart=/usr/bin/python3 /opt/ttn-vegapulsair-exporter/ttn-vegapulsair-exporter.py
# Restart configuration
Restart=always
RestartSec=10
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=ttn-vegapuls-exporter
# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/ttn-vegapulsair-exporter/
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
# Resource limits
MemoryMax=256M
CPUQuota=5%
# Environment
Environment="PYTHONUNBUFFERED=1"
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
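
Review note: `Restart=always` only reacts when the process exits; it does not notice an exporter that is still running but no longer answers HTTP requests. A minimal sketch of an external probe against the `/health` path served by the request handler in this commit. The function name `probe_health` and the default timeout are assumptions; the host and port would come from `config.hostName` and `config.serverPort`.

```python
import urllib.error
import urllib.request

# A local health check should not be routed through an HTTP proxy,
# so build an opener that ignores proxy environment variables.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and non-2xx responses
        return False
```

A cron job or monitoring script could call `probe_health("http://localhost:9106/health")` (port taken from the Dockerfile's `EXPOSE`, adjust to the configured value) and restart the unit when it returns `False`.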