# Kleinanzeigen Boosted

A web-based map visualization tool for searching and exploring listings from kleinanzeigen.de with real-time geographic display on OpenStreetMap.

## Features

- 🗺️ Interactive map visualization
- 🔍 Advanced search with price range (more options planned)
- 📍 Automatic geocoding of listings via Nominatim API
- ⚡ Parallel scraping with concurrent workers
- 📊 Prometheus-compatible metrics endpoint
- 🎯 Real-time progress tracking with ETA
- 💾 ZIP code caching to minimize API calls
- 🌐 User location display on map

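
The ZIP code caching listed above can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the class name `ZipCache`, the `lookup` callback, and the JSON file layout are assumptions made for illustration.

```python
import json
import os
from typing import Callable, Dict, Optional, Tuple

Coords = Tuple[float, float]


class ZipCache:
    """Hypothetical ZIP code -> (lat, lon) cache persisted as a JSON file."""

    def __init__(self, path: str, lookup: Callable[[str], Optional[Coords]]):
        self.path = path
        self.lookup = lookup  # assumed: a function that asks the geocoding API
        self.entries: Dict[str, Coords] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                # JSON stores the pairs as lists, so convert back to tuples.
                self.entries = {k: tuple(v) for k, v in json.load(fh).items()}

    def get(self, zip_code: str) -> Optional[Coords]:
        # Serve from the cache when possible; only call the API on a miss.
        if zip_code in self.entries:
            return self.entries[zip_code]
        coords = self.lookup(zip_code)
        if coords is not None:
            self.entries[zip_code] = coords
            self._save()
        return coords

    def _save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.entries, fh)
```

With a cache like this, many listings that share a ZIP code trigger a single geocoding request, and the file lets the cache survive restarts.
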
## Architecture

- **Backend**: Flask API server with multi-threaded scraping
- **Frontend**: Vanilla JavaScript with Leaflet.js for maps
- **Data Sources**: kleinanzeigen.de, OpenStreetMap/Nominatim

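
To illustrate the Nominatim data source, here is a hedged sketch of how a ZIP code could be turned into coordinates. It assumes the public Nominatim search endpoint with its structured `postalcode` parameter; whether the backend builds its queries this way is not stated in this README, and the function names are hypothetical.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

# Public Nominatim search endpoint (an assumption; the project may use another host).
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(zip_code: str) -> str:
    """Build a structured search URL restricted to Germany."""
    query = urlencode({
        "postalcode": zip_code,
        "countrycodes": "de",
        "format": "json",
        "limit": 1,
    })
    return f"{NOMINATIM_SEARCH}?{query}"


def parse_geocode_response(body: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from a Nominatim JSON response, or None if empty."""
    results = json.loads(body)
    if not results:
        return None
    # Nominatim returns coordinates as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])
```

Nominatim's usage policy asks for an identifying User-Agent and at most one request per second, which fits the `geocoding_delay` default of 1.0 described under Configuration Options.
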
## Installation

### 1. Create System User

```bash
mkdir -p /home/kleinanzeigenscraper/
useradd --system -K MAIL_DIR=/dev/null kleinanzeigenscraper -d /home/kleinanzeigenscraper
chown -R kleinanzeigenscraper:kleinanzeigenscraper /home/kleinanzeigenscraper
```

### 2. Clone Repository

```bash
cd /home/kleinanzeigenscraper/
mkdir git
cd git
git clone https://git.mosad.xyz/localhorst/kleinanzeigen-boosted.git
cd kleinanzeigen-boosted
git checkout main
```

### 3. Install Dependencies

```bash
pip install flask flask-cors beautifulsoup4 lxml urllib3 requests
```
Alternatively, install the packages with zypper (openSUSE):
```bash
zypper install python313-Flask python313-Flask-Cors python313-beautifulsoup4 python313-lxml python313-urllib3 python313-requests
```

### 4. Configure Application

Create or edit [config.json](backend/config.json). The available keys are listed under Configuration Options.

### 5. Create Systemd Service

Create `/lib/systemd/system/kleinanzeigenscraper.service`:

```ini
[Unit]
Description=Kleinanzeigen Scraper API
After=network.target systemd-networkd-wait-online.service

[Service]
Type=simple
User=kleinanzeigenscraper
WorkingDirectory=/home/kleinanzeigenscraper/git/kleinanzeigen-boosted/backend/
ExecStart=/usr/bin/python3 scrape_proxy.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/kleinanzeigenscraper.log
StandardError=append:/var/log/kleinanzeigenscraper.log

[Install]
WantedBy=multi-user.target
```

### 6. Enable and Start Service

```bash
systemctl daemon-reload
systemctl enable kleinanzeigenscraper.service
systemctl start kleinanzeigenscraper.service
systemctl status kleinanzeigenscraper.service
```

### 7. Configure nginx Reverse Proxy

Create `/etc/nginx/sites-available/kleinanzeigenscraper`:

```nginx
server {
    listen 80;
    server_name your-domain.com;

    # Redirect HTTP to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    http2 on;
    server_name your-domain.com;

    ssl_certificate /path/to/ssl/cert.pem;
    ssl_certificate_key /path/to/ssl/key.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    location / {
        client_max_body_size 1G;
        proxy_buffering off;

        # Path to the root of your installation
        root /home/kleinanzeigenscraper/git/kleinanzeigen-boosted/web/;
        index index.html;
    }

    location /api/ {
        # The port must match `port` in backend/config.json
        proxy_pass http://127.0.0.1:27979;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300;
    }
}
```

Enable site:

```bash
ln -s /etc/nginx/sites-available/kleinanzeigenscraper /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx
```

## API Endpoints

### `POST /api/search`
Start a new search session.

**Request Body:**
```json
{
  "search_term": "Fahrrad",
  "num_listings": 25,
  "min_price": 0,
  "max_price": 1000
}
```

**Response:**
```json
{
  "session_id": "uuid-string",
  "total": 25
}
```

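
A client could start a search session along these lines. This is a sketch using only the standard library; the helper names `build_search_payload` and `start_search`, the `base_url` argument, and the client-side validation rules are assumptions, not part of the backend.

```python
import json
import urllib.request
from typing import Any, Dict


def build_search_payload(search_term: str, num_listings: int,
                         min_price: int, max_price: int) -> Dict[str, Any]:
    """Assemble the request body documented above, with basic sanity checks."""
    if not search_term.strip():
        raise ValueError("search_term must not be empty")
    if num_listings < 1:
        raise ValueError("num_listings must be at least 1")
    if min_price > max_price:
        raise ValueError("min_price must not exceed max_price")
    return {
        "search_term": search_term,
        "num_listings": num_listings,
        "min_price": min_price,
        "max_price": max_price,
    }


def start_search(base_url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST the payload to /api/search and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, `start_search("https://your-domain.com", build_search_payload("Fahrrad", 25, 0, 1000))` would be expected to return a dictionary containing `session_id` and `total`.
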
### `GET /api/scrape/<session_id>`
Get the next scraped listing from an active session.

**Response:**
```json
{
  "complete": false,
  "listing": {
    "title": "Mountain Bike",
    "price": 450,
    "id": 123456,
    "zip_code": "76593",
    "address": "Gernsbach",
    "date_added": "2025-11-20",
    "image": "https://...",
    "url": "https://...",
    "lat": 48.7634,
    "lon": 8.3344
  },
  "progress": {
    "current": 5,
    "total": 25
  }
}
```

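
Because each call returns one listing, a client polls this endpoint until `complete` is true. The loop below is a sketch under assumptions: `fetch_next` is a hypothetical callable that performs the GET request and returns the decoded JSON, and the final response is assumed to carry `"complete": true` with or without a listing.

```python
import time
from typing import Any, Callable, Dict, Iterator


def iter_listings(fetch_next: Callable[[], Dict[str, Any]],
                  poll_delay: float = 0.0,
                  max_polls: int = 10_000) -> Iterator[Dict[str, Any]]:
    """Yield listings until the session reports completion."""
    for _ in range(max_polls):
        response = fetch_next()
        listing = response.get("listing")
        if listing:
            yield listing
        if response.get("complete"):
            return
        time.sleep(poll_delay)  # optional pause between polls
    raise TimeoutError("session did not complete within max_polls requests")
```

Passing the fetch step in as a callable keeps the loop independent of the HTTP library and makes it easy to exercise with canned responses.
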
### `POST /api/scrape/<session_id>/cancel`
Cancel an active scraping session and delete cached listings.

**Response:**
```json
{
  "cancelled": true,
  "message": "Session deleted"
}
```

### `GET /api/health`
Health check endpoint.

**Response:**
```json
{
  "status": "ok"
}
```

### `GET /api/metrics`
Prometheus-compatible metrics endpoint.

**Response** (text/plain):
```
# HELP search_requests_total Total number of search requests
# TYPE search_requests_total counter
search_requests_total 42

# HELP scrape_requests_total Total number of scrape requests
# TYPE scrape_requests_total counter
scrape_requests_total 1050

# HELP uptime_seconds Application uptime in seconds
# TYPE uptime_seconds gauge
uptime_seconds 86400

# HELP active_sessions Number of active scraping sessions
# TYPE active_sessions gauge
active_sessions 2

# HELP cache_size Number of cached ZIP codes
# TYPE cache_size gauge
zip_code_cache_size 150

# HELP kleinanzeigen_http_responses_total HTTP responses from kleinanzeigen.de
# TYPE kleinanzeigen_http_responses_total counter
kleinanzeigen_http_responses_total{code="200"} 1000
kleinanzeigen_http_responses_total{code="error"} 5

# HELP nominatim_http_responses_total HTTP responses from Nominatim API
# TYPE nominatim_http_responses_total counter
nominatim_http_responses_total{code="200"} 150
```

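
For quick checks without a Prometheus server, the text output can be parsed with a few lines of Python. This is a simplified sketch: `parse_metrics` is a hypothetical helper that handles only the sample shape shown above (no timestamps, no escaped characters in label values) and is not a full parser for the exposition format.

```python
import re
from typing import Dict

# metric name, optional {labels}, then the sample value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$')


def parse_metrics(text: str) -> Dict[str, float]:
    """Map 'name' or 'name{labels}' to its value, skipping comments."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = _SAMPLE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[name + (labels or "")] = float(value)
    return samples
```

Keeping the label string as part of the key is a deliberate shortcut; it distinguishes `code="200"` from `code="error"` without modelling labels separately.
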
## Configuration Options

### Server Configuration
- `host`: Bind address (default: 0.0.0.0)
- `port`: Port number (default: 5000)
- `debug`: Debug mode (default: false)

### Scraping Configuration
- `session_timeout`: Session expiry in seconds (default: 300)
- `listings_per_page`: Listings per page on kleinanzeigen.de (default: 25)
- `max_workers`: Maximum number of parallel scraping threads (default: 4)
- `min_workers`: Minimum number of parallel scraping threads (default: 2)
- `rate_limit_delay`: Delay between batches in seconds (default: 0.5)
- `geocoding_delay`: Delay between geocoding requests in seconds (default: 1.0)

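
How `max_workers` and `rate_limit_delay` might interact can be sketched as follows. This is an illustration only: the function `scrape_in_batches` and its batching strategy are assumptions, and the actual backend may schedule its workers differently (for example by scaling between `min_workers` and `max_workers`).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def scrape_in_batches(items: List[T], worker: Callable[[T], R],
                      max_workers: int = 4,
                      rate_limit_delay: float = 0.5) -> List[R]:
    """Process items in batches of max_workers, pausing between batches."""
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(items), max_workers):
            batch = items[start:start + max_workers]
            # pool.map preserves input order within the batch
            results.extend(pool.map(worker, batch))
            if start + max_workers < len(items):
                time.sleep(rate_limit_delay)  # be polite to the remote site
    return results
```

Here `worker` would be a function that fetches and parses one listing URL; a higher `rate_limit_delay` trades throughput for a lower request rate.
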
### Cache Configuration
- `zip_cache_file`: Path to ZIP code cache file (default: zip_cache.json)

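
The defaults listed in this section could be merged with a user-supplied configuration roughly like this. The sketch assumes a flat JSON object whose keys match the option names above; the real `config.json` may be structured differently (for example nested by section), and `load_config` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Defaults as documented in this README.
DEFAULTS: Dict[str, Any] = {
    "host": "0.0.0.0",
    "port": 5000,
    "debug": False,
    "session_timeout": 300,
    "listings_per_page": 25,
    "max_workers": 4,
    "min_workers": 2,
    "rate_limit_delay": 0.5,
    "geocoding_delay": 1.0,
    "zip_cache_file": "zip_cache.json",
}


def load_config(raw_json: str) -> Dict[str, Any]:
    """Overlay user settings on the defaults, rejecting unknown keys."""
    overrides = json.loads(raw_json)
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```

Rejecting unknown keys catches typos such as `max_worker` early instead of silently falling back to a default.
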
## Monitoring

View logs:
```bash
tail -f /var/log/kleinanzeigenscraper.log
```

Check service status:
```bash
systemctl status kleinanzeigenscraper.service
```

Query the metrics endpoint (Prometheus format; adjust the port to match `port` in config.json):
```bash
curl http://localhost:5000/api/metrics
```

## Development

Run the backend directly from the `backend/` directory (set `debug` to `true` in `config.json` to enable debug mode):
```bash
python3 scrape_proxy.py
```

Frontend files are located in `web/`:
- `index.html` - Main HTML file
- `css/style.css` - Stylesheet
- `js/config.js` - Configuration
- `js/map.js` - Map functions
- `js/ui.js` - UI functions
- `js/api.js` - API communication
- `js/app.js` - Main application

## License

This project is provided as-is for educational purposes. Respect kleinanzeigen.de's terms of service and robots.txt when using this tool.

## Credits

Built with:
- Flask (Python web framework)
- Leaflet.js (Interactive maps)
- BeautifulSoup4 (HTML parsing)
- OpenStreetMap & Nominatim (Geocoding)