# Kleinanzeigen Boosted
A web-based map visualization tool for searching and exploring listings from kleinanzeigen.de with real-time geographic display on OpenStreetMap.
## Features
- 🗺️ Interactive map visualization
- 🔍 Advanced search with price range filtering (more options planned)
- 📍 Automatic geocoding of listings via Nominatim API
- ⚡ Parallel scraping with concurrent workers
- 📊 Prometheus-compatible metrics endpoint
- 🎯 Real-time progress tracking with ETA
- 💾 ZIP code caching to minimize API calls
- 🌐 User location display on map
## Architecture
- **Backend**: Flask API server with multi-threaded scraping
- **Frontend**: Vanilla JavaScript with Leaflet.js for maps
- **Data Sources**: kleinanzeigen.de, OpenStreetMap/Nominatim
## Installation
### 1. Create System User
```bash
mkdir -p /home/kleinanzeigenscraper/
useradd --system -K MAIL_DIR=/dev/null kleinanzeigenscraper -d /home/kleinanzeigenscraper
chown -R kleinanzeigenscraper:kleinanzeigenscraper /home/kleinanzeigenscraper
```
### 2. Clone Repository
```bash
cd /home/kleinanzeigenscraper/
mkdir git
cd git
git clone https://git.mosad.xyz/localhorst/kleinanzeigen-boosted.git
cd kleinanzeigen-boosted
git checkout main
```
### 3. Install Dependencies
```bash
pip install flask flask-cors beautifulsoup4 lxml urllib3 requests
```
Or, on openSUSE, install the packages via zypper:
```bash
zypper install python313-Flask python313-Flask-Cors python313-beautifulsoup4 python313-lxml python313-urllib3 python313-requests
```
### 4. Configure Application
Create or modify [config.json](backend/config.json); see [Configuration Options](#configuration-options) below for the available settings.
### 5. Create Systemd Service
Create `/lib/systemd/system/kleinanzeigenscraper.service`:
```ini
[Unit]
Description=Kleinanzeigen Scraper API
After=network.target systemd-networkd-wait-online.service

[Service]
Type=simple
User=kleinanzeigenscraper
WorkingDirectory=/home/kleinanzeigenscraper/git/kleinanzeigen-boosted/backend/
ExecStart=/usr/bin/python3 scrape_proxy.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/kleinanzeigenscraper.log
StandardError=append:/var/log/kleinanzeigenscraper.log

[Install]
WantedBy=multi-user.target
```
### 6. Enable and Start Service
```bash
systemctl daemon-reload
systemctl enable kleinanzeigenscraper.service
systemctl start kleinanzeigenscraper.service
systemctl status kleinanzeigenscraper.service
```
### 7. Configure nginx Reverse Proxy
Create `/etc/nginx/sites-available/kleinanzeigenscraper`:
```nginx
server {
    listen 80;
    server_name your-domain.com;

    # Redirect HTTP to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    http2 on;
    server_name your-domain.com;

    ssl_certificate /path/to/ssl/cert.pem;
    ssl_certificate_key /path/to/ssl/key.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    location / {
        client_max_body_size 1G;
        proxy_buffering off;

        # Path to the root of your installation
        root /home/kleinanzeigenscraper/git/kleinanzeigen-boosted/web/;
        index index.html;
    }

    location /api/ {
        proxy_pass http://127.0.0.1:27979;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300;
    }
}
```
Enable the site, test the configuration, and reload nginx:
```bash
ln -s /etc/nginx/sites-available/kleinanzeigenscraper /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx
```
## API Endpoints
### `POST /api/search`
Start a new search session.
**Request Body:**
```json
{
  "search_term": "Fahrrad",
  "num_listings": 25,
  "min_price": 0,
  "max_price": 1000
}
```
**Response:**
```json
{
  "session_id": "uuid-string",
  "total": 25
}
```
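For example, assuming the backend is running locally on the default port 5000:
```bash
# Start a search session; the response contains the session_id used by /api/scrape/<session_id>
curl -s -X POST http://localhost:5000/api/search \
  -H 'Content-Type: application/json' \
  -d '{"search_term": "Fahrrad", "num_listings": 25, "min_price": 0, "max_price": 1000}'
```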
### `GET /api/scrape/<session_id>`
Get the next scraped listing from an active session.
**Response:**
```json
{
  "complete": false,
  "listing": {
    "title": "Mountain Bike",
    "price": 450,
    "id": 123456,
    "zip_code": "76593",
    "address": "Gernsbach",
    "date_added": "2025-11-20",
    "image": "https://...",
    "url": "https://...",
    "lat": 48.7634,
    "lon": 8.3344
  },
  "progress": {
    "current": 5,
    "total": 25
  }
}
```
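Each call returns the next listing, so clients poll this endpoint until `complete` turns `true`. A minimal polling sketch using curl and jq (jq is only used here for readability and is not a project dependency):
```bash
# Fetch listings from an active session until the backend reports completion
SESSION_ID="replace-with-the-session_id-from-/api/search"
while true; do
  RESPONSE=$(curl -s "http://localhost:5000/api/scrape/${SESSION_ID}")
  echo "$RESPONSE" | jq -c '{progress, title: .listing.title}'
  [ "$(echo "$RESPONSE" | jq -r '.complete')" = "true" ] && break
  sleep 1
done
```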
### `POST /api/scrape/<session_id>/cancel`
Cancel an active scraping session and delete cached listings.
**Response:**
```json
{
  "cancelled": true,
  "message": "Session deleted"
}
```
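For example, from the command line:
```bash
# Abort the session started above and delete its cached listings on the backend
curl -s -X POST "http://localhost:5000/api/scrape/${SESSION_ID}/cancel"
```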
### `GET /api/health`
Health check endpoint.
**Response:**
```json
{
  "status": "ok"
}
```
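This is suitable as a simple liveness probe, for example:
```bash
# Exits non-zero if the API is unreachable or returns an HTTP error
curl -fsS http://localhost:5000/api/health
```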
### `GET /api/metrics`
Prometheus-compatible metrics endpoint.
**Response** (text/plain):
```
# HELP search_requests_total Total number of search requests
# TYPE search_requests_total counter
search_requests_total 42
# HELP scrape_requests_total Total number of scrape requests
# TYPE scrape_requests_total counter
scrape_requests_total 1050
# HELP uptime_seconds Application uptime in seconds
# TYPE uptime_seconds gauge
uptime_seconds 86400
# HELP active_sessions Number of active scraping sessions
# TYPE active_sessions gauge
active_sessions 2
# HELP zip_code_cache_size Number of cached ZIP codes
# TYPE zip_code_cache_size gauge
zip_code_cache_size 150
# HELP kleinanzeigen_http_responses_total HTTP responses from kleinanzeigen.de
# TYPE kleinanzeigen_http_responses_total counter
kleinanzeigen_http_responses_total{code="200"} 1000
kleinanzeigen_http_responses_total{code="error"} 5
# HELP nominatim_http_responses_total HTTP responses from Nominatim API
# TYPE nominatim_http_responses_total counter
nominatim_http_responses_total{code="200"} 150
```
## Configuration Options
### Server Configuration
- `host`: Bind address (default: 0.0.0.0)
- `port`: Port number (default: 5000)
- `debug`: Debug mode (default: false)
### Scraping Configuration
- `session_timeout`: Session expiry in seconds (default: 300)
- `listings_per_page`: Listings per page on kleinanzeigen.de (default: 25)
- `max_workers`: Maximum number of parallel scraping threads (default: 4)
- `min_workers`: Minimum number of parallel scraping threads (default: 2)
- `rate_limit_delay`: Delay between batches in seconds (default: 0.5)
- `geocoding_delay`: Delay between geocoding requests in seconds (default: 1.0)
### Cache Configuration
- `zip_cache_file`: Path to ZIP code cache file (default: zip_cache.json)
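The exact layout of `backend/config.json` is defined by the application; the flat key structure below is only a sketch based on the option names above and may need to be adapted. When using the nginx setup from the installation section, `port` has to match the `proxy_pass` target (27979 in that example).
```bash
# Hypothetical example (the flat key layout is an assumption); run from the repository root.
# Binding to 127.0.0.1 is a deliberate choice since the API sits behind the nginx proxy.
cat > backend/config.json <<'EOF'
{
  "host": "127.0.0.1",
  "port": 27979,
  "debug": false,
  "session_timeout": 300,
  "listings_per_page": 25,
  "max_workers": 4,
  "min_workers": 2,
  "rate_limit_delay": 0.5,
  "geocoding_delay": 1.0,
  "zip_cache_file": "zip_cache.json"
}
EOF
```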
## Monitoring
View logs:
```bash
tail -f /var/log/kleinanzeigenscraper.log
```
Check service status:
```bash
systemctl status kleinanzeigenscraper.service
```
Monitor metrics (Prometheus):
```bash
curl http://localhost:5000/api/metrics
```
## Development
Run the backend directly; debug mode is controlled by the `debug` option in `config.json`:
```bash
python3 scrape_proxy.py
```
Frontend files are located in `web/`:
- `index.html` - Main HTML file
- `css/style.css` - Stylesheet
- `js/config.js` - Configuration
- `js/map.js` - Map functions
- `js/ui.js` - UI functions
- `js/api.js` - API communication
- `js/app.js` - Main application
## License
This project is provided as-is for educational purposes. Respect kleinanzeigen.de's terms of service and robots.txt when using this tool.
## Credits
Built with:
- Flask (Python web framework)
- Leaflet.js (Interactive maps)
- BeautifulSoup4 (HTML parsing)
- OpenStreetMap & Nominatim (Geocoding)