# Kleinanzeigen Boosted

A web-based map visualization tool for searching and exploring listings from kleinanzeigen.de, with real-time geographic display on OpenStreetMap.

## Features

- 🗺️ Interactive map visualization
- 🔍 Advanced search with price range (more options planned)
- 📍 Automatic geocoding of listings via the Nominatim API
- ⚡ Parallel scraping with concurrent workers
- 📊 Prometheus-compatible metrics endpoint
- 🎯 Real-time progress tracking with ETA
- 💾 ZIP code caching to minimize API calls
- 🌐 User location display on the map

## Architecture

- **Backend**: Flask API server with multi-threaded scraping
- **Frontend**: Vanilla JavaScript with Leaflet.js for maps
- **Data Sources**: kleinanzeigen.de, OpenStreetMap/Nominatim

## Requirements

### Python Packages

```bash
pip install flask flask-cors beautifulsoup4 lxml urllib3 requests
```

### System Requirements

- Python 3.8+
- nginx (for production deployment)

## Installation

### 1. Create System User

```bash
mkdir -p /home/kleinanzeigenscraper/
useradd --system -K MAIL_DIR=/dev/null kleinanzeigenscraper -d /home/kleinanzeigenscraper
chown -R kleinanzeigenscraper:kleinanzeigenscraper /home/kleinanzeigenscraper
```

### 2. Clone Repository

```bash
cd /home/kleinanzeigenscraper/
mkdir git
cd git
git clone https://git.mosad.xyz/localhorst/kleinanzeigen-boosted.git
cd kleinanzeigen-boosted
git checkout main
```

### 3. Install Dependencies

```bash
pip install flask flask-cors beautifulsoup4 lxml urllib3 requests
```

or via zypper:

```bash
zypper install python313-Flask python313-Flask-Cors python313-beautifulsoup4 python313-lxml python313-urllib3 python313-requests
```

### 4. Configure Application

Create `config.json`:

```json
{
  "server": {
    "host": "127.0.0.1",
    "port": 5000,
    "debug": false
  },
  "scraping": {
    "session_timeout": 300,
    "listings_per_page": 25,
    "max_workers": 5,
    "min_workers": 2,
    "rate_limit_delay": 0.5,
    "geocoding_delay": 1.0
  },
  "cache": {
    "zip_cache_file": "zip_cache.json"
  },
  "apis": {
    "nominatim": {
      "url": "https://nominatim.openstreetmap.org/search",
      "user_agent": "kleinanzeigen-scraper"
    },
    "kleinanzeigen": {
      "base_url": "https://www.kleinanzeigen.de"
    }
  },
  "user_agents": [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
  ]
}
```

### 5. Create Systemd Service

Create `/lib/systemd/system/kleinanzeigenscraper.service`:

```ini
[Unit]
Description=Kleinanzeigen Scraper API
After=network.target systemd-networkd-wait-online.service

[Service]
Type=simple
User=kleinanzeigenscraper
WorkingDirectory=/home/kleinanzeigenscraper/git/kleinanzeigen-boosted/backend/
ExecStart=/usr/bin/python3 scrape_proxy.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/kleinanzeigenscraper.log
StandardError=append:/var/log/kleinanzeigenscraper.log

[Install]
WantedBy=multi-user.target
```

### 6. Enable and Start Service

```bash
systemctl daemon-reload
systemctl enable kleinanzeigenscraper.service
systemctl start kleinanzeigenscraper.service
systemctl status kleinanzeigenscraper.service
```

### 7. Configure nginx Reverse Proxy

Create `/etc/nginx/sites-available/kleinanzeigenscraper`:

```nginx
server {
    listen 80;
    server_name your-domain.com;

    # Redirect HTTP to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    http2 on;
    server_name your-domain.com;

    ssl_certificate /path/to/ssl/cert.pem;
    ssl_certificate_key /path/to/ssl/key.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    location / {
        client_max_body_size 1G;
        proxy_buffering off;

        # Path to the root of your installation
        root /home/kleinanzeigenscraper/git/kleinanzeigen-boosted/web/;
        index index.html;
    }

    location /api/ {
        # The upstream port must match server.port in config.json
        proxy_pass http://127.0.0.1:27979;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300;
    }
}
```

Enable the site:

```bash
ln -s /etc/nginx/sites-available/kleinanzeigenscraper /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx
```

## API Endpoints

### `POST /api/search`

Start a new search session.

**Request Body:**

```json
{
  "search_term": "Fahrrad",
  "num_listings": 25,
  "min_price": 0,
  "max_price": 1000
}
```

**Response:**

```json
{
  "session_id": "uuid-string",
  "total": 25
}
```

### `GET /api/scrape/<session_id>`

Get the next scraped listing from an active session.

**Response:**

```json
{
  "complete": false,
  "listing": {
    "title": "Mountain Bike",
    "price": 450,
    "id": 123456,
    "zip_code": "76593",
    "address": "Gernsbach",
    "date_added": "2025-11-20",
    "image": "https://...",
    "url": "https://...",
    "lat": 48.7634,
    "lon": 8.3344
  },
  "progress": {
    "current": 5,
    "total": 25
  }
}
```

### `POST /api/scrape/<session_id>/cancel`

Cancel an active scraping session and delete cached listings.

**Response:**

```json
{
  "cancelled": true,
  "message": "Session deleted"
}
```

### `GET /api/health`

Health check endpoint.
**Response:**

```json
{
  "status": "ok"
}
```

### `GET /api/metrics`

Prometheus-compatible metrics endpoint.

**Response** (text/plain):

```
# HELP search_requests_total Total number of search requests
# TYPE search_requests_total counter
search_requests_total 42

# HELP scrape_requests_total Total number of scrape requests
# TYPE scrape_requests_total counter
scrape_requests_total 1050

# HELP uptime_seconds Application uptime in seconds
# TYPE uptime_seconds gauge
uptime_seconds 86400

# HELP active_sessions Number of active scraping sessions
# TYPE active_sessions gauge
active_sessions 2

# HELP zip_code_cache_size Number of cached ZIP codes
# TYPE zip_code_cache_size gauge
zip_code_cache_size 150

# HELP kleinanzeigen_http_responses_total HTTP responses from kleinanzeigen.de
# TYPE kleinanzeigen_http_responses_total counter
kleinanzeigen_http_responses_total{code="200"} 1000
kleinanzeigen_http_responses_total{code="error"} 5

# HELP nominatim_http_responses_total HTTP responses from Nominatim API
# TYPE nominatim_http_responses_total counter
nominatim_http_responses_total{code="200"} 150
```

## Configuration Options

### Server Configuration

- `host`: Bind address (default: 0.0.0.0)
- `port`: Port number (default: 5000)
- `debug`: Debug mode (default: false)

### Scraping Configuration

- `session_timeout`: Session expiry in seconds (default: 300)
- `listings_per_page`: Listings per page on kleinanzeigen.de (default: 25)
- `max_workers`: Maximum number of parallel scraping threads (default: 4)
- `min_workers`: Minimum number of parallel scraping threads (default: 2)
- `rate_limit_delay`: Delay between batches in seconds (default: 0.5)
- `geocoding_delay`: Delay between geocoding requests in seconds (default: 1.0)

### Cache Configuration

- `zip_cache_file`: Path to the ZIP code cache file (default: zip_cache.json)

## Monitoring

View logs:

```bash
tail -f /var/log/kleinanzeigenscraper.log
```

Check service status:

```bash
systemctl status kleinanzeigenscraper.service
```

Monitor metrics (Prometheus):

```bash
curl http://localhost:5000/api/metrics
```

## Development

Run in debug mode (set `"debug": true` in `config.json`):

```bash
python3 scrape_proxy.py
```

Frontend files are located in `web/`:

- `index.html` - Main HTML file
- `css/style.css` - Stylesheet
- `js/config.js` - Configuration
- `js/map.js` - Map functions
- `js/ui.js` - UI functions
- `js/api.js` - API communication
- `js/app.js` - Main application

## License

This project is provided as-is for educational purposes. Respect kleinanzeigen.de's terms of service and robots.txt when using this tool.

## Credits

Built with:

- Flask (Python web framework)
- Leaflet.js (Interactive maps)
- BeautifulSoup4 (HTML parsing)
- OpenStreetMap & Nominatim (Geocoding)
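## Example: Driving the API from Python

The search → poll workflow documented under API Endpoints can also be driven from a script instead of the web frontend. The sketch below follows the request/response shapes shown above; `BASE_URL` is an assumption for a local deployment, and the real backend may differ in detail:

```python
import json
import time
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"  # assumed local backend address


def build_search(search_term, num_listings=25, min_price=0, max_price=1000):
    """Request body for POST /api/search, as documented above."""
    return {
        "search_term": search_term,
        "num_listings": num_listings,
        "min_price": min_price,
        "max_price": max_price,
    }


def post_json(url, payload):
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def get_json(url):
    with urlopen(url) as resp:
        return json.load(resp)


def collect_listings(search_term, **kwargs):
    """Start a session, then poll GET /api/scrape/<session_id> until complete."""
    session = post_json(f"{BASE_URL}/api/search", build_search(search_term, **kwargs))
    listings = []
    while True:
        data = get_json(f"{BASE_URL}/api/scrape/{session['session_id']}")
        if data.get("listing"):
            listings.append(data["listing"])
        if data.get("complete"):
            return listings
        time.sleep(0.2)  # polite polling; the backend rate-limits upstream anyway


if __name__ == "__main__":
    for item in collect_listings("Fahrrad", max_price=500):
        print(f"{item['title']}: {item['price']} € @ {item['lat']},{item['lon']}")
```

Cancelling early is a `POST` to `/api/scrape/<session_id>/cancel` with the same helper, e.g. `post_json(f"{BASE_URL}/api/scrape/{sid}/cancel", {})`.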
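## Example: Geocoding a ZIP Code via Nominatim

The geocoding step works roughly as follows: each listing's ZIP code is resolved to coordinates through the Nominatim search API, throttled by `geocoding_delay`. This is an illustrative sketch, not the code in `scrape_proxy.py`; the URL and user agent are taken from the `apis.nominatim` section of `config.json`:

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Values from the apis.nominatim section of config.json
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
USER_AGENT = "kleinanzeigen-scraper"
GEOCODING_DELAY = 1.0  # scraping.geocoding_delay


def build_geocode_url(zip_code, country="Germany"):
    """Nominatim search URL for a postal-code lookup."""
    query = urlencode({"postalcode": zip_code, "country": country,
                       "format": "json", "limit": 1})
    return f"{NOMINATIM_URL}?{query}"


def geocode_zip(zip_code, country="Germany"):
    """Resolve a ZIP code to (lat, lon), or None if Nominatim finds nothing."""
    req = Request(build_geocode_url(zip_code, country),
                  headers={"User-Agent": USER_AGENT})  # Nominatim requires a UA
    with urlopen(req) as resp:
        results = json.load(resp)
    time.sleep(GEOCODING_DELAY)  # stay under Nominatim's one-request-per-second policy
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])
```

The per-request delay plus the ZIP cache keeps traffic to Nominatim minimal, as required by its usage policy.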
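## Example: How the ZIP Code Cache Works

The `zip_cache.json` file named in the cache configuration maps ZIP codes to coordinates so each code is geocoded at most once. A minimal sketch of that idea (the actual implementation in `scrape_proxy.py` may differ):

```python
import json
from pathlib import Path


class ZipCache:
    """JSON-file-backed ZIP → [lat, lon] cache, persisted on every write."""

    def __init__(self, path="zip_cache.json"):
        self.path = Path(path)
        self.data = {}
        if self.path.exists():
            self.data = json.loads(self.path.read_text())

    def get(self, zip_code):
        # None means the caller must geocode and then put() the result
        return self.data.get(zip_code)

    def put(self, zip_code, lat, lon):
        self.data[zip_code] = [lat, lon]
        self.path.write_text(json.dumps(self.data, indent=2))
```

Only cache misses trigger a Nominatim request; repeat searches in the same region become almost free.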
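## Example: Parallel Scraping with Batch Rate Limiting

The "parallel scraping with concurrent workers" feature combines a thread pool (sized by `min_workers`/`max_workers`) with a pause of `rate_limit_delay` between batches. A sketch of that pattern; `fetch` stands in for the real per-listing scraper, and the constants mirror the example `config.json`:

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 5         # scraping.max_workers in config.json
RATE_LIMIT_DELAY = 0.5  # scraping.rate_limit_delay in config.json


def scrape_batch(fetch, urls):
    """Fetch one batch of listing URLs in parallel, then pause.

    `fetch` is a placeholder for the real per-listing scraper; results
    come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = list(pool.map(fetch, urls))
    time.sleep(RATE_LIMIT_DELAY)  # delay between batches, per config
    return results
```

Throttling per batch rather than per request keeps throughput high while still spacing out load on kleinanzeigen.de.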