
Kleinanzeigen Boosted

A web-based map visualization tool for searching and exploring listings from kleinanzeigen.de with real-time geographic display on OpenStreetMap.

Features

  • 🗺️ Interactive map visualization
  • 🔍 Advanced search with price range filtering (more options planned)
  • 📍 Automatic geocoding of listings via Nominatim API
  • ⚡ Parallel scraping with concurrent workers
  • 📊 Prometheus-compatible metrics endpoint
  • 🎯 Real-time progress tracking with ETA
  • 💾 ZIP code caching to minimize API calls
  • 🌐 User location display on map

Architecture

  • Backend: Flask API server with multi-threaded scraping
  • Frontend: Vanilla JavaScript with Leaflet.js for maps
  • Data Sources: kleinanzeigen.de, OpenStreetMap/Nominatim

Requirements

Python Packages

pip install flask flask-cors beautifulsoup4 lxml urllib3 requests

System Requirements

  • Python 3.8+
  • nginx (for production deployment)

Installation

1. Create System User

mkdir -p /home/kleinanzeigenscraper/
useradd --system -K MAIL_DIR=/dev/null kleinanzeigenscraper -d /home/kleinanzeigenscraper
chown -R kleinanzeigenscraper:kleinanzeigenscraper /home/kleinanzeigenscraper

2. Clone Repository

cd /home/kleinanzeigenscraper/
mkdir git
cd git
git clone https://git.mosad.xyz/localhorst/kleinanzeigen-boosted.git
cd kleinanzeigen-boosted
git checkout main

3. Install Dependencies

pip install flask flask-cors beautifulsoup4 lxml urllib3 requests

Or, on openSUSE, via zypper:

zypper install python313-Flask python313-Flask-Cors python313-beautifulsoup4 python313-lxml python313-urllib3 python313-requests

4. Configure Application

Create config.json:

{
  "server": {
    "host": "127.0.0.1",
    "port": 5000,
    "debug": false
  },
  "scraping": {
    "session_timeout": 300,
    "listings_per_page": 25,
    "max_workers": 5,
    "min_workers": 2,
    "rate_limit_delay": 0.5,
    "geocoding_delay": 1.0
  },
  "cache": {
    "zip_cache_file": "zip_cache.json"
  },
  "apis": {
    "nominatim": {
      "url": "https://nominatim.openstreetmap.org/search",
      "user_agent": "kleinanzeigen-scraper"
    },
    "kleinanzeigen": {
      "base_url": "https://www.kleinanzeigen.de"
    }
  },
  "user_agents": [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
  ]
}
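
The backend presumably merges this file over built-in defaults. A minimal sketch of such a loader, assuming the documented default values (the function name `load_config` and the merge strategy are illustrative, not taken from the actual backend):

```python
import json

# Defaults mirroring the values documented below; the real backend may differ.
DEFAULTS = {
    "server": {"host": "127.0.0.1", "port": 5000, "debug": False},
    "scraping": {"session_timeout": 300, "listings_per_page": 25,
                 "max_workers": 5, "min_workers": 2,
                 "rate_limit_delay": 0.5, "geocoding_delay": 1.0},
    "cache": {"zip_cache_file": "zip_cache.json"},
}


def load_config(path="config.json"):
    """Merge config.json over the defaults, section by section."""
    try:
        with open(path) as f:
            user = json.load(f)
    except FileNotFoundError:
        user = {}
    merged = {}
    for section, values in DEFAULTS.items():
        merged[section] = {**values, **user.get(section, {})}
    return merged
```

With this approach a partial config.json is enough; any key you omit falls back to its default.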

5. Create Systemd Service

Create /lib/systemd/system/kleinanzeigenscraper.service:

[Unit]
Description=Kleinanzeigen Scraper API
After=network.target systemd-networkd-wait-online.service

[Service]
Type=simple
User=kleinanzeigenscraper
WorkingDirectory=/home/kleinanzeigenscraper/git/kleinanzeigen-boosted/backend/
ExecStart=/usr/bin/python3 scrape_proxy.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/kleinanzeigenscraper.log
StandardError=append:/var/log/kleinanzeigenscraper.log

[Install]
WantedBy=multi-user.target

6. Enable and Start Service

systemctl daemon-reload
systemctl enable kleinanzeigenscraper.service
systemctl start kleinanzeigenscraper.service
systemctl status kleinanzeigenscraper.service

7. Configure nginx Reverse Proxy

Create /etc/nginx/sites-available/kleinanzeigenscraper:

server {
    listen 80;
    server_name your-domain.com;

    # Redirect HTTP to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    http2 on;
    server_name your-domain.com;

    ssl_certificate /path/to/ssl/cert.pem;
    ssl_certificate_key /path/to/ssl/key.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    location / {
        client_max_body_size 1G;
        proxy_buffering off;

        # Path to the root of your installation
        root /home/kleinanzeigenscraper/git/kleinanzeigen-boosted/web/;
        index index.html;
    }

    location /api/ {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300;
    }
}

Enable site:

ln -s /etc/nginx/sites-available/kleinanzeigenscraper /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx

API Endpoints

POST /api/search

Start a new search session.

Request Body:

{
  "search_term": "Fahrrad",
  "num_listings": 25,
  "min_price": 0,
  "max_price": 1000
}

Response:

{
  "session_id": "uuid-string",
  "total": 25
}

GET /api/scrape/<session_id>

Get the next scraped listing from an active session.

Response:

{
  "complete": false,
  "listing": {
    "title": "Mountain Bike",
    "price": 450,
    "id": 123456,
    "zip_code": "76593",
    "address": "Gernsbach",
    "date_added": "2025-11-20",
    "image": "https://...",
    "url": "https://...",
    "lat": 48.7634,
    "lon": 8.3344
  },
  "progress": {
    "current": 5,
    "total": 25
  }
}

POST /api/scrape/<session_id>/cancel

Cancel an active scraping session and delete cached listings.

Response:

{
  "cancelled": true,
  "message": "Session deleted"
}
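
The endpoints above imply a simple client workflow: POST a search, then poll GET /api/scrape/&lt;session_id&gt; until `complete` is true. A sketch of the payload-building and polling logic (the helper names `build_search_payload` and `collect_listings` are illustrative; swap in `requests` calls to the actual server for real use):

```python
def build_search_payload(search_term, num_listings=25, min_price=0,
                         max_price=None):
    """Request body for POST /api/search, as documented above."""
    payload = {"search_term": search_term,
               "num_listings": num_listings,
               "min_price": min_price}
    if max_price is not None:
        payload["max_price"] = max_price
    return payload


def collect_listings(responses):
    """Drain an iterable of GET /api/scrape/<session_id> response
    bodies until one reports completion; return the listings seen."""
    listings = []
    for resp in responses:
        if resp.get("complete"):
            break
        if resp.get("listing"):
            listings.append(resp["listing"])
    return listings
```

In a real client, `responses` would be a generator yielding `requests.get(f"{base}/api/scrape/{session_id}").json()` on each iteration.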

GET /api/health

Health check endpoint.

Response:

{
  "status": "ok"
}

GET /api/metrics

Prometheus-compatible metrics endpoint.

Response (text/plain):

# HELP search_requests_total Total number of search requests
# TYPE search_requests_total counter
search_requests_total 42

# HELP scrape_requests_total Total number of scrape requests
# TYPE scrape_requests_total counter
scrape_requests_total 1050

# HELP uptime_seconds Application uptime in seconds
# TYPE uptime_seconds gauge
uptime_seconds 86400

# HELP active_sessions Number of active scraping sessions
# TYPE active_sessions gauge
active_sessions 2

# HELP zip_code_cache_size Number of cached ZIP codes
# TYPE zip_code_cache_size gauge
zip_code_cache_size 150

# HELP kleinanzeigen_http_responses_total HTTP responses from kleinanzeigen.de
# TYPE kleinanzeigen_http_responses_total counter
kleinanzeigen_http_responses_total{code="200"} 1000
kleinanzeigen_http_responses_total{code="error"} 5

# HELP nominatim_http_responses_total HTTP responses from Nominatim API
# TYPE nominatim_http_responses_total counter
nominatim_http_responses_total{code="200"} 150

Configuration Options

Server Configuration

  • host: Bind address (default: 0.0.0.0)
  • port: Port number (default: 5000)
  • debug: Debug mode (default: false)

Scraping Configuration

  • session_timeout: Session expiry in seconds (default: 300)
  • listings_per_page: Listings per page on kleinanzeigen.de (default: 25)
  • max_workers: Maximum number of parallel scraping threads (default: 4)
  • min_workers: Minimum number of parallel scraping threads (default: 2)
  • rate_limit_delay: Delay between batches in seconds (default: 0.5)
  • geocoding_delay: Delay between geocoding requests (default: 1.0)
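
How max_workers and rate_limit_delay might interact can be sketched as batched fetching with a pause between batches (illustrative only; the backend's actual scheduling may differ):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=5, rate_limit_delay=0.5):
    """Fetch URLs in batches of max_workers threads, sleeping
    rate_limit_delay seconds between batches."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i in range(0, len(urls), max_workers):
            batch = urls[i:i + max_workers]
            results.extend(pool.map(fetch, batch))
            if i + max_workers < len(urls):  # no pause after the last batch
                time.sleep(rate_limit_delay)
    return results
```

Raising max_workers speeds up a session but increases the request rate against kleinanzeigen.de; rate_limit_delay is the knob that keeps that rate polite.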

Cache Configuration

  • zip_cache_file: Path to ZIP code cache file (default: zip_cache.json)
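
The cache maps ZIP codes to coordinates so each code is geocoded via Nominatim only once. A sketch of such a cache, assuming a flat JSON object of `zip → [lat, lon]` on disk (the on-disk format and the class name `ZipCache` are assumptions, not taken from the backend):

```python
import json
import os


class ZipCache:
    """Persistent ZIP-code -> [lat, lon] cache backed by a JSON file."""

    def __init__(self, path="zip_cache.json"):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def get(self, zip_code):
        """Return [lat, lon] or None if the code was never geocoded."""
        return self.data.get(zip_code)

    def put(self, zip_code, lat, lon):
        """Store a geocoding result and persist the cache to disk."""
        self.data[zip_code] = [lat, lon]
        with open(self.path, "w") as f:
            json.dump(self.data, f)
```

On a cache miss the backend would call Nominatim (respecting geocoding_delay) and then `put` the result, so repeated searches in the same region stay cheap.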

Monitoring

View logs:

tail -f /var/log/kleinanzeigenscraper.log

Check service status:

systemctl status kleinanzeigenscraper.service

Monitor metrics (Prometheus):

curl http://localhost:5000/api/metrics

Development

Run the backend directly (set "debug": true in config.json to enable Flask debug mode):

python3 scrape_proxy.py

Frontend files are located in web/:

  • index.html - Main HTML file
  • css/style.css - Stylesheet
  • js/config.js - Configuration
  • js/map.js - Map functions
  • js/ui.js - UI functions
  • js/api.js - API communication
  • js/app.js - Main application

License

This project is provided as-is for educational purposes. Respect kleinanzeigen.de's terms of service and robots.txt when using this tool.

Credits

Built with:

  • Flask (Python web framework)
  • Leaflet.js (Interactive maps)
  • BeautifulSoup4 (HTML parsing)
  • OpenStreetMap & Nominatim (Geocoding)