
Kleinanzeigen Boosted

A web-based map visualization tool for searching and exploring listings from kleinanzeigen.de with real-time geographic display on OpenStreetMap.

Features

  • 🗺️ Interactive map visualization
  • 🔍 Advanced search with price-range filtering (more options planned)
  • 📍 Automatic geocoding of listings via Nominatim API
  • ⚡ Parallel scraping with concurrent workers
  • 📊 Prometheus-compatible metrics endpoint
  • 🎯 Real-time progress tracking with ETA
  • 💾 ZIP code caching to minimize API calls
  • 🌐 User location display on map

Architecture

  • Backend: Flask API server with multi-threaded scraping
  • Frontend: Vanilla JavaScript with Leaflet.js for maps
  • Data Sources: kleinanzeigen.de, OpenStreetMap/Nominatim

Installation

1. Create System User

mkdir -p /home/kleinanzeigenscraper/
useradd --system -K MAIL_DIR=/dev/null kleinanzeigenscraper -d /home/kleinanzeigenscraper
chown -R kleinanzeigenscraper:kleinanzeigenscraper /home/kleinanzeigenscraper

2. Clone Repository

cd /home/kleinanzeigenscraper/
mkdir git
cd git
git clone https://git.mosad.xyz/localhorst/kleinanzeigen-boosted.git
cd kleinanzeigen-boosted
git checkout main

3. Install Dependencies

pip install flask flask-cors beautifulsoup4 lxml urllib3 requests

Or, on openSUSE, via zypper:

zypper install python313-Flask python313-Flask-Cors python313-beautifulsoup4 python313-lxml python313-urllib3 python313-requests

4. Configure Application

Create or modify config.json in the backend/ directory (the service's working directory).
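A minimal config.json covering the options documented under "Configuration Options" below. All values shown are the documented defaults; the exact schema (flat keys vs. nested sections) may differ in your version of the backend, so treat this as a sketch:

```json
{
  "host": "0.0.0.0",
  "port": 5000,
  "debug": false,
  "session_timeout": 300,
  "listings_per_page": 25,
  "max_workers": 4,
  "min_workers": 2,
  "rate_limit_delay": 0.5,
  "geocoding_delay": 1.0,
  "zip_cache_file": "zip_cache.json"
}
```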

5. Create Systemd Service

Create /lib/systemd/system/kleinanzeigenscraper.service:

[Unit]
Description=Kleinanzeigen Scraper API
After=network.target systemd-networkd-wait-online.service

[Service]
Type=simple
User=kleinanzeigenscraper
WorkingDirectory=/home/kleinanzeigenscraper/git/kleinanzeigen-boosted/backend/
ExecStart=/usr/bin/python3 scrape_proxy.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/kleinanzeigenscraper.log
StandardError=append:/var/log/kleinanzeigenscraper.log

[Install]
WantedBy=multi-user.target

6. Enable and Start Service

systemctl daemon-reload
systemctl enable kleinanzeigenscraper.service
systemctl start kleinanzeigenscraper.service
systemctl status kleinanzeigenscraper.service

7. Configure nginx Reverse Proxy

Create /etc/nginx/sites-available/kleinanzeigenscraper:

server {
    listen 80;
    server_name your-domain.com;

    # Redirect HTTP to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    http2 on;
    server_name your-domain.com;

    ssl_certificate /path/to/ssl/cert.pem;
    ssl_certificate_key /path/to/ssl/key.pem;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;

    location / {
        client_max_body_size 1G;
        proxy_buffering off;

        # Path to the root of your installation
        root /home/kleinanzeigenscraper/git/kleinanzeigen-boosted/web/;
        index index.html;
    }

    location /api/ {
        proxy_pass        http://127.0.0.1:27979;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300;
    }
}

Enable site:

ln -s /etc/nginx/sites-available/kleinanzeigenscraper /etc/nginx/sites-enabled/
nginx -t
systemctl reload nginx

API Endpoints

POST /api/search

Start a new search session.

Request Body:

{
  "search_term": "Fahrrad",
  "num_listings": 25,
  "min_price": 0,
  "max_price": 1000
}

Response:

{
  "session_id": "uuid-string",
  "total": 25
}

GET /api/scrape/<session_id>

Get the next scraped listing from an active session.

Response:

{
  "complete": false,
  "listing": {
    "title": "Mountain Bike",
    "price": 450,
    "id": 123456,
    "zip_code": "76593",
    "address": "Gernsbach",
    "date_added": "2025-11-20",
    "image": "https://...",
    "url": "https://...",
    "lat": 48.7634,
    "lon": 8.3344
  },
  "progress": {
    "current": 5,
    "total": 25
  }
}
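Together, these two endpoints form a simple polling protocol: one POST to /api/search creates a session, then repeated GETs to /api/scrape/&lt;session_id&gt; return one listing at a time until `complete` is true. A minimal client sketch using only the Python standard library (the base URL and port are assumptions; adjust them to your deployment):

```python
import json
import urllib.request

# Assumed base URL; the backend's documented default port is 5000.
BASE_URL = "http://127.0.0.1:5000/api"


def build_search_payload(search_term, num_listings=25, min_price=0, max_price=1000):
    """Assemble the request body documented for POST /api/search."""
    return {
        "search_term": search_term,
        "num_listings": num_listings,
        "min_price": min_price,
        "max_price": max_price,
    }


def post_json(url, payload):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_search(search_term, **kwargs):
    """Start a session, then poll until the backend reports completion."""
    session = post_json(f"{BASE_URL}/search", build_search_payload(search_term, **kwargs))
    listings = []
    while True:
        with urllib.request.urlopen(f"{BASE_URL}/scrape/{session['session_id']}") as resp:
            chunk = json.load(resp)
        if chunk.get("listing"):
            listings.append(chunk["listing"])
        if chunk.get("complete"):
            return listings


if __name__ == "__main__":
    for item in run_search("Fahrrad", num_listings=5):
        print(item["title"], item["price"])
```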

POST /api/scrape/<session_id>/cancel

Cancel an active scraping session and delete cached listings.

Response:

{
  "cancelled": true,
  "message": "Session deleted"
}

GET /api/health

Health check endpoint.

Response:

{
  "status": "ok"
}

GET /api/metrics

Prometheus-compatible metrics endpoint.

Response (text/plain):

# HELP search_requests_total Total number of search requests
# TYPE search_requests_total counter
search_requests_total 42

# HELP scrape_requests_total Total number of scrape requests
# TYPE scrape_requests_total counter
scrape_requests_total 1050

# HELP uptime_seconds Application uptime in seconds
# TYPE uptime_seconds gauge
uptime_seconds 86400

# HELP active_sessions Number of active scraping sessions
# TYPE active_sessions gauge
active_sessions 2

# HELP zip_code_cache_size Number of cached ZIP codes
# TYPE zip_code_cache_size gauge
zip_code_cache_size 150

# HELP kleinanzeigen_http_responses_total HTTP responses from kleinanzeigen.de
# TYPE kleinanzeigen_http_responses_total counter
kleinanzeigen_http_responses_total{code="200"} 1000
kleinanzeigen_http_responses_total{code="error"} 5

# HELP nominatim_http_responses_total HTTP responses from Nominatim API
# TYPE nominatim_http_responses_total counter
nominatim_http_responses_total{code="200"} 150

Configuration Options

Server Configuration

  • host: Bind address (default: 0.0.0.0)
  • port: Port number (default: 5000)
  • debug: Debug mode (default: false)

Scraping Configuration

  • session_timeout: Session expiry in seconds (default: 300)
  • listings_per_page: Listings per page on kleinanzeigen.de (default: 25)
  • max_workers: Maximum number of parallel scraping threads (default: 4)
  • min_workers: Minimum number of parallel scraping threads (default: 2)
  • rate_limit_delay: Delay between batches in seconds (default: 0.5)
  • geocoding_delay: Delay between geocoding requests in seconds (default: 1.0)

Cache Configuration

  • zip_cache_file: Path to ZIP code cache file (default: zip_cache.json)
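The cache file exists so that repeated searches in the same area skip the Nominatim round trip. Its exact schema is an implementation detail of the backend, but conceptually it maps ZIP codes to geocoded coordinates; an illustrative entry (hypothetical format, values taken from the example listing above):

```json
{
  "76593": { "lat": 48.7634, "lon": 8.3344 }
}
```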

Monitoring

View logs:

tail -f /var/log/kleinanzeigenscraper.log

Check service status:

systemctl status kleinanzeigenscraper.service

Monitor metrics (Prometheus):

curl http://localhost:5000/api/metrics

Development

Run the backend directly (set debug to true in config.json to enable debug mode):

python3 scrape_proxy.py

Frontend files are located in web/:

  • index.html - Main HTML file
  • css/style.css - Stylesheet
  • js/config.js - Configuration
  • js/map.js - Map functions
  • js/ui.js - UI functions
  • js/api.js - API communication
  • js/app.js - Main application

License

This project is provided as-is for educational purposes. Respect kleinanzeigen.de's terms of service and robots.txt when using this tool.

Credits

Built with:

  • Flask (Python web framework)
  • Leaflet.js (Interactive maps)
  • BeautifulSoup4 (HTML parsing)
  • OpenStreetMap & Nominatim (Geocoding)