Community • Python • Playwright

Booking.com Full-Year Price Scraper — Deep Guide

A full, practical handbook for the Booking.com Full-Year Price Scraper (Apify Actor). This page covers the Actor’s purpose, how it navigates Booking.com calendars, configuration examples, deployment notes, troubleshooting, and ethical usage. Use it as documentation, onboarding material, or a launchpad for customization.

Pricing: $10/month + usage
Exports: Excel, Apify dataset
Author: Jamshaid Arif (community)

At a glance

  • Extracts hotel prices across months by automating calendar navigation.
  • Outputs a timestamped Excel file and pushes records to an Apify dataset.
  • Handles retries, price normalization, and export formatting.

Overview

This Actor collects price and availability data from Booking.com hotel detail pages over an extended period (commonly 12 months). It is especially useful for pricing research, competitive intelligence, and travel industry analytics, and for revenue management teams that need forward-looking price snapshots or, through repeated runs, a historical record.

Why a year? A full-year sweep reveals seasonality and trends — perfect for forecasting and historical comparisons.

What it scrapes

Key features

Calendar Navigation

Automatically opens the hotel's calendar UI and traverses months forward, collecting price points for each date combination you request.

Excel Export & Dataset

Results are written to an Excel workbook named with a timestamp. Simultaneously, records are pushed to an Apify dataset for programmatic access.
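
A minimal sketch of the export step, assuming the openpyxl package and the column names from the sample output on this page. The helper names (timestamped_filename, write_workbook) and the file name pattern are illustrative and are not taken from the Actor's source.

```python
from datetime import datetime, timezone

# Column order borrowed from the sample output row on this page
COLUMNS = ["hotel_name", "check_in", "check_out", "price_numeric",
           "currency", "source_url", "scraped_at"]

def timestamped_filename(prefix="booking_prices", now=None):
    """Build a name such as booking_prices_20250813_102200.xlsx (UTC)."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.xlsx"

def write_workbook(records, path):
    """Write a header row followed by one row per record dict."""
    # openpyxl is a third-party package; it is imported here so the
    # filename helper above can be used without it installed
    from openpyxl import Workbook
    wb = Workbook()
    ws = wb.active
    ws.append(COLUMNS)
    for rec in records:
        ws.append([rec.get(col) for col in COLUMNS])
    wb.save(path)
    return path
```

Inside an Actor built with the Apify SDK for Python, the same list of records could also be sent to the dataset with Actor.push_data(records).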

Resilience & Retries

Built-in retries and wait-for-selector calls help the Actor recover from transient timeouts and slow page loads.
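
The retry idea can be sketched as a small wrapper with exponential backoff. This is an illustration only: the helper name with_retries, the delays, and the broad exception handling are assumptions, and the Actor's own retry logic may differ.

```python
import time

def with_retries(action, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call action(); on failure wait and try again, up to max_retries retries."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return action()
        except Exception as exc:  # in practice, narrow this to timeout errors
            last_error = exc
            if attempt < max_retries:
                # Back off 1x, 2x, 4x ... the base delay between attempts
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

A call such as with_retries(lambda: page.wait_for_selector('.bui-calendar'), max_retries=3) would then retry a slow calendar load a few times before giving up.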

Price Normalization

Optional numeric conversion strips symbols and normalizes comma/dot decimal differences. Useful for downstream analysis and charts.

How the Actor works (pipeline)

[Input: startUrls, monthsForward, formatPrices, maxRetries] 
  ↓
For each URL:
  • open page (Playwright)
  • locate calendar / availability widget
  • iterate months (1..monthsForward)
  • extract visible prices for target dates
  • normalize (optional)
  • append record with source_url + scraped_at
After loop:
  • write Excel workbook (timestamped)
  • push records to Apify dataset
  • finish run with summary & attachments
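
The loop in the pipeline above can be sketched in plain Python. The scrape_month callback is a hypothetical stand-in for the Playwright work (open page, move to the given month, read the prices); only the record fields source_url and scraped_at come from this page.

```python
from datetime import datetime, timezone

def run_pipeline(start_urls, months_forward, scrape_month, normalize=None):
    """Collect one record per price found, tagged with provenance fields."""
    records = []
    for url in start_urls:
        for month_index in range(months_forward):
            # scrape_month (hypothetical) returns (check_in, price_text)
            # pairs for the month at this offset
            for check_in, price_text in scrape_month(url, month_index):
                price = normalize(price_text) if normalize else price_text
                records.append({
                    "check_in": check_in,
                    "price": price,
                    "source_url": url,
                    "scraped_at": datetime.now(timezone.utc).isoformat(),
                })
    return records
```

Keeping the browser work behind a callback makes the record-building logic easy to test without opening a page.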
      

Important implementation notes

Input JSON & configuration

Use the input object to control the run. The example below is a recommended starting point for a single-year sweep.

{
  "startUrls": [
    "https://www.booking.com/hotel/xx/example-hotel-1.html",
    "https://www.booking.com/hotel/xx/example-hotel-2.html"
  ],
  "monthsForward": 12,
  "formatPrices": true,
  "maxRetries": 3,
  "datasetName": "booking_yearly_prices"
}

Field details

  • startUrls: Booking.com hotel detail page URLs to process.
  • monthsForward: number of calendar months to traverse forward.
  • formatPrices: when true, price text is converted to a numeric value.
  • maxRetries: maximum number of retries after a timeout or slow page load.
  • datasetName: name of the Apify dataset that receives the records.

Tip: start small (1–3 URLs) to validate selectors and locale behavior before scaling up.
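
Before starting a run, it can help to check the input locally. The sketch below assumes the field names from the example input; the default values and the URL prefix check are assumptions for illustration, not the Actor's documented schema.

```python
# Assumed defaults, copied from the example input on this page
DEFAULTS = {"monthsForward": 12, "formatPrices": True, "maxRetries": 3}

def validate_input(raw):
    """Return a copy of the input with defaults applied, or raise ValueError."""
    cfg = {**DEFAULTS, **raw}
    urls = cfg.get("startUrls") or []
    if not urls:
        raise ValueError("startUrls must contain at least one URL")
    for url in urls:
        if not url.startswith("https://www.booking.com/hotel/"):
            raise ValueError(f"not a Booking.com hotel page: {url}")
    if not isinstance(cfg["monthsForward"], int) or cfg["monthsForward"] < 1:
        raise ValueError("monthsForward must be a positive integer")
    if not isinstance(cfg["maxRetries"], int) or cfg["maxRetries"] < 0:
        raise ValueError("maxRetries must be a non-negative integer")
    return cfg
```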

Examples & snippets

Playwright + Python (pseudo-snippet)

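from playwright.sync_api import sync_playwright

def extract_prices(page, months=12):
    # Selectors are illustrative; verify them against the live page
    page.wait_for_selector('.bui-calendar')
    cells = []
    for _ in range(months):
        # Read the month on screen first, then advance to the next one
        cells += [c.inner_text() for c in page.query_selector_all('.bui-calendar__date')]
        page.click('.bui-calendar__control--next')
    return cells  # raw cell text; splitting into date and price is left out

def scrape(url, months=12):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        return extract_prices(page, months)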

Price formatter (Python)

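import re

def to_price_num(txt):
    """Convert text such as '€ 1.234,56' or '$1,234.56' to a float, else None."""
    if not txt:
        return None
    s = re.sub(r'[^0-9,.-]', '', txt)
    if ',' in s and '.' in s:
        if s.rfind(',') < s.rfind('.'):
            s = s.replace(',', '')                    # 1,234.56: comma groups thousands
        else:
            s = s.replace('.', '').replace(',', '.')  # 1.234,56: comma is the decimal mark
    elif ',' in s:
        # 125,50: comma read as the decimal mark (so a bare 1,234 becomes 1.234)
        s = s.replace(',', '.')
    try:
        return float(s)
    except ValueError:
        return None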

Sample output (CSV/Excel row)

hotel_name,check_in,check_out,price_numeric,currency,source_url,scraped_at
Example Hotel,2025-01-01,2025-01-02,125,EUR,https://www.booking.com/hotel/xx/example-hotel-1.html,2025-08-13T10:22:00Z

Store source_url and scraped_at with every record for provenance; this helps with debugging and compliance.

Deploying & running on Apify

  1. Open the Actor page on Apify Store and click Try for free.
  2. Paste the input JSON, adjust parameters, and start a run.
  3. Watch logs — Playwright actions, navigation steps, and any retries are logged.
  4. Download the generated Excel from run attachments and access the dataset via API.
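
For step 4, dataset items can be read over the Apify API once a run has finished. The sketch below uses only the standard library and the public dataset items endpoint; the dataset ID and token are placeholders you supply, and the helper names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def dataset_items_url(dataset_id, token=None, fmt="json"):
    """Build the URL of the dataset items endpoint."""
    params = {"format": fmt, "clean": "true"}
    if token:
        params["token"] = token  # needed for datasets that are not public
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"

def fetch_items(dataset_id, token=None):
    """Download all items as a list of dicts (performs a network request)."""
    with urlopen(dataset_items_url(dataset_id, token)) as resp:
        return json.load(resp)
```

Passing fmt="csv" or fmt="xlsx" to dataset_items_url gives a link suitable for a direct download instead of JSON.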

Scheduling & automation

Schedule this Actor weekly or monthly to keep a rolling dataset. Use Apify webhooks to trigger downstream ETL or reporting tasks when a run completes.
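
A downstream service that receives the webhook needs to find the dataset produced by the run. The sketch below assumes Apify's default webhook payload, where eventType names the event and resource holds the run object; if you define a custom payload template, or the Actor writes to a named dataset via datasetName, adjust the lookup accordingly.

```python
def dataset_id_from_webhook(payload):
    """Return the run's default dataset ID for successful runs, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    # resource is the run object in the default payload template
    return (payload.get("resource") or {}).get("defaultDatasetId")
```

The returned ID can then be handed to whatever ETL or reporting job loads the new records.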

Troubleshooting & best practices

Ethics, legal & respectful scraping

Always respect Booking.com's Terms of Service and robots.txt where applicable. Use scraped data for legitimate business purposes. Don’t republish or redistribute data in ways that violate site rules or copyrights.

Recommended

  • Throttle requests and mimic human-like interaction.
  • Keep runs small and test changes carefully.
  • Store provenance and make your intent transparent in automated user-agents.
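
One simple way to throttle is a randomized pause between page actions. This is a sketch under assumed values: the helper name polite_pause and the 2 to 5 second range are illustrative, not settings taken from the Actor.

```python
import random
import time

def polite_pause(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random duration in the given range and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling polite_pause() after each month change or page load keeps the request rate low and avoids a fixed, machine-like rhythm.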

Avoid

  • Abusive scraping (high request rates, mass crawling without permission).
  • Attempting to bypass authentication, paywalls, or anti-bot measures.

FAQ

Is this Actor official?

It is a community-maintained Actor by Jamshaid Arif and others. Check the repository and changelog for updates before production use.

How to handle multiple currencies?

Collect currency strings and use FX rates in downstream processing to compare across markets.
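
A minimal sketch of that downstream step, assuming you supply your own table of FX rates. The function name and the rate used in the example are made up for illustration; real rates should come from a source you trust and be dated alongside scraped_at.

```python
def to_base_currency(amount, currency, rates, base="EUR"):
    """Convert amount into base, with rates given as base units per 1 unit of currency."""
    if amount is None:
        return None
    if currency == base:
        return float(amount)
    if currency not in rates:
        raise KeyError(f"no FX rate for {currency}")
    return round(amount * rates[currency], 2)
```

For example, with a made-up table rates = {"USD": 0.9}, a USD price of 100 is converted with to_base_currency(100, "USD", rates), while EUR prices pass through unchanged.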

Why Excel + dataset?

Excel is human-friendly; the dataset is programmatic. Both together support analysts and engineers.