Booking.com Full-Year Price Scraper — Deep Guide
A full, practical handbook for the Booking.com Full-Year Price Scraper (Apify Actor). This page covers the Actor’s purpose, how it navigates Booking.com calendars, configuration examples, deployment notes, troubleshooting, and ethical usage. Use it as documentation, onboarding material, or a launchpad for customization.
At a glance
- Extracts hotel prices across months by automating calendar navigation.
- Outputs a timestamped Excel file and pushes records to an Apify dataset.
- Handles retries, price normalization, and export formatting.
Overview
This Actor is tailored to collect price and availability data from Booking.com hotel detail pages over an extended period (commonly 12 months). It is especially useful for pricing research, competitive intelligence, travel industry analytics, and revenue management teams who need historical or forward-looking price snapshots.
What it scrapes
- Hotel name and canonical URL
- Check-in and check-out dates (per row)
- Price value (optionally formatted to numeric)
- Currency (extracted when available)
- Scrape timestamp and source metadata
Key features
Calendar Navigation
Automatically opens the hotel's calendar UI and traverses months forward, collecting price points for each date combination you request.
Excel Export & Dataset
Results are written to an Excel workbook named with a timestamp. Simultaneously, records are pushed to an Apify dataset for programmatic access.
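As a rough illustration of the timestamped naming described above, the sketch below builds a workbook name from the current UTC time. The `booking_prices` prefix and the `YYYYMMDD_HHMMSS` pattern are assumptions made for this example; the Actor's actual naming scheme may differ.

```python
from datetime import datetime, timezone


def timestamped_filename(prefix="booking_prices", ext="xlsx", now=None):
    """Return a name such as 'booking_prices_20250813_102200.xlsx'.

    The prefix and timestamp pattern are illustrative assumptions,
    not the Actor's documented naming scheme.
    """
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.{ext}"
```

Using UTC keeps file names sortable and comparable across runs started from different time zones.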
Resilience & Retries
Built-in retries and wait-for selectors ensure the Actor recovers from transient timeouts or slow page loads.
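The retry behaviour could look roughly like the sketch below. It is not the Actor's actual implementation: the helper name `with_retries` and the exponential backoff between attempts are assumptions for illustration, with `max_retries` playing the role of the `maxRetries` input.

```python
import time


def with_retries(action, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or max_retries attempts are used.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between attempts.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception as exc:  # in real code, catch a narrower error type
            last_error = exc
            if attempt < max_retries:
                sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

A navigation or extraction step can then be wrapped as `with_retries(lambda: page.wait_for_selector(selector))`, where `page` and `selector` come from the surrounding scraping code.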
Price Normalization
Optional numeric conversion strips symbols and normalizes comma/dot decimal differences. Useful for downstream analysis and charts.
How the Actor works (pipeline)
[Input: startUrls, monthsForward, formatPrices, maxRetries]
  ↓
For each URL:
  • open page (Playwright)
  • locate calendar / availability widget
  • iterate months (1..monthsForward)
  • extract visible prices for target dates
  • normalize (optional)
  • append record with source_url + scraped_at
After loop:
  • write Excel workbook (timestamped)
  • push records to Apify dataset
  • finish run with summary & attachments
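The "iterate months" step in the pipeline above implies a list of target dates. The sketch below generates them under the assumption that every row is a one-night stay, as in the sample output; the function names are hypothetical and the Actor may choose dates differently.

```python
from datetime import date, timedelta


def month_starts(first, months_forward):
    """Yield the first day of `months_forward` consecutive months."""
    year, month = first.year, first.month
    for _ in range(months_forward):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def one_night_stays(first, months_forward):
    """Return (check_in, check_out) pairs for each night in the window.

    The window starts on the first day of `first`'s month and spans
    `months_forward` whole months.
    """
    starts = list(month_starts(first, months_forward + 1))
    day, end = starts[0], starts[-1]
    stays = []
    while day < end:
        stays.append((day, day + timedelta(days=1)))
        day += timedelta(days=1)
    return stays
```

For a run started in January 2025 with `monthsForward` set to 12, this yields one pair per night from 2025-01-01 through 2025-12-31.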
Important implementation notes
- Prefer hotel *detail* pages (not search results) — they have more stable selectors.
- Use explicit waits for calendar elements; avoid blind sleeps.
- When sites change, update selectors; small pilot runs help catch locale differences.
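To act on the first note above, start URLs can be screened before a run. This is a minimal sketch that assumes detail pages follow the `/hotel/<country>/<slug>.html` shape seen in the example URLs in this guide; the function name is hypothetical and the check is a heuristic, not an official rule.

```python
from urllib.parse import urlparse


def looks_like_hotel_detail_url(url):
    """Heuristic: True if the URL resembles a Booking.com hotel detail page.

    Assumes the /hotel/.../*.html path shape used by this guide's examples.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    return (
        parts.scheme in ("http", "https")
        and (host == "booking.com" or host.endswith(".booking.com"))
        and parts.path.startswith("/hotel/")
        and parts.path.endswith(".html")
    )
```

Filtering `startUrls` with this check before launching a run helps catch search-result links pasted by mistake.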
Input JSON & configuration
Use the input object to control the run. Example below is the recommended starting point for a single-year sweep.
{
  "startUrls": [
    "https://www.booking.com/hotel/xx/example-hotel-1.html",
    "https://www.booking.com/hotel/xx/example-hotel-2.html"
  ],
  "monthsForward": 12,
  "formatPrices": true,
  "maxRetries": 3,
  "datasetName": "booking_yearly_prices"
}
Field details
- startUrls: array of hotel detail page URLs (required).
- monthsForward: number of months to traverse from the current month (default 12).
- formatPrices: if true, strip currency symbols and return numeric price values.
- maxRetries: how many attempts per navigation/extract step.
- datasetName: optional dataset name to push results into.
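A small validation step can catch malformed input before a run starts. The sketch below is hypothetical and not part of the Actor: only the `monthsForward` default of 12 comes from the field list above, while the defaults for `formatPrices` and `maxRetries` are assumptions copied from the example input.

```python
import json

# Only monthsForward=12 is documented; the other defaults are assumptions.
DEFAULTS = {
    "monthsForward": 12,
    "formatPrices": True,
    "maxRetries": 3,
    "datasetName": None,
}


def load_input(raw):
    """Parse the input (JSON string or dict), apply defaults, and validate."""
    data = json.loads(raw) if isinstance(raw, str) else dict(raw)
    urls = data.get("startUrls")
    if not isinstance(urls, list) or not urls:
        raise ValueError("startUrls must be a non-empty list of URLs")
    config = {**DEFAULTS, **data}
    for key in ("monthsForward", "maxRetries"):
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    if not isinstance(config["formatPrices"], bool):
        raise ValueError("formatPrices must be true or false")
    return config
```

Failing fast on a missing `startUrls` or a non-positive `monthsForward` is cheaper than discovering the problem partway through a long run.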
Examples & snippets
Playwright + Python (pseudo-snippet)
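from playwright.sync_api import sync_playwright

def extract_prices(page, months=12):
    """Walk the calendar month by month; selectors are examples and may drift."""
    rows = []
    # Wait for the calendar widget rather than sleeping blindly
    page.wait_for_selector('.bui-calendar')
    for i in range(months):
        # Parse the visible price cells for the current month
        # ... extract date and price, then append to rows
        if i < months - 1:
            # Click the next-month button
            page.click('.bui-calendar__control--next')
    return rows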
Price formatter (Python)
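import re

def to_price_num(txt):
    if not txt:
        return None
    # Keep only digits, separators, and a minus sign
    s = re.sub(r'[^0-9,.-]', '', txt)
    if ',' in s and '.' in s:
        if s.rfind(',') < s.rfind('.'):
            # '1,234.56': comma groups thousands, dot is the decimal mark
            s = s.replace(',', '')
        else:
            # '1.234,56': dot groups thousands, comma is the decimal mark
            s = s.replace('.', '').replace(',', '.')
    elif ',' in s:
        # '125,50': a lone comma is treated as the decimal mark ('1,234' is ambiguous)
        s = s.replace(',', '.')
    try:
        return float(s)
    except ValueError:
        return None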
Sample output (CSV/Excel row)
hotel_name,check_in,check_out,price_numeric,currency,source_url,scraped_at
Example Hotel,2025-01-01,2025-01-02,125,EUR,https://www.booking.com/hotel/xx/example-hotel-1.html,2025-08-13T10:22:00Z
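A row like the sample above could be assembled as follows. This is a sketch using only the standard library; the helper names `make_record` and `to_csv` are hypothetical, and the column order simply mirrors the sample header.

```python
import csv
import io
from datetime import date, datetime, timezone

FIELDS = ["hotel_name", "check_in", "check_out", "price_numeric",
          "currency", "source_url", "scraped_at"]


def make_record(hotel_name, check_in, check_out, price, currency,
                source_url, scraped_at=None):
    """Build one output row; scraped_at is expected to be a UTC datetime."""
    scraped_at = scraped_at or datetime.now(timezone.utc)
    return {
        "hotel_name": hotel_name,
        "check_in": check_in.isoformat(),
        "check_out": check_out.isoformat(),
        "price_numeric": price,
        "currency": currency,
        "source_url": source_url,
        "scraped_at": scraped_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The same dictionaries can be passed to an Excel writer or pushed to a dataset, so the CSV step here is only one possible sink.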
Tip: keep source_url and scraped_at in every row for provenance; they help with debugging and compliance.
Deploying & running on Apify
- Open the Actor page on Apify Store and click Try for free.
- Paste the input JSON, adjust parameters, and start a run.
- Watch logs — Playwright actions, navigation steps, and any retries are logged.
- Download the generated Excel from run attachments and access the dataset via API.
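For the last step, dataset items can be read over HTTP. The sketch below targets the Apify API v2 dataset items endpoint; the dataset ID and token are placeholders you supply, and the query parameters used here (`format`, `limit`, `token`) should be checked against the current API reference before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, token=None, fmt="json", limit=None):
    """Build the items URL for a dataset; dataset_id and token are yours."""
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id, token=None):
    """Download all items of a dataset as a list of dicts (network call)."""
    with urlopen(dataset_items_url(dataset_id, token)) as response:
        return json.load(response)
```

Keeping URL construction separate from the network call makes the request easy to log and to test without hitting the API.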
Scheduling & automation
Schedule this Actor weekly or monthly to keep a rolling dataset. Use Apify webhooks to trigger downstream ETL or reporting tasks when a run completes.
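On the receiving side of a webhook, a downstream job mainly needs to know which dataset to read. The sketch below assumes the payload carries `eventType` and a `resource` object with `defaultDatasetId`, as in Apify's default payload template; treat those field names as assumptions and verify them against the webhook documentation or a real delivery.

```python
import json


def dataset_id_from_webhook(body):
    """Return the run's dataset ID for successful runs, otherwise None.

    Field names follow the assumed default Apify webhook payload.
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")
```

Returning None for failed or aborted runs lets the ETL job skip them without raising.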
Troubleshooting & best practices
- Selector drift: Booking.com changes UI selectively by region; run a pilot and update selectors if elements are missing.
- Rate-limits & blocks: Use Apify proxy or platform anti-blocking options when scaling; still keep delays reasonable.
- Locale differences: Dates and price formats vary — include locale-aware parsing if you process many countries.
- Flaky runs: Increase maxRetries and add targeted waits for calendar animations.
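For the locale differences noted above, the currency can be pulled out of the raw price text before the number is normalized. This is a hedged sketch: the symbol table is a small illustrative subset, and symbols such as `$` and `¥` are ambiguous across countries, so the mapping below is an assumption rather than a rule.

```python
import re

# Illustrative subset; "$" and "¥" are ambiguous and mapped by assumption.
SYMBOL_TO_CODE = {"€": "EUR", "£": "GBP", "$": "USD", "¥": "JPY"}


def extract_currency(txt):
    """Return a currency code found in a price string, or None."""
    if not txt:
        return None
    # Prefer an explicit three-letter code such as 'EUR' or 'CHF'
    match = re.search(r"(?<![A-Za-z])[A-Z]{3}(?![A-Za-z])", txt)
    if match:
        return match.group(0)
    for symbol, code in SYMBOL_TO_CODE.items():
        if symbol in txt:
            return code
    return None
```

When the page shows an explicit code, that code wins over any symbol, which avoids guessing in markets that share a symbol.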
Ethics, legal & respectful scraping
Always respect Booking.com's Terms of Service and robots.txt where applicable. Use scraped data for legitimate business purposes. Don’t republish or redistribute data in ways that violate site rules or copyrights.
Recommended
- Throttle requests and mimic human-like interaction.
- Keep runs small and test changes carefully.
- Store provenance and make your intent transparent in automated user-agents.
Avoid
- Abusive scraping (high request rates, mass crawling without permission).
- Attempting to bypass authentication, paywalls, or anti-bot measures.
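The throttling recommendation above can be approximated with a randomized pause between pages. The base delay and jitter values below are arbitrary assumptions for illustration, not limits published by Booking.com or Apify.

```python
import random
import time


def polite_delay(base=2.0, jitter=1.5, rng=random.random):
    """Return a pause in seconds: base plus a random share of jitter."""
    return base + jitter * rng()


def throttled(items, base=2.0, jitter=1.5, sleep=time.sleep):
    """Yield items one by one, pausing between consecutive items."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(polite_delay(base, jitter))
        yield item
```

Iterating with `for url in throttled(start_urls): ...` spaces out page loads without changing the rest of the loop.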
FAQ
The Actor is community-maintained by Jamshaid Arif and others. Check the repository and changelog for updates before production use.
To compare prices across markets, collect the currency strings and apply FX rates in downstream processing.
The Excel file is human-friendly and the dataset is programmatic; together they serve both analysts and engineers.
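For the cross-market comparison mentioned in the FAQ, downstream code might convert each price into one base currency. The sketch below is hypothetical, and the rates in the usage note are made-up placeholders, not real exchange rates; a real pipeline would load rates from an FX source of your choice.

```python
def to_base(amount, currency, rates_to_base):
    """Convert amount into the base currency, or None if it cannot be done.

    rates_to_base maps a currency code to the value of one unit of that
    currency expressed in the base currency.
    """
    if amount is None or currency not in rates_to_base:
        return None
    return round(amount * rates_to_base[currency], 2)
```

With placeholder rates such as `{"EUR": 1.0, "USD": 0.5}`, calling `to_base(100, "USD", rates)` returns `50.0`, and rows with an unknown currency come back as None so they can be reviewed separately.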