
Handle Dynamic Pages with Playwright: JavaScript Rendering

Not all websites serve complete HTML on the initial page load. Modern single-page applications (SPAs), infinite-scroll sites, and JavaScript-heavy pages render content dynamically in the browser using frameworks like React, Vue, and Angular. BeautifulSoup can only parse static HTML; it cannot execute JavaScript. This is where Playwright enters the picture. Playwright automates a headless browser, renders JavaScript, waits for dynamic content to appear, and lets you interact with the page as a user would. This article teaches you to deploy Playwright for scraping modern web applications, handle asynchronous loading, and gracefully fall back to simpler approaches when full browser automation is overkill.

I spent weeks trying to scrape a React-based job board using BeautifulSoup before realizing the entire page was rendered client-side. Switching to Playwright and adding a wait for dynamic elements solved it in hours. For client-side-rendered sites, a real browser such as Playwright is often the only practical option.

Installing and Starting Playwright

Playwright is a cross-platform browser automation framework. Install it and the necessary browser binaries:

# Install via pip
# pip install playwright

# Then download browser binaries (one-time setup)
# python -m playwright install

# Or install a specific browser
# python -m playwright install chromium

from playwright.sync_api import sync_playwright

# Context manager automatically closes browser
with sync_playwright() as p:
    # Launch a headless Chromium browser
    browser = p.chromium.launch(headless=True)

    # Create a new page/tab
    page = browser.new_page()

    # Navigate to a URL
    page.goto("https://example.com")

    # Get the page title
    print(page.title())

    # Get the rendered HTML (after JavaScript execution)
    html = page.content()
    print(f"Page size: {len(html)} characters")

    # Close the page and browser
    page.close()
    browser.close()

Key points:

  • sync_playwright() is the synchronous API (easier for beginners); async_playwright() is the asyncio-based equivalent.
  • headless=True runs the browser without a visible UI (faster).
  • page.goto() loads a URL and, by default, waits for the page's load event; pass wait_until to change that.
  • page.content() returns the fully rendered HTML after JavaScript executes.
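
The key points mention async_playwright() without showing it. A minimal sketch of the async equivalent of the example above follows; the names fetch_rendered_html and main are illustrative and not part of Playwright, and the URL is the same placeholder:

```python
import asyncio


async def fetch_rendered_html(url: str) -> str:
    """Return the rendered HTML of `url` using Playwright's async API."""
    # Imported here so that defining the function does not require
    # Playwright to be installed; a top-level import works just as well.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        html = await page.content()
        await browser.close()
        return html


def main() -> None:
    # Needs Playwright and its browser binaries installed.
    html = asyncio.run(fetch_rendered_html("https://example.com"))
    print(f"Page size: {len(html)} characters")
```

Call main() to run it. Every Playwright call that touches the browser is awaited; otherwise the flow matches the synchronous version.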

Waiting for Dynamic Content

JavaScript often loads content asynchronously. Playwright provides multiple waiting strategies:

import time

from bs4 import BeautifulSoup
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # headless=False to see what happens
    page = browser.new_page()

    page.goto("https://example.com/dynamic-list")

    # Strategy 1: Wait for a specific selector to appear
    try:
        # Wait up to 10 seconds for elements with class "item" to appear
        page.wait_for_selector("div.item", timeout=10000)
        print("Items loaded!")
    except PlaywrightTimeoutError:
        print("Items did not load within timeout")

    # Strategy 2: Wait for a JavaScript function to return a truthy value
    page.wait_for_function(
        "() => document.querySelectorAll('div.item').length >= 5",
        timeout=10000
    )
    print("At least 5 items are now in the DOM")

    # Strategy 3: Wait for navigation (after clicking a link).
    # Start listening before the click so the navigation is not missed.
    with page.expect_navigation():
        page.click("a.next-page")
    print("Navigation completed")

    # Strategy 4: Wait a fixed time (use sparingly; prefer above strategies)
    time.sleep(2)

    # Now parse the rendered HTML with BeautifulSoup
    html = page.content()
    soup = BeautifulSoup(html, "html.parser")

    items = soup.select("div.item")
    for item in items:
        title = item.select_one("h3")
        if title:
            print(title.get_text(strip=True))

    browser.close()

Waiting strategies:

Strategy                              Use Case
wait_for_selector(selector)           Wait for an element to appear
wait_for_function(js_function)        Wait for a custom JavaScript condition
expect_navigation()                   Wait for a page navigation to complete
wait_for_load_state("networkidle")    Wait for network activity to go quiet
time.sleep(seconds)                   Fixed delay (use as fallback)
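
As a rough illustration of choosing between the strategies in the table, the sketch below prefers a selector wait when a stable selector is known and falls back to the "networkidle" load state otherwise. The helper name wait_for_content and its default timeout are assumptions for this example; page is expected to be a Playwright Page.

```python
def wait_for_content(page, selector=None, timeout_ms=10_000):
    """Block until the page looks ready, preferring the most specific signal.

    Returns the name of the strategy that was used. A Playwright
    TimeoutError propagates to the caller if the wait does not succeed.
    """
    if selector is not None:
        # Most reliable: wait for the element we actually intend to scrape.
        page.wait_for_selector(selector, timeout=timeout_ms)
        return "selector"
    # Fallback when no stable selector is known: wait for the network to go quiet.
    page.wait_for_load_state("networkidle", timeout=timeout_ms)
    return "networkidle"


# Usage with a real page (hypothetical URL and selector):
# page.goto("https://example.com/dynamic-list")
# wait_for_content(page, "div.item")
# html = page.content()
```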

Scraping Infinite Scroll Pages

Pages that load more content as you scroll require scrolling and waiting between loads:

import time

from bs4 import BeautifulSoup
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    page.goto("https://example.com/infinite-scroll")

    all_items = []
    # Start from the items already rendered so each wait requires new ones
    previous_count = page.locator("div.item").count()

    # Scroll multiple times to load more content
    for scroll_iteration in range(5):
        # Scroll to the bottom
        page.evaluate("window.scrollBy(0, document.body.scrollHeight)")

        # Wait for new items to load (up to 5 seconds)
        try:
            page.wait_for_function(
                f"() => document.querySelectorAll('div.item').length > {previous_count}",
                timeout=5000
            )
        except PlaywrightTimeoutError:
            # Nothing new appeared, so assume the end of the list
            print("No new items loaded; stopping")
            break

        # Parse the current page content
        soup = BeautifulSoup(page.content(), "html.parser")
        items = soup.select("div.item")
        current_count = len(items)

        print(f"Iteration {scroll_iteration + 1}: Found {current_count} items total")
        previous_count = current_count

        # Small delay before next scroll
        time.sleep(1)

    # Extract final data
    soup = BeautifulSoup(page.content(), "html.parser")
    for item in soup.select("div.item"):
        title = item.select_one("h3")
        if title:
            all_items.append(title.get_text(strip=True))

    print(f"Total items: {len(all_items)}")
    browser.close()

This pattern scrolls, waits for new items to appear, and repeats, up to a fixed number of iterations.

Interacting with Pages: Clicks, Forms, and Input

Playwright can fill forms, click buttons, and trigger interactions:

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    page.goto("https://example.com/search")

    # Fill a text input
    page.fill("input#search-box", "python web scraping")

    # Click a button
    page.click("button#search-button")

    # Wait for results to load
    page.wait_for_selector("div.result", timeout=5000)

    # Read the page content
    soup = BeautifulSoup(page.content(), "html.parser")
    results = soup.select("div.result")

    print(f"Found {len(results)} search results")

    # Select a dropdown option
    page.select_option("select#category", "articles")

    # Wait for new results
    page.wait_for_load_state("networkidle")

    # Take a screenshot for debugging
    page.screenshot(path="screenshot.png")

    browser.close()

Common interactions:

  • page.fill(selector, text) — fill a text input.
  • page.click(selector) — click an element.
  • page.select_option(selector, value) — select dropdown option.
  • page.check(selector) — check a checkbox.
  • page.wait_for_load_state("networkidle") — wait until there have been no network connections for at least 500 ms.
  • page.screenshot(path) — capture a screenshot.

A Complete Dynamic Scraper Example

Here is a realistic scraper that handles a React-based product listing:

import json
import time

from bs4 import BeautifulSoup
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError
from playwright.sync_api import sync_playwright


class DynamicScraper:
    def __init__(self, start_url):
        self.start_url = start_url
        self.data = []

    def scrape(self):
        with sync_playwright() as p:
            # Launch browser
            browser = p.chromium.launch(headless=True)

            # Set a user agent on the browser context so it applies to
            # every request and to navigator.userAgent
            context = browser.new_context(
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
            )
            page = context.new_page()

            page.goto(self.start_url, wait_until="networkidle")

            # Wait for initial product list
            try:
                page.wait_for_selector("div.product-card", timeout=10000)
            except PlaywrightTimeoutError:
                print("Products did not load")
                browser.close()
                return []

            # Scroll and load all products
            for iteration in range(10):
                # Scroll to bottom
                page.evaluate("window.scrollBy(0, document.body.scrollHeight)")

                # Wait a bit for content to load
                time.sleep(1)

                # Check if we can scroll further
                can_scroll = page.evaluate("""
                    () => {
                        return window.innerHeight + window.scrollY < document.body.offsetHeight;
                    }
                """)

                if not can_scroll:
                    print("Reached the end of the page")
                    break

            # Parse the final HTML
            soup = BeautifulSoup(page.content(), "html.parser")

            for card in soup.select("div.product-card"):
                try:
                    title = card.select_one("h2").get_text(strip=True)
                    price = card.select_one("span.price").get_text(strip=True)
                    link = card.select_one("a").get("href")

                    self.data.append({
                        "title": title,
                        "price": price,
                        "link": link
                    })
                except AttributeError:
                    # Missing elements; skip this card
                    continue

            browser.close()
            return self.data


# Usage
scraper = DynamicScraper("https://example.com/products")
products = scraper.scrape()

print(f"Scraped {len(products)} products")

with open("products.json", "w", encoding="utf-8") as f:
    json.dump(products, f, indent=2)
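
The href values collected by the scraper above may be relative paths such as /p/123, depending on the site. A small post-processing sketch, assuming the record shape used above (title, price, link), is shown here; the helper name absolutize_links is hypothetical:

```python
from urllib.parse import urljoin


def absolutize_links(products, base_url):
    """Return copies of the product dicts with "link" resolved against base_url."""
    resolved = []
    for product in products:
        item = dict(product)  # copy so the input list is left untouched
        if item.get("link"):
            # urljoin leaves absolute URLs unchanged and resolves relative ones
            item["link"] = urljoin(base_url, item["link"])
        resolved.append(item)
    return resolved


sample = [{"title": "Widget", "price": "$5", "link": "/p/1"}]
print(absolutize_links(sample, "https://example.com/products"))
```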

Playwright Headless Mode and Performance

Headless mode is faster because no window is drawn, but you cannot watch what the page is doing. While debugging, switch to headless=False or take screenshots:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # headless=False shows the browser window (useful for debugging)
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # Set viewport size (affects responsive design)
    page.set_viewport_size({"width": 1920, "height": 1080})

    page.goto("https://example.com")

    # Take a screenshot for visual inspection
    page.screenshot(path="desktop.png")

    # Resize for mobile testing
    page.set_viewport_size({"width": 375, "height": 812})
    page.screenshot(path="mobile.png")

    browser.close()
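
On the performance side, one option this section does not cover is to skip resources a scraper rarely needs, using Playwright's request routing. The sketch below is an assumption about what is safe to block; the set of resource types and the handler name block_heavy_resources are illustrative, and some sites may render differently without these resources.

```python
# Resource types to skip; adjust for the site being scraped (assumed list).
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def block_heavy_resources(route):
    """Route handler: abort requests for heavy assets, let the rest through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


# Usage with a Playwright page (register the handler before navigating):
# page.route("**/*", block_heavy_resources)
# page.goto("https://example.com")
```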

When NOT to Use Playwright

Playwright is powerful but slower than BeautifulSoup. Use it only when necessary:

  • Sites with JavaScript-rendered content: use Playwright.
  • Sites with static HTML and next-page links: use BeautifulSoup.
  • APIs that return JSON: use requests and json.loads().
  • Large-scale scraping (1M+ pages): optimize with BeautifulSoup or APIs first; add Playwright only if needed.
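
One way to apply the list above is to check whether the raw, unrendered HTML already contains the elements you need, and launch a browser only when it does not. The following is a minimal sketch using only the standard library; the function name needs_browser and the product-card class are assumptions for illustration.

```python
from html.parser import HTMLParser


class _ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def needs_browser(static_html, class_name):
    """Return True if the raw HTML has no element carrying `class_name`."""
    counter = _ClassCounter(class_name)
    counter.feed(static_html)
    return counter.count == 0


spa_shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
server_rendered = '<html><body><div class="product-card"><h2>Widget</h2></div></body></html>'

print(needs_browser(spa_shell, "product-card"))        # True: content is rendered client-side
print(needs_browser(server_rendered, "product-card"))  # False: static parsing is enough
```

If the check returns False, parse the HTML you already have; otherwise hand the URL to Playwright.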

Key Takeaways

  • Playwright automates headless browsers and executes JavaScript, enabling scraping of SPAs and dynamic content.
  • Always wait for elements to load before parsing; use wait_for_selector() or wait_for_function().
  • Infinite-scroll pages require scrolling loops with waits between iterations.
  • Playwright can fill forms, click buttons, and interact with pages like a user.
  • Use headless=False and screenshots for debugging dynamic content issues.

Frequently Asked Questions

What is the performance difference between Playwright and BeautifulSoup?

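Fetching a page with Playwright is typically 10-50x slower than a plain HTTP request followed by BeautifulSoup parsing, because Playwright runs a full browser. As a rough guide, a single page takes 2-5 seconds with Playwright versus 0.1-0.5 seconds for the static approach; actual numbers depend on the site and the network. Use Playwright only when JavaScript rendering is essential.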

Can I use Playwright with multiple browser instances in parallel?

Yes, but carefully. Each browser instance consumes significant memory and CPU. Limit yourself to 2-4 parallel instances and cap concurrency with a worker pool or semaphore. For massive parallelism, consider headless browser services like Browserless or ScraperAPI.
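
A minimal sketch of capping concurrency with asyncio.Semaphore follows. It assumes the async API and, as a design choice not stated in the original answer, shares one browser across several pages instead of launching several browsers. The names run_limited and scrape_titles are illustrative.

```python
import asyncio
from functools import partial


async def run_limited(jobs, limit=3):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results are returned in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def scrape_titles(urls, limit=3):
    """Fetch page titles with one shared browser and at most `limit` open pages."""
    # Imported lazily so run_limited can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()

        async def title_of(url):
            page = await browser.new_page()
            try:
                await page.goto(url)
                return await page.title()
            finally:
                await page.close()

        titles = await run_limited([partial(title_of, url) for url in urls], limit)
        await browser.close()
        return titles


# Usage (needs Playwright installed; URLs are placeholders):
# asyncio.run(scrape_titles(["https://example.com", "https://example.org"], limit=2))
```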

How do I handle authentication (login) with Playwright?

Fill and submit login forms using page.fill() and page.click(), then wait for navigation. Optionally, save cookies and local storage with context.storage_state(path=...) and load that file into later browser contexts, so subsequent runs do not have to log in again.
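
The login flow described above might look like the sketch below. The form selectors, the "**/dashboard" URL pattern, and the function name log_in_and_save_state are all hypothetical and must be adapted to the target site; context is a Playwright BrowserContext.

```python
def log_in_and_save_state(
    context,
    login_url,
    username,
    password,
    success_url_glob="**/dashboard",  # hypothetical post-login URL pattern
    state_path="state.json",
):
    """Submit a login form, wait for the redirect, and save the session to disk."""
    page = context.new_page()
    page.goto(login_url)

    # Hypothetical selectors; inspect the real form to find the right ones.
    page.fill("input[name='username']", username)
    page.fill("input[name='password']", password)
    page.click("button[type='submit']")

    # Wait until the browser lands on the post-login page.
    page.wait_for_url(success_url_glob)

    # Persist cookies and local storage for later runs.
    context.storage_state(path=state_path)
    page.close()
    return state_path


# Usage (needs Playwright installed; URL and credentials are placeholders):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     context = browser.new_context()
#     log_in_and_save_state(context, "https://example.com/login", "user", "secret")
#     # A later run can start already logged in:
#     context = browser.new_context(storage_state="state.json")
```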

What if a page has infinite scroll that never ends?

Set a maximum iteration count or monitor the number of new items loaded. If previous_count == current_count for two iterations, stop scrolling (no new content loaded).
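
The stopping rule above can be sketched as a small helper that is independent of Playwright: it takes two callables, one that scrolls and one that counts items. The name scroll_until_stable and the default limits are assumptions for illustration.

```python
def scroll_until_stable(scroll, count_items, max_rounds=20, patience=2):
    """Scroll until the item count is unchanged for `patience` rounds in a row.

    `scroll` and `count_items` are zero-argument callables. Stops after
    `max_rounds` scrolls at the latest and returns the last item count.
    """
    previous = count_items()
    stalled = 0
    for _ in range(max_rounds):
        scroll()
        current = count_items()
        if current == previous:
            stalled += 1
            if stalled >= patience:
                break  # no new content for `patience` consecutive rounds
        else:
            stalled = 0
        previous = current
    return previous


# Usage with a Playwright page (hypothetical selector):
# def scroll():
#     page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
#     page.wait_for_timeout(1000)  # give the page time to load more items
#
# total = scroll_until_stable(scroll, lambda: page.locator("div.item").count())
```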

Can I run Playwright on a headless server without a display?

Yes. Playwright works on servers and CI/CD systems without X11. Use headless=True (the default). Ensure browser binaries are installed: python -m playwright install.

Further Reading