Fetch many Facebook-Ad-Library pages (vectorised, cached, parallel)

Usage

get_ad_html(
  ad_ids,
  country,
  cache_dir = NULL,
  overwrite = FALSE,
  strip_css = TRUE,
  max_active = 8,
  log_failed_ids = NULL,
  ua = NULL,
  randomize_ua = NULL,
  interactive = FALSE,
  timeout_sec = 15,
  retries = 3L,
  quiet = FALSE,
  return_type = c("paths", "list")
)

Arguments

ad_ids: Character vector of Ad-Library IDs.
country: Two-letter country code.
cache_dir: Directory where .html.gz files will be stored. If NULL (default), uses the value set during interactive setup, or "html_cache" if none was set.
overwrite: If FALSE (default), already-cached files are kept rather than downloaded again.
strip_css: If TRUE (default), run the same fast, regex-based CSS removal as the single-ID helper. It is applied only to newly downloaded pages.
max_active: Maximum number of concurrent sockets passed to httr2::req_perform_parallel() (default = 8).
log_failed_ids: If a character path is provided (e.g., "log.txt"), a log of failed IDs will be written to that file. Default is NULL (no log file).
ua: User-Agent string. If NULL (default), uses a standard or randomized UA based on randomize_ua.
randomize_ua: Logical. If TRUE, a random User-Agent is chosen from a predefined list for each request, which makes the requests harder to track. If NULL (default), uses the value set during interactive setup, or FALSE if none was set.
interactive: If TRUE, run a one-time interactive setup to configure and save default settings. Default is FALSE.
timeout_sec, retries: Request timeout in seconds (default 15) and number of retries (default 3L). Both are passed through to the underlying requests.
quiet: Suppress progress messages.
return_type: "paths" (default) or "list" for in-memory strings.
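
The arguments above can be combined as in the following sketch. The ad IDs, the country code and the temporary paths are placeholders chosen for illustration, not values from the package. The call is guarded so that the snippet does nothing when get_ad_html() is not available in the session.

```r
# Hypothetical IDs for illustration only; replace with real Ad-Library IDs.
ids <- c("1234567890", "9876543210")

# Assumed locations for the cache and the failure log (any writable paths work).
cache <- file.path(tempdir(), "html_cache")
failed_log <- file.path(tempdir(), "failed_ids.txt")

# Only run the download when the function is actually available.
if (exists("get_ad_html", mode = "function")) {
  paths <- get_ad_html(
    ad_ids         = ids,
    country        = "US",
    cache_dir      = cache,
    overwrite      = FALSE,     # keep files that are already cached
    max_active     = 4,         # fewer concurrent requests than the default 8
    log_failed_ids = failed_log,
    return_type    = "paths"    # file paths rather than in-memory HTML
  )
  print(paths)
}
```

Switching return_type to "list" in the same call would return the HTML strings in memory instead of file paths.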

Value

Depending on return_type, either a named character vector of file paths ("paths") or a named list of HTML strings ("list"), in the same order as ad_ids.
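
With return_type = "paths", the result points at gzip-compressed .html.gz files. A minimal sketch of reading one such file back into a string with base R follows. It assumes the cached files are ordinary gzip-compressed text; the helper read_cached_html() and the stand-in file are hypothetical and created here only so the example is self-contained.

```r
# Stand-in for one cached page: write a tiny gzip-compressed HTML file.
cache_file <- file.path(tempdir(), "example_ad.html.gz")
con <- gzfile(cache_file, open = "wt")
writeLines("<html><body>Example ad</body></html>", con)
close(con)

# Hypothetical helper: read a gzip-compressed HTML file into a single string.
read_cached_html <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

html <- read_cached_html(cache_file)
```

The same helper could be applied over a whole vector of returned paths with lapply(), which yields a list comparable in shape to what return_type = "list" describes.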