Fetch many Facebook-Ad-Library pages (vectorised, cached, parallel)

Usage

get_ad_html(
  ad_ids,
  country,
  cache_dir = NULL,
  overwrite = FALSE,
  strip_css = TRUE,
  max_active = 8,
  log_failed_ids = NULL,
  ua = NULL,
  randomize_ua = NULL,
  interactive = FALSE,
  timeout_sec = 15,
  retries = 3L,
  quiet = FALSE,
  return_type = c("paths", "list")
)
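
The signature lists `return_type = c("paths", "list")`, yet the documented default is "paths". This is the usual R idiom in which the first element of a choice vector is the default. A minimal sketch of that idiom, assuming the function resolves the argument with `match.arg()` (the source does not confirm this); `pick_return_type()` is a hypothetical stand-in, not part of the package:

```r
# Hypothetical stand-in illustrating the choice-vector idiom.
# Assumption: get_ad_html() resolves return_type in a similar way.
pick_return_type <- function(return_type = c("paths", "list")) {
  # match.arg() returns the first choice when the argument is not supplied,
  # and raises an error for values outside the listed choices.
  match.arg(return_type)
}

default_choice  <- pick_return_type()        # first element of the vector
explicit_choice <- pick_return_type("list")  # explicitly requested value
```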

Arguments

ad_ids

Character vector of Ad-Library IDs.

country

Two-letter country code.

cache_dir

Directory where the .html.gz files will be stored. Defaults to the value set during interactive setup, or to "html_cache" if no value was set.
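
Because cached pages are stored as .html.gz files, they can be read back with base R connections. The following is a minimal sketch, assuming the files are ordinary gzip-compressed UTF-8 text; `read_cached_html()` is a hypothetical helper and not part of the package, and the demo file only stands in for a real cached page:

```r
# Hypothetical helper (not part of the package): read one cached page
# back into a single string. Assumes plain gzip-compressed UTF-8 text.
read_cached_html <- function(path) {
  con <- gzfile(path, open = "rt", encoding = "UTF-8")
  on.exit(close(con), add = TRUE)
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

# Round trip with a throwaway file standing in for a cached page.
demo_path <- file.path(tempdir(), "demo_ad.html.gz")
out <- gzfile(demo_path, open = "wt", encoding = "UTF-8")
writeLines("<html><body>demo</body></html>", out)
close(out)

html <- read_cached_html(demo_path)
```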

overwrite

If FALSE (default), already-cached files are kept rather than overwritten.

strip_css

If TRUE (default), run the same fast, regex-based CSS removal as the single-ID helper. It is applied only to newly downloaded pages.

max_active

Maximum number of concurrent sockets passed to httr2::req_perform_parallel() (default = 8).

log_failed_ids

If a character path is provided (e.g., "log.txt"), a log of failed IDs will be written to that file. Default is NULL (no log file).

ua

User-Agent string. If NULL (default), uses a standard or randomized UA based on randomize_ua.

randomize_ua

Logical. If TRUE, a random User-Agent is chosen from a predefined list for each request, which makes the requests harder to track. Defaults to the value set during interactive setup, or FALSE.

interactive

If TRUE, run a one-time interactive setup to configure and save default settings. Default is FALSE.

timeout_sec, retries

Passed through to the underlying requests.

quiet

If TRUE, suppress progress messages. Default is FALSE.

return_type

"paths" (default) to return file paths, or "list" to return the HTML as in-memory strings.

Value

Either a named character vector of file paths or a named list of HTML strings, in the same order as ad_ids.
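
A typical call might look like the sketch below. It assumes the package providing `get_ad_html()` is already attached; the IDs are placeholders for illustration, not real Ad-Library IDs, and the argument values are arbitrary choices rather than recommendations:

```r
# Placeholder IDs for illustration only; substitute real Ad-Library IDs.
ids <- c("1234567890", "2345678901")

# Guarded so the snippet does nothing when get_ad_html() is not available.
if (exists("get_ad_html", mode = "function")) {
  paths <- get_ad_html(
    ad_ids         = ids,
    country        = "US",
    cache_dir      = "html_cache",  # documented fallback directory
    max_active     = 4,             # fewer concurrent requests than the default 8
    log_failed_ids = "log.txt",     # failed IDs are written here
    return_type    = "paths"
  )
  # Per the Value section: a named result in the same order as `ids`.
  print(paths)
}
```

Passing return_type = "list" instead would return the HTML strings themselves rather than the paths of the cached files.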