Fetch many Facebook Ad Library pages (vectorised, cached, parallel)
Source: R/get_ad_html.R
Usage
get_ad_html(
  ad_ids,
  country,
  cache_dir = NULL,
  overwrite = FALSE,
  strip_css = TRUE,
  max_active = 8,
  log_failed_ids = NULL,
  ua = NULL,
  randomize_ua = NULL,
  interactive = FALSE,
  timeout_sec = 15,
  retries = 3L,
  quiet = FALSE,
  return_type = c("paths", "list")
)
Arguments
- ad_ids
Character vector of Ad-Library IDs.
- country
Two-letter country code.
- cache_dir
Directory where the .html.gz files are stored. Defaults to the value saved during interactive setup, or "html_cache" if none was saved.
- overwrite
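If FALSE (default), IDs that already have a cached file are not downloaded again. If TRUE, they are re-downloaded and the cached files are overwritten.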
- strip_css
If TRUE (default), apply the same fast, regex-based CSS removal as the single-ID helper. Only newly downloaded pages are stripped; files already in the cache are left unchanged.
- max_active
Maximum number of concurrent sockets, passed to httr2::req_perform_parallel() (default 8).
- log_failed_ids
If a file path is given (e.g. "log.txt"), the IDs that failed to download are written to that file. The default, NULL, writes no log file.
- ua
User-Agent string. If NULL (default), a standard or a randomized User-Agent is used, depending on randomize_ua.
- randomize_ua
Logical. If TRUE, a User-Agent is drawn at random from a predefined list for each request, making the requests harder to track. Defaults to the value saved during interactive setup, or FALSE.
- interactive
If TRUE, run a one-time interactive setup to configure and save default settings. Default is FALSE.
- timeout_sec, retries
Request timeout in seconds (default 15) and number of retries (default 3L), passed through to the underlying requests.
- quiet
Suppress progress messages.
- return_type
Either "paths" (default), to return the paths of the cached files, or "list", to return the pages as in-memory strings.
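The interplay of `cache_dir` and `overwrite` can be pictured as a filter that decides which IDs still need a network request. The sketch below assumes each page is cached as `<cache_dir>/<ad_id>.html.gz`; that file-naming scheme and the helper `ids_to_fetch()` are hypothetical and not taken from the package.

```r
# Hypothetical helper: which IDs still need downloading?
# ASSUMPTION: cached pages are named "<cache_dir>/<ad_id>.html.gz".
ids_to_fetch <- function(ad_ids, cache_dir, overwrite = FALSE) {
  paths <- file.path(cache_dir, paste0(ad_ids, ".html.gz"))
  if (overwrite) {
    return(ad_ids)               # re-download everything
  }
  ad_ids[!file.exists(paths)]    # keep cached files, fetch only the missing ones
}

# Usage: simulate a cache that already holds one page
cache <- file.path(tempdir(), "html_cache_demo")
dir.create(cache, showWarnings = FALSE)
con <- gzfile(file.path(cache, "111.html.gz"), "w")
writeLines("<html></html>", con)
close(con)

ids_to_fetch(c("111", "222"), cache)                    # returns "222"
ids_to_fetch(c("111", "222"), cache, overwrite = TRUE)  # returns "111" "222"
```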
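`strip_css` is described as a fast, regex-based CSS removal. The helper `strip_css_sketch()` below is a hypothetical stand-in for that step, not the package's actual pattern: it deletes `<style>...</style>` blocks and inline `style="..."` attributes with base R `gsub()`. Regular expressions are not an HTML parser, so this approach trades robustness for speed.

```r
# Hypothetical stand-in for regex-based CSS stripping (not the package's own pattern).
strip_css_sketch <- function(html) {
  # Remove whole <style ...>...</style> blocks; (?s) lets "." match newlines,
  # and the lazy ".*?" stops at the first closing tag.
  html <- gsub("(?is)<style\\b[^>]*>.*?</style>", "", html, perl = TRUE)
  # Remove inline style="..." or style='...' attributes (but not e.g. data-style).
  html <- gsub("(?i)\\s+style\\s*=\\s*(\"[^\"]*\"|'[^']*')", "", html, perl = TRUE)
  html
}

page <- '<html><head><style>\n.a { color: red; }\n</style></head><body><div style="margin:0">Ad</div></body></html>'
strip_css_sketch(page)
# returns "<html><head></head><body><div>Ad</div></body></html>"
```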
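The `ua` and `randomize_ua` arguments describe a simple precedence: an explicit `ua` is used as given; otherwise a standard or a randomly drawn User-Agent is used. The package's predefined list is not documented here, so the pool, the choice of default, and the helper `pick_ua()` below are all hypothetical; the sketch only shows one independent draw per request with `sample()`.

```r
# Hypothetical UA pool; the package's own list is not documented here.
ua_pool <- c(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
  "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"
)
default_ua <- ua_pool[1]  # ASSUMPTION: stands in for the "standard" UA

# Return one User-Agent string per request.
pick_ua <- function(n_requests, ua = NULL, randomize_ua = FALSE) {
  if (!is.null(ua)) return(rep(ua, n_requests))                         # explicit ua wins
  if (isTRUE(randomize_ua)) return(sample(ua_pool, n_requests, replace = TRUE))
  rep(default_ua, n_requests)                                           # fixed standard UA
}

set.seed(1)
pick_ua(3, randomize_ua = TRUE)    # three independent draws from ua_pool
pick_ua(2, ua = "my-scraper/1.0")  # returns "my-scraper/1.0" "my-scraper/1.0"
```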
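`timeout_sec` and `retries` are passed through to the underlying requests, so the real retry policy is whatever those requests implement. As a rough mental model only, the hypothetical `with_retries()` below re-runs a failing call with exponential backoff; it assumes `retries` counts extra attempts after the first one, which may differ from how the package counts them.

```r
# Hypothetical retry wrapper: one initial attempt plus up to `retries` retries.
# ASSUMPTION: "retries" means extra attempts after the first failure.
with_retries <- function(f, retries = 3L, base_delay = 0.1) {
  for (attempt in seq_len(retries + 1L)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Back off before the next attempt: base_delay, 2*base_delay, 4*base_delay, ...
    if (attempt <= retries) Sys.sleep(base_delay * 2^(attempt - 1))
  }
  stop("all ", retries + 1L, " attempts failed: ", conditionMessage(result))
}

# Usage: a flaky function that succeeds on its third call
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("timeout") else "ok"
}
with_retries(flaky, retries = 3L, base_delay = 0.01)  # "ok", after 3 calls
```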
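With `return_type = "paths"` the function returns file paths rather than HTML. Assuming the cache holds plain gzip-compressed HTML (as the .html.gz extension suggests), a cached page can be read back into one string with base R's `gzfile()`. The file written below is a stand-in, not real Ad Library output, and `read_cached_html()` is a hypothetical helper.

```r
# Write a stand-in cache file (gzip-compressed HTML).
path <- file.path(tempdir(), "123.html.gz")
con <- gzfile(path, "w")
writeLines(c("<html>", "<body>Ad 123</body>", "</html>"), con)
close(con)

# Hypothetical helper: read a cached .html.gz file into a single string.
read_cached_html <- function(path) {
  con <- gzfile(path, "r")
  on.exit(close(con))
  paste(readLines(con, warn = FALSE), collapse = "\n")
}

read_cached_html(path)
# returns "<html>\n<body>Ad 123</body>\n</html>"
```

Applying `lapply(paths, read_cached_html)` to a vector of returned paths gives a list of strings, roughly what `return_type = "list"` is described as returning.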