Creating Box Plots with viz_boxplot()
Introduction
The viz_boxplot() function creates interactive box plots
(also known as box-and-whisker plots) using highcharter. Box plots
display the five-number summary of a distribution: minimum, first
quartile (Q1), median, third quartile (Q3), and maximum, along with
outliers.
Box plots are particularly useful for:
- Comparing distributions across groups
- Identifying outliers
- Visualizing the spread and skewness of data
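
To make the five-number summary and the notion of an outlier concrete, here is a minimal sketch in base R on a small invented vector of ages (not the GSS data). It assumes the conventional 1.5 * IQR rule for flagging outliers; whether viz_boxplot() applies exactly this rule is not stated in this vignette, so treat the numbers as an illustration only.

```r
# Toy ages, invented for illustration (not taken from the GSS data)
ages <- c(18, 22, 25, 31, 34, 38, 41, 47, 52, 95)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(ages)

# boxplot.stats() applies the 1.5 * IQR rule (an assumption about the rule,
# not a statement about viz_boxplot() internals):
#   $stats = whisker ends, hinges, and median
#   $out   = values lying beyond the whiskers
bs <- boxplot.stats(ages, coef = 1.5)

print(five)
print(bs$stats)
print(bs$out)
```

In this toy vector the value 95 is flagged as an outlier, so the upper whisker ends at 52 rather than at the overall maximum.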
Basic Box Plot
Create a simple box plot showing the overall distribution of age:
plot <- viz_boxplot(
  data = gss,
  y_var = "age",
  title = "Age Distribution",
  y_label = "Age (years)"
)
plot

Grouped Box Plots
Compare distributions across categories by adding an
x_var:
library(dplyr)  # provides %>%, filter(), and mutate()

# Drop rows with missing sex, then convert the labelled codes to text labels
gss_sex <- gss %>%
  filter(!is.na(sex)) %>%
  mutate(sex = as.character(haven::as_factor(sex)))

plot <- viz_boxplot(
  data = gss_sex,
  y_var = "age",
  x_var = "sex",
  title = "Age Distribution by Sex",
  x_label = "Sex",
  y_label = "Age (years)"
)
plot

Box Plot by Education Level
Examine how age varies across education levels:
gss_degree <- gss %>%
  filter(!is.na(degree)) %>%
  mutate(degree = as.character(haven::as_factor(degree)))

plot <- viz_boxplot(
  data = gss_degree,
  y_var = "age",
  x_var = "degree",
  title = "Age Distribution by Education",
  x_label = "Highest Degree",
  y_label = "Age (years)"
)
plot

Controlling Outlier Display
By default, outliers are shown as individual points. Use
show_outliers = FALSE to hide them:
plot <- viz_boxplot(
  data = gss_sex,
  y_var = "age",
  x_var = "sex",
  title = "Age by Sex (No Outliers)",
  show_outliers = FALSE
)
plot

Horizontal Box Plots
Flip the orientation for better readability with many categories:
plot <- viz_boxplot(
  data = gss_degree,
  y_var = "age",
  x_var = "degree",
  title = "Age by Education (Horizontal)",
  horizontal = TRUE
)
plot

Custom Category Labels
Use x_map_values to rename category labels:
# Keep the raw numeric codes so that x_map_values can relabel them
gss_sex_raw <- gss %>%
  filter(!is.na(sex))

plot <- viz_boxplot(
  data = gss_sex_raw,
  y_var = "age",
  x_var = "sex",
  title = "Age by Sex",
  x_map_values = list("1" = "Male", "2" = "Female")
)
plot

Custom Category Order
Control the order of categories with x_order:
education_order <- c(
  "graduate",
  "bachelor",
  "junior college",
  "high school",
  "lt high school"
)

plot <- viz_boxplot(
  data = gss_degree,
  y_var = "age",
  x_var = "degree",
  title = "Age by Education (Ordered)",
  x_order = education_order
)
plot

Custom Color Palette
Apply custom colors to the boxes:
plot <- viz_boxplot(
  data = gss_sex,
  y_var = "age",
  x_var = "sex",
  title = "Age by Sex",
  color_palette = c("#3498DB", "#E74C3C")
)
plot

Handling Missing Values
Include NA as an explicit category:
# Set every 10th row to NA to simulate missing responses
gss_with_na <- gss %>%
  mutate(
    sex_with_na = if_else(
      row_number() %% 10 == 0,
      NA_character_,
      as.character(haven::as_factor(sex))
    )
  )

plot <- viz_boxplot(
  data = gss_with_na,
  y_var = "age",
  x_var = "sex_with_na",
  title = "Age by Sex (Including Missing)",
  include_na = TRUE,
  na_label = "Not Reported"
)
plot

Comparing Multiple Groups
Box plots excel at comparing distributions across many groups:
gss_race <- gss %>%
  filter(!is.na(race)) %>%
  mutate(race = as.character(haven::as_factor(race)))

plot <- viz_boxplot(
  data = gss_race,
  y_var = "age",
  x_var = "race",
  title = "Age Distribution by Race",
  x_label = "Race",
  y_label = "Age (years)"
)
plot

Summary
The viz_boxplot() function provides a powerful way to
visualize distributions with these key features:
- Basic boxplot: Just specify data and y_var
- Grouped comparison: Add x_var to compare across categories
- Outliers: Control display with show_outliers
- Orientation: Use horizontal = TRUE for horizontal boxes
- Labels: Customize with x_map_values and x_order
- Missing values: Handle with include_na and na_label
- Styling: Apply custom colors with color_palette
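
The options listed above are each shown in isolation in this vignette. As a rough sketch of how they might be combined, the wrapper below passes several of them in one call. The helper name age_by_degree_plot is hypothetical and not part of the package, and the sketch assumes (without confirmation from the sections above) that x_order, horizontal, and show_outliers can be used together. It also assumes the package providing viz_boxplot() is attached and that the data has age and degree columns prepared like gss_degree.

```r
# Hypothetical helper, defined here for illustration only.
# Assumption: viz_boxplot() accepts these arguments together in a single call.
age_by_degree_plot <- function(data) {
  viz_boxplot(
    data = data,
    y_var = "age",
    x_var = "degree",
    title = "Age by Education",
    x_label = "Highest Degree",
    y_label = "Age (years)",
    # Category labels copied from the ordering example earlier in this vignette
    x_order = c(
      "graduate", "bachelor", "junior college",
      "high school", "lt high school"
    ),
    horizontal = TRUE,      # horizontal boxes for easier label reading
    show_outliers = FALSE   # hide individual outlier points
  )
}
```

Calling age_by_degree_plot(gss_degree) would then return the chart object, which can be printed in the same way as the plot objects in the earlier examples.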