Creating Density Plots with viz_density()
Introduction
The viz_density() function creates kernel density
estimate plots, which provide a smoothed visualization of the
distribution of a continuous variable. Unlike histograms that use
discrete bins, density plots show a continuous estimate of the
probability density function.
Density plots are particularly useful for:
- Visualizing distribution shapes without binning artifacts
- Comparing multiple distributions on the same plot
- Identifying multimodal distributions
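To make the contrast with histograms concrete, the sketch below computes both a binned histogram and a kernel density estimate with base R only. It does not call viz_density(); the simulated ages vector is illustrative data, not the gss dataset, and it is an assumption that viz_density() relies on a comparable estimator.

```r
# Simulated, bimodal "ages" stand in for a real variable (illustrative only).
set.seed(42)
ages <- c(rnorm(300, mean = 30, sd = 6), rnorm(200, mean = 60, sd = 8))

# Histogram: the shape depends on the breaks that are chosen.
h <- hist(ages, breaks = 10, plot = FALSE)

# Kernel density estimate: a smooth curve evaluated on a grid (512 points by default).
d <- density(ages)

# Both are density scales: bar areas sum to 1, and the area under the
# smooth curve (trapezoid rule) is approximately 1.
hist_area <- sum(h$density * diff(h$breaks))
kde_area  <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
```

Plotting d with plot(d) gives a curve comparable in spirit to the figures in this vignette, without any dependence on where the bin edges fall.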
Basic Density Plot
Create a simple density plot showing the distribution of age:
plot <- viz_density(
  data = gss,
  x_var = "age",
  title = "Age Distribution",
  x_label = "Age (years)"
)
plot
Grouped Densities
Compare distributions across groups by adding a
group_var:
plot <- viz_density(
  data = gss,
  x_var = "age",
  group_var = "sex",
  title = "Age Distribution by Sex",
  x_label = "Age (years)",
  color_palette = c("#3498DB", "#E74C3C")
)
plot
Customizing Appearance
Adjusting Fill Opacity
Control the transparency of the filled area with
fill_opacity (0 = transparent, 1 = opaque):
plot <- viz_density(
  data = gss,
  x_var = "age",
  title = "Age Distribution (Higher Opacity)",
  fill_opacity = 0.6
)
plot
Adding Rug Marks
Rug marks show individual data points along the x-axis:
# Use a random sample of at most 200 rows
gss_sample <- gss %>% sample_n(min(200, n()))
plot <- viz_density(
  data = gss_sample,
  x_var = "age",
  title = "Age Distribution with Rug Marks",
  show_rug = TRUE
)
plot
Controlling Bandwidth
The bandwidth parameter controls how smooth the density
estimate is. Lower values preserve more detail; higher values produce a smoother curve.
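The effect of bandwidth can be checked numerically with base R. The sketch below uses stats::density() on simulated data and assumes that the bandwidth argument of viz_density() behaves like the bw argument there (a kernel standard deviation in the units of the variable); the source does not confirm that mapping.

```r
# Simulated, bimodal "ages" (illustrative only, not the gss dataset).
set.seed(1)
ages <- c(rnorm(300, mean = 30, sd = 6), rnorm(200, mean = 60, sd = 8))

# Same data, two bandwidths (mirroring the values used in this section).
d_low  <- density(ages, bw = 2)
d_high <- density(ages, bw = 10)

# Total variation: how much the curve moves up and down across the grid.
# A smoother curve rises and falls less.
total_variation <- function(d) sum(abs(diff(d$y)))

tv_low  <- total_variation(d_low)
tv_high <- total_variation(d_high)

# A larger bandwidth also flattens the peaks.
peak_low  <- max(d_low$y)
peak_high <- max(d_high$y)
```

With these inputs the low-bandwidth curve has the taller peaks and the larger total variation, which is the numerical counterpart of "more detail" versus "smoother".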
Low Bandwidth (More Detail)
plot <- viz_density(
  data = gss,
  x_var = "age",
  bandwidth = 2,
  title = "Age Distribution (Bandwidth = 2)"
)
plot
High Bandwidth (Smoother)
plot <- viz_density(
  data = gss,
  x_var = "age",
  bandwidth = 10,
  title = "Age Distribution (Bandwidth = 10)"
)
plot
Multiple Group Comparison
Compare age distributions across race categories:
# Drop rows with missing race and convert the labelled values to character
gss_race <- gss %>%
  filter(!is.na(race)) %>%
  mutate(race = as.character(haven::as_factor(race)))
plot <- viz_density(
  data = gss_race,
  x_var = "age",
  group_var = "race",
  title = "Age Distribution by Race",
  x_label = "Age (years)",
  fill_opacity = 0.3
)
plot
Handling Missing Groups
Use include_na = TRUE to show NA values as an explicit
category:
# Create some NAs for demonstration
gss_with_na <- gss %>%
  mutate(
    sex_with_na = if_else(
      row_number() %% 10 == 0,
      NA_character_,
      as.character(haven::as_factor(sex))
    )
  )
plot <- viz_density(
  data = gss_with_na,
  x_var = "age",
  group_var = "sex_with_na",
  title = "Age Distribution by Sex (Including Missing)",
  include_na = TRUE,
  na_label = "Not Reported"
)
plot
Custom Group Ordering
Control the order of groups in the legend:
gss_degree <- gss %>%
  filter(!is.na(degree)) %>%
  mutate(degree = as.character(haven::as_factor(degree)))
plot <- viz_density(
  data = gss_degree,
  x_var = "age",
  group_var = "degree",
  title = "Age Distribution by Education",
  group_order = c("graduate", "bachelor", "junior college", "high school", "lt high school")
)
plot
Summary
The viz_density() function provides a flexible way to
visualize continuous distributions with these key features:
- Basic density: Just specify data and x_var
- Grouped comparison: Add group_var to compare distributions
- Appearance control: Use fill_opacity, color_palette, and show_rug
- Smoothness: Adjust bandwidth to control detail level
- Missing values: Handle with include_na and na_label
- Ordering: Control group order with group_order
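One way to picture the missing-value and ordering options is as operations on the grouping variable before plotting. The base R sketch below is a hypothetical helper, prepare_groups(), written for illustration only; it is not part of the package and is not claimed to match how viz_density() implements include_na, na_label, or group_order.

```r
# Hypothetical helper (not a package function): relabel missing groups and
# fix the order of the group levels.
prepare_groups <- function(x, na_label = "Not Reported", group_order = NULL) {
  x <- as.character(x)
  x[is.na(x)] <- na_label                 # make NA an explicit category
  if (is.null(group_order)) group_order <- unique(x)
  # Groups not named in group_order are appended so no value is lost.
  extra <- setdiff(unique(x), group_order)
  factor(x, levels = c(group_order, extra))
}

sex <- c("male", NA, "female", "female", NA)
groups <- prepare_groups(sex, group_order = c("female", "male"))

levels(groups)  # "female" "male" "Not Reported"
table(groups)
```

Appending unlisted groups at the end is a design choice of this sketch; a stricter version could raise an error instead when group_order does not cover every group.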