Creating Interactive Stacked Bar Charts with `create_stackedbar`
Alexandra Pafford
2025-08-03
Introduction
This guide covers the create_stackedbar function from the dashboardr
package. The function creates highly customizable, interactive stacked
bar charts from survey data, which makes it particularly useful for
communication science researchers and other social scientists working
with categorical data.
Stacked bar charts are excellent for visualizing the distribution of categorical responses across different groups or demographics. They allow you to see both the overall patterns and the composition within each category, making them ideal for displaying survey responses, demographic breakdowns, and attitude distributions.
The create_stackedbar function handles many common data
preparation tasks automatically, including:
- Converting haven_labelled columns (from SPSS imports) to R factors
- Mapping raw values to descriptive labels
- Binning continuous variables into meaningful categories
- Handling missing values explicitly or implicitly
- Creating both count-based and percentage-based visualizations
- Customizing colors, ordering, and interactive tooltips
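As a rough illustration of the first task in the list above, the following sketch converts a labelled vector to a factor with haven::as_factor(). The toy vector and its labels are made up for illustration, and this is not necessarily how create_stackedbar performs the conversion internally.

```r
library(haven)

# Toy labelled vector, similar in shape to an SPSS import (illustrative values)
sex_raw <- labelled(c(1, 2, 2, 1, NA), labels = c(Male = 1, Female = 2))

# as_factor() replaces the numeric codes with their value labels
sex_fct <- as_factor(sex_raw)

levels(sex_fct)
table(sex_fct, useNA = "ifany")
```

After the conversion, the chart legend can show "Male" and "Female" rather than the codes 1 and 2.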
This vignette demonstrates the function’s capabilities using the
General Social Survey (GSS) Panel 2020 dataset, focusing on the 2016
wave (_1a variables).
Data Preparation
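Let’s prepare our working dataset using the 2016 wave (_1a) variables.
# Packages used throughout this vignette
# (gss_panel20 is assumed here to be provided by the gssr package)
library(dplyr)
library(gssr)
library(dashboardr)
data(gss_panel20)
# Create a working dataset with key _1a variables from the 2016 wave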
gss_clean <- gss_panel20 %>%
select(
# Demographics
age_1a, sex_1a, race_1a, degree_1a, region_1a,
# Attitudes and behaviors
happy_1a, trust_1a, fair_1a, helpful_1a,
polviews_1a, partyid_1a, attend_1a,
# Economic
income_1a, class_1a
) %>%
# Remove completely empty rows
filter(if_any(everything(), ~ !is.na(.)))
# Check the data structure
glimpse(gss_clean)
#> Rows: 2,867
#> Columns: 14
#> $ age_1a <dbl+lbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60…
#> $ sex_1a <dbl+lbl> 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, …
#> $ race_1a <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 2, 2, 1, 1, 1, 3, …
#> $ degree_1a <dbl+lbl> 3, 1, 3, 1, 4, 2, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, …
#> $ region_1a <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, …
#> $ happy_1a <dbl+lbl> 2, 2, 1, 2, 1, 1, 2, 1…
#> $ trust_1a <dbl+lbl> NA(i), 3, 1, NA(i), 1, 1, NA(i), 2…
#> $ fair_1a <dbl+lbl> NA(i), 1, 1, NA(i), 2, 2, NA(i), 1…
#> $ helpful_1a <dbl+lbl> NA(i), 2, 2, NA(i), 3, 1, NA(i), 1…
#> $ polviews_1a <dbl+lbl> 4, 2, 6, 4, 3, 3, 3, 5…
#> $ partyid_1a <dbl+lbl> 3, 2, 5, 5, 1, 1, 5, 2…
#> $ attend_1a <dbl+lbl> 0, 0, 7, 6, 0, 0, 1, 5, 6, 0, 5, 3, 5, 6, 1, 8, 8, 8, …
#> $ income_1a <dbl+lbl> NA(n), 12, 12, NA(n), NA(n), 12, NA(n), 12…
#> $ class_1a <dbl+lbl> 3, NA(d), 3, 3, 3, 3, 3, 2…
# Examine some key variables
table(gss_clean$degree_1a, useNA = "always")
#>
#> 0 1 2 3 4 <NA>
#> 328 1461 216 536 318 8
table(gss_clean$happy_1a, useNA = "always")
#>
#> 1 2 3 <NA>
#> 806 1601 452 8
Basic Stacked Bar Charts
Example 1: Education by Gender (Count-based)
Let’s start with a basic stacked bar chart showing educational attainment by gender.
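For orientation, the numbers behind a count-based stacked bar chart amount to a two-way cross-tabulation. The sketch below uses base R and a small made-up data frame (the `toy` data and its column names are hypothetical, not part of the GSS data) to show which cell becomes which bar segment.

```r
# Toy survey data (made up for illustration)
toy <- data.frame(
  degree = c("High School", "High School", "Bachelor",
             "Bachelor", "Bachelor", "Graduate"),
  sex    = c("Male", "Female", "Female", "Female", "Male", "Female")
)

# One row per bar (degree), one column per stack segment (sex)
counts <- table(toy$degree, toy$sex)
counts

# The total height of each bar is the row total
rowSums(counts)
```

A table like this can serve as a quick cross-check that the chart shows the counts you expect.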
## TODO: there is an odd 5 chart
# Create basic stacked bar chart
plot1 <- create_stackedbar(
data = gss_clean,
x_var = "degree_1a",
stack_var = "sex_1a",
title = "Educational Attainment by Gender",
subtitle = "GSS Panel 2016 - Raw counts",
x_label = "Highest Degree Completed",
y_label = "Number of Respondents",
stack_label = "Gender",
stacked_type = "counts"
)
plot1
Example 2: Happiness Distribution (Percentage-based)
Now let’s create a percentage-based stacked bar chart to show happiness distribution across education levels.
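To make "percentage within each category" concrete, here is a minimal base R sketch on made-up data (the `toy` data frame is hypothetical). It normalises a cross-tabulation within each row, so every bar sums to 100. This only illustrates the arithmetic and is not taken from the function's source.

```r
# Toy survey data (made up for illustration)
toy <- data.frame(
  degree = c("High School", "High School", "High School", "High School",
             "Bachelor", "Bachelor"),
  happy  = c("Very Happy", "Pretty Happy", "Pretty Happy", "Not Too Happy",
             "Very Happy", "Pretty Happy")
)

counts <- table(toy$degree, toy$happy)

# margin = 1 normalises within each row, i.e. within each bar
pct <- prop.table(counts, margin = 1) * 100
round(pct, 1)

# Every bar adds up to 100
rowSums(pct)
```

Because each bar is rescaled independently, percentage charts hide differences in group size, which is one reason to report sample sizes alongside them.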
## TODO: there is also an odd 5 chart
# Define education order for logical display
education_order <- c("Lt High School", "High School", "Junior College", "Bachelor", "Graduate")
# Create percentage stacked bar chart
plot2 <- create_stackedbar(
data = gss_clean,
x_var = "degree_1a",
stack_var = "happy_1a",
title = "Happiness Distribution Across Education Levels",
subtitle = "Percentage breakdown within each education category",
x_label = "Education Level",
y_label = "Percentage of Respondents",
stack_label = "Happiness Level",
stacked_type = "percent",
x_order = education_order,
stack_order = c("Very Happy", "Pretty Happy", "Not Too Happy"),
tooltip_suffix = "%",
color_palette = c("#2E86AB", "#A23B72", "#F18F01")
)
plot2
Advanced Features
Example 3: Age Binning with Political Views
Let’s demonstrate binning continuous variables by creating age groups and examining political views.
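The vignette does not show how the binning is implemented. As a stand-in, base R's cut() illustrates the idea, assuming left-closed intervals (right = FALSE) so that an age of 30 falls into "30-44" rather than "18-29". The `ages` vector is made up for illustration.

```r
# Illustrative ages, including values that sit exactly on a break
ages <- c(18, 29, 30, 44, 45, 60, 75, 89)

age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")

# right = FALSE gives intervals [18, 30), [30, 45), ... so the labels line up
age_group <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)

data.frame(ages, age_group)
```

Note that n + 1 breaks produce n bins, which is why there are six breaks and five labels.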
# First, let's clean and prepare the age variable
gss_clean_age <- gss_clean %>%
# Ensure age is numeric and remove missing values for this analysis
filter(!is.na(age_1a), !is.na(polviews_1a)) %>%
mutate(
# Convert age to numeric if it isn't already
age_numeric = as.numeric(age_1a)
)
# Check the cleaned data
cat("Cleaned age summary:\n")
#> Cleaned age summary:
summary(gss_clean_age$age_numeric)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 18.00 34.00 50.00 49.26 62.00 89.00
# Define age breaks and labels (adjusted if needed based on actual data range)
age_range <- range(gss_clean_age$age_numeric, na.rm = TRUE)
cat("Age range in data:", age_range[1], "to", age_range[2], "\n")
#> Age range in data: 18 to 89
# Adjust breaks to match actual data range
age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")
# Map political views to shorter labels
polviews_map <- list(
"Extremely Liberal" = "Ext Liberal",
"Liberal" = "Liberal",
"Slightly Liberal" = "Sl Liberal",
"Moderate" = "Moderate",
"Slightly Conservative" = "Sl Conservative",
"Conservative" = "Conservative",
"Extremely Conservative" = "Ext Conservative"
)
# Create chart with age binning and value mapping using the numeric age
plot3 <- create_stackedbar(
data = gss_clean_age,
x_var = "age_numeric", # Use the numeric version
stack_var = "polviews_1a",
title = "Political Views by Age Group",
subtitle = "Distribution of political ideology across age cohorts",
x_label = "Age Group",
stack_label = "Political Views",
x_breaks = age_breaks,
x_bin_labels = age_labels,
stack_map_values = polviews_map,
stacked_type = "percent",
tooltip_suffix = "%",
x_tooltip_suffix = " years",
color_palette = c("#d7191c", "#fdae61", "#fee08b", "#e6f598", "#abdda4", "#66c2a5", "#2b83ba")
)
plot3
Example 4: Including Missing Values
Let’s create a chart that explicitly shows missing data patterns.
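Treating missing values as a category of their own can be sketched in base R as follows. The `attend` vector is made up, and the "(Missing)" label simply mirrors the default label mentioned in this example. The function's own handling may differ in detail.

```r
# Toy responses with some missing values (illustrative)
attend <- factor(c("Weekly", "Never", NA, "Weekly", NA, "Never", "Never"))

# By default, table() silently drops the NAs
table(attend)

# addNA() turns NA into a real factor level, which can then be relabelled
attend_na <- addNA(attend)
levels(attend_na)[is.na(levels(attend_na))] <- "(Missing)"

table(attend_na)
```

With the explicit level in place, non-responses get their own segment and count toward each bar's total.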
# Create chart including NA values (using default "(Missing)" labels)
plot4 <- create_stackedbar(
data = gss_clean,
x_var = "race_1a",
stack_var = "attend_1a",
title = "Religious Attendance by Race/Ethnicity",
subtitle = "Including non-responses as explicit categories",
x_label = "Race/Ethnicity",
stack_label = "Religious Attendance Frequency",
include_na = TRUE,
stacked_type = "percent",
tooltip_suffix = "%",
color_palette = c("#8e0152", "#c51b7d", "#de77ae", "#f1b6da", "#fde0ef",
"#e6f5d0", "#b8e186", "#7fbc41", "#4d9221", "#276419")
)
plot4
Example 5: Custom Value Mapping
Let’s demonstrate comprehensive value mapping for cleaner labels.
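Conceptually, value mapping is a lookup from old labels to new ones, followed by an explicit ordering. The base R sketch below shows that idea with a named character vector. The `class_raw` values are made up, and this is not the function's internal code.

```r
# Toy responses (illustrative)
class_raw <- c("Working Class", "Middle Class", "Lower Class",
               "Middle Class", "Upper Class")

# Named vector used as a lookup table: names are old labels, values are new ones
class_map <- c(
  "Lower Class"   = "Lower",
  "Working Class" = "Working",
  "Middle Class"  = "Middle",
  "Upper Class"   = "Upper"
)

# Look up each response, then fix the display order with factor levels
class_short <- unname(class_map[class_raw])
class_fct <- factor(class_short, levels = c("Lower", "Working", "Middle", "Upper"))

table(class_fct)
```

The ordering vector has to use the new labels, since the mapping is applied first.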
## TODO: there is also an odd 4 chart
# Create mappings for cleaner display
sex_map <- list("Male" = "Men", "Female" = "Women")
class_map <- list(
"Lower Class" = "Lower",
"Working Class" = "Working",
"Middle Class" = "Middle",
"Upper Class" = "Upper"
)
# Create chart with custom mappings
plot5 <- create_stackedbar(
data = gss_clean,
x_var = "class_1a",
stack_var = "sex_1a",
title = "Gender Distribution Across Social Classes",
subtitle = "With custom labels and ordering",
x_label = "Self-Reported Social Class",
stack_label = "Gender",
x_map_values = class_map,
stack_map_values = sex_map,
x_order = c("Lower", "Working", "Middle", "Upper"),
stack_order = c("Women", "Men"),
stacked_type = "counts",
tooltip_prefix = "Count: ",
color_palette = c("#E07A5F", "#3D5A80")
)
plot5
Complex Analysis Examples
Example 6: Regional Patterns in Trust
Let’s examine how trust levels vary across US regions.
## TODO: there is also an odd Series 4 chart
# Create regional trust analysis
plot6 <- create_stackedbar(
data = gss_clean,
x_var = "region_1a",
stack_var = "trust_1a",
title = "Trust Levels by US Region",
subtitle = "Regional variation in interpersonal trust",
x_label = "US Region",
stack_label = "Trust Level",
stack_order = c("Can Trust", "Can't Be Too Careful", "Depends"),
stacked_type = "percent",
tooltip_suffix = "%",
color_palette = c("#2E8B57", "#DAA520", "#CD5C5C")
)
plot6
Example 7: Multi-level Analysis with Income Binning
Let’s create income groups and examine their relationship with happiness.
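When many codes collapse into a few groups, a value that the mapping misses may surface as a stray category. The base R sketch below checks a mapping against the data before plotting. The codes are made up and the mapping is deliberately incomplete. None of this is part of create_stackedbar.

```r
# Toy income codes stored as character (illustrative)
income_codes <- c("1", "9", "12", "12", "5", "11", "12")

# Deliberately incomplete mapping: there is no entry for "12"
income_simple <- c("1" = "Low", "5" = "Low", "9" = "Low-Mid", "11" = "Low-Mid")

# Values present in the data but absent from the mapping
unmapped <- setdiff(unique(income_codes), names(income_simple))
unmapped

# Unmapped values become NA in a plain lookup
grouped <- unname(income_simple[income_codes])
table(grouped, useNA = "ifany")
```

Running a check like this first makes it easier to tell a mapping gap apart from a plotting problem.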
## TODO: there is also an odd 12 chart, + the order is off?
# First, let's examine the income variable
table(gss_clean$income_1a, useNA = "always")
#>
#> 1 2 3 4 5 6 7 8 9 10 11 12 <NA>
#> 39 40 21 15 19 15 22 47 160 129 198 1728 434
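# Create income groups based on the GSS income categories
# Note: GSS income is coded as categories, not continuous. The tabulation
# above shows only codes 1-12 in this data, so the entries for 13-25 below
# match no rows; check the labels against the GSS codebook before relying on
# them. income_map is shown for reference and is not used in the chart.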
income_map <- list(
"1" = "Under $1,000",
"2" = "$1,000-2,999",
"3" = "$3,000-3,999",
"4" = "$4,000-4,999",
"5" = "$5,000-5,999",
"6" = "$6,000-6,999",
"7" = "$7,000-7,999",
"8" = "$8,000-9,999",
"9" = "$10,000-12,499",
"10" = "$12,500-14,999",
"11" = "$15,000-17,499",
"12" = "$17,500-19,999",
"13" = "$20,000-22,499",
"14" = "$22,500-24,999",
"15" = "$25,000-29,999",
"16" = "$30,000-34,999",
"17" = "$35,000-39,999",
"18" = "$40,000-49,999",
"19" = "$50,000-59,999",
"20" = "$60,000-74,999",
"21" = "$75,000-89,999",
"22" = "$90,000-109,999",
"23" = "$110,000-129,999",
"24" = "$130,000-149,999",
"25" = "$150,000+"
)
# Create simplified income groups
income_simple_map <- list(
"1" = "Low", "2" = "Low", "3" = "Low", "4" = "Low", "5" = "Low",
"6" = "Low", "7" = "Low", "8" = "Low", "9" = "Low-Mid",
"10" = "Low-Mid", "11" = "Low-Mid", "12" = "Low-Mid", "13" = "Low-Mid",
"14" = "Low-Mid", "15" = "Middle", "16" = "Middle", "17" = "Middle",
"18" = "Middle", "19" = "Mid-High", "20" = "Mid-High", "21" = "High",
"22" = "High", "23" = "High", "24" = "High", "25" = "High"
)
# Create income-happiness analysis
plot7 <- create_stackedbar(
data = gss_clean,
x_var = "income_1a",
stack_var = "happy_1a",
title = "Happiness Distribution by Income Level",
subtitle = "Simplified income categories",
x_label = "Income Level",
stack_label = "Happiness",
x_map_values = income_simple_map,
x_order = c("Low", "Low-Mid", "Middle", "Mid-High", "High"),
stack_order = c("Very Happy", "Pretty Happy", "Not Too Happy"),
stacked_type = "percent",
tooltip_suffix = "%",
color_palette = c("#1f77b4", "#ff7f0e", "#d62728"),
include_na = FALSE
)
plot7
Summary and Best Practices
Key Features Demonstrated
- Basic stacked bars with both count and percentage displays
- Age binning for continuous variables
- Value mapping for cleaner, more descriptive labels
- Custom ordering for logical presentation of categories
- Missing value handling with explicit NA categories
- Pre-aggregated data support for existing summary tables (a capability of the function; not shown in the examples above)
- Custom color palettes for different data types and branding
- Comprehensive tooltips with prefixes, suffixes, and formatting
- Flexible styling for different analytical needs
Best Practices for Stacked Bar Charts
# 1. Choose appropriate stacking type
# - Use "counts" (as in the examples above) for comparing absolute counts across groups
# - Use "percent" for comparing proportions within groups
# 2. Order categories logically
# - Use natural ordering for ordinal variables (e.g., Likert scales)
# - Consider frequency-based ordering for nominal categories
# - Place "Other" or "Missing" categories at the end
# 3. Handle missing data thoughtfully
# - Decide whether to include or exclude missing categories
# - Use include_na = TRUE when missing patterns are meaningful
# - Provide clear labels for missing categories
# 4. Use appropriate colors
# - Use diverging palettes for scales with meaningful center points
# - Use qualitative palettes for nominal categories
# - Ensure sufficient contrast between adjacent categories
# - Consider colorblind accessibility
# 5. Customize tooltips for clarity
# - Include units and context in tooltips
# - Use prefixes/suffixes to clarify meaning
# - Format numbers appropriately for your audience
# 6. Consider your audience
# - Use descriptive labels rather than codes
# - Provide clear titles and subtitles
# - Include sample sizes in subtitles when relevant
Common Use Cases
The create_stackedbar function is particularly useful
for:
- Survey response analysis: Displaying Likert scale responses across demographics
- Demographic breakdowns: Showing composition of groups by various characteristics
- Attitude research: Comparing opinions across different populations
- Market research: Analyzing customer segments and preferences
- Educational research: Examining outcomes across different groups
- Health surveys: Displaying health behaviors or outcomes by demographics
Conclusion
The create_stackedbar() function provides a
comprehensive solution for creating publication-ready stacked bar charts
from survey data. Its extensive customization options, automatic data
handling capabilities, and interactive features make it an invaluable
tool for social science researchers.
Key advantages include:
- Automatic data preparation for common survey data formats
- Flexible binning and mapping for continuous and coded variables
- Comprehensive missing data handling options
- Interactive tooltips for enhanced data exploration
- Publication-ready styling with extensive customization options
- Support for both raw and pre-aggregated data