Creating Interactive Stacked Bar Charts with `viz_stackedbar`
Alexandra Pafford
2025-08-03
stackedbar_vignette.Rmd
📖 Introduction
Welcome to this comprehensive guide on using the
viz_stackedbar function from the dashboardr
package. This unified function creates highly customizable interactive
stacked bar charts from survey data, supporting two powerful modes:
Mode 1: Grouped/Crosstab Mode (use x_var + stack_var)
- Shows how one variable breaks down by another (e.g., education by gender)
- Use when your data is in long/tidy format
- Example:
viz_stackedbar(data, x_var = "education", stack_var = "gender")
Mode 2: Multi-Variable/Battery Mode (use x_vars)
- Compares response distributions across multiple survey questions
- Use when you have multiple columns with the same response scale (e.g., Likert items)
- Example:
viz_stackedbar(data, x_vars = c("q1", "q2", "q3"))
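As a rough illustration of what the two modes tabulate, the base R sketch below builds both kinds of summary from a small made-up data frame. The `toy` data and its column names are hypothetical, and this is not how viz_stackedbar works internally; it only shows the shape of the counts each mode would draw.

```r
# Toy survey data (hypothetical values, not taken from the GSS)
toy <- data.frame(
  education = c("HS", "HS", "BA", "BA", "BA", "HS"),
  gender    = c("F", "M", "F", "F", "M", "F"),
  q1        = c("agree", "disagree", "agree", "agree", "disagree", "agree"),
  q2        = c("disagree", "disagree", "agree", "disagree", "agree", "agree"),
  stringsAsFactors = FALSE
)

# Mode 1 (x_var + stack_var): one bar per education level, split by gender
crosstab_counts <- table(toy$education, toy$gender)

# Mode 2 (x_vars): one bar per question, split by response category.
# A shared set of levels keeps the rows aligned across questions.
response_levels <- c("agree", "disagree")
battery_counts <- sapply(
  toy[c("q1", "q2")],
  function(col) table(factor(col, levels = response_levels))
)

print(crosstab_counts)
print(battery_counts)
```

In the first table the rows are the bars and the columns are the stacks; in the second, each column (question) is a bar and each row (response) is a stack.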
The function handles many common data preparation tasks automatically, including:
- Converting haven_labelled columns (from SPSS imports) to R factors
- Mapping raw values to descriptive labels
- Binning continuous variables into meaningful categories
- Handling missing values explicitly or implicitly
- Creating both count-based and percentage-based visualizations
- Customizing colors, ordering, and interactive tooltips
This vignette demonstrates both modes using the General Social Survey (GSS) Panel 2020 dataset.
📋 Data Preparation
Let’s prepare our working dataset using the 2020 wave variables.
# Load the packages used in this vignette.
# Assumption: gss_panel20 is provided by the gssr package.
library(dashboardr)
library(dplyr)
library(gssr)
data(gss_panel20)
# Create a working dataset with key _1a variables from 2020
gss_clean <- gss_panel20 %>%
  select(
    # Demographics
    age_1a, sex_1a, race_1a, degree_1a, region_1a,
    # Attitudes and behaviors
    happy_1a, trust_1a, fair_1a, helpful_1a,
    polviews_1a, partyid_1a, attend_1a,
    # Economic
    income_1a, class_1a
  ) %>%
  # Remove completely empty rows
  filter(if_any(everything(), ~ !is.na(.)))
# Check the data structure
glimpse(gss_clean)
#> Rows: 2,867
#> Columns: 14
#> $ age_1a <dbl+lbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60…
#> $ sex_1a <dbl+lbl> 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, …
#> $ race_1a <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 2, 2, 1, 1, 1, 3, …
#> $ degree_1a <dbl+lbl> 3, 1, 3, 1, 4, 2, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, …
#> $ region_1a <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, …
#> $ happy_1a <dbl+lbl> 2, 2, 1, 2, 1, 1, 2, 1…
#> $ trust_1a <dbl+lbl> NA(i), 3, 1, NA(i), 1, 1, NA(i), 2…
#> $ fair_1a <dbl+lbl> NA(i), 1, 1, NA(i), 2, 2, NA(i), 1…
#> $ helpful_1a <dbl+lbl> NA(i), 2, 2, NA(i), 3, 1, NA(i), 1…
#> $ polviews_1a <dbl+lbl> 4, 2, 6, 4, 3, 3, 3, 5…
#> $ partyid_1a <dbl+lbl> 3, 2, 5, 5, 1, 1, 5, 2…
#> $ attend_1a <dbl+lbl> 0, 0, 7, 6, 0, 0, 1, 5, 6, 0, 5, 3, 5, 6, 1, 8, 8, 8, …
#> $ income_1a <dbl+lbl> NA(n), 12, 12, NA(n), NA(n), 12, NA(n), 12…
#> $ class_1a <dbl+lbl> 3, NA(d), 3, 3, 3, 3, 3, 2…
# Examine some key variables
table(gss_clean$degree_1a, useNA = "always")
#>
#> 0 1 2 3 4 <NA>
#> 328 1461 216 536 318 8
table(gss_clean$happy_1a, useNA = "always")
#>
#> 1 2 3 <NA>
#> 806 1601 452 8
📊 Basic Stacked Bar Charts
Example 1: Education by Gender (Count-based)
Let’s start with a basic stacked bar chart showing educational attainment by gender.
# Create basic stacked bar chart
plot1 <- viz_stackedbar(
  data = gss_clean,
  x_var = "degree_1a",
  stack_var = "sex_1a",
  title = "Educational Attainment by Gender",
  subtitle = "GSS Panel 2016 - Raw counts",
  x_label = "Highest Degree Completed",
  y_label = "Number of Respondents",
  stack_label = "Gender",
  stacked_type = "counts"
)
plot1
Example 2: Happiness Distribution (Percentage-based)
Now let’s create a percentage-based stacked bar chart to show happiness distribution across education levels.
# Define education order for logical display
education_order <- c("less than high school", "high school", "associate/junior college", "bachelor's", "graduate")
# Create percentage stacked bar chart
plot2 <- viz_stackedbar(
  data = gss_clean,
  x_var = "degree_1a",
  stack_var = "happy_1a",
  title = "Happiness Distribution Across Education Levels",
  subtitle = "Percentage breakdown within each education category",
  x_label = "Education Level",
  y_label = "Percentage of Respondents",
  stack_label = "Happiness Level",
  stacked_type = "percent",
  x_order = education_order,
  stack_order = c("very happy", "pretty happy", "not too happy"),
  tooltip_suffix = "%",
  color_palette = c("#2E86AB", "#A23B72", "#F18F01")
)
plot2
⚡ Advanced Features
Example 3: Age Binning with Political Views
Let’s demonstrate binning continuous variables by creating age groups and examining political views.
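To make the idea of breaks and bin labels concrete, here is a minimal base R sketch using `cut()`. It assumes left-closed intervals (so 30 falls in "30-44"), which matches the label scheme used in this example; whether viz_stackedbar bins in exactly this way internally is an assumption, and the `ages` vector is made up.

```r
# Hypothetical ages, including values that sit exactly on a break
ages <- c(18, 29, 30, 44.5, 60, 75, 89)

age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")

# right = FALSE gives intervals [18, 30), [30, 45), ..., [75, Inf)
age_group <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)

# Empty bins (here "45-59") still appear because they are factor levels
print(table(age_group))
```

Note that there is always one more break than there are labels, since each label names the interval between two adjacent breaks.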
# First, let's clean and prepare the age variable
gss_clean_age <- gss_clean %>%
  # Ensure age is numeric and remove missing values for this analysis
  filter(!is.na(age_1a), !is.na(polviews_1a)) %>%
  mutate(
    # Convert age to numeric if it isn't already
    age_numeric = as.numeric(age_1a)
  )
# Check the cleaned data
cat("Cleaned age summary:\n")
#> Cleaned age summary:
summary(gss_clean_age$age_numeric)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 18.00 34.00 50.00 49.26 62.00 89.00
# Define age breaks and labels (adjusted if needed based on actual data range)
age_range <- range(gss_clean_age$age_numeric, na.rm = TRUE)
cat("Age range in data:", age_range[1], "to", age_range[2], "\n")
#> Age range in data: 18 to 89
# Adjust breaks to match actual data range
age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")
# Map political views to shorter labels
polviews_map <- list(
  "extremely liberal" = "Ext. Liberal",
  "liberal" = "Liberal",
  "slightly liberal" = "Sl. Liberal",
  "moderate, middle of the road" = "Moderate",
  "slightly conservative" = "Sl. Conservative",
  "conservative" = "Conservative",
  "extremely conservative" = "Ext. Conservative"
)
# Order is a plain character vector, like the other *_order arguments in this vignette
polviews_order <- c("Ext. Liberal", "Liberal", "Sl. Liberal",
                    "Moderate", "Sl. Conservative", "Conservative",
                    "Ext. Conservative")
# Create chart with age binning and value mapping using the numeric age
plot3 <- viz_stackedbar(
  data = gss_clean_age,
  x_var = "age_numeric", # Use the numeric version
  stack_var = "polviews_1a",
  title = "Political Views by Age Group",
  subtitle = "Distribution of political ideology across age cohorts",
  x_label = "Age Group",
  stack_label = "Political Views",
  x_breaks = age_breaks,
  x_bin_labels = age_labels,
  stack_map_values = polviews_map,
  stacked_type = "percent",
  tooltip_suffix = "%",
  x_tooltip_suffix = " years",
  color_palette = c("#d7191c", "#fdae61", "#fee08b", "#e6f598", "#abdda4", "#66c2a5", "#2b83ba"),
  stack_order = polviews_order
)
plot3
Example 4: Including Missing Values
Let’s create a chart that explicitly shows missing data patterns.
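The difference between dropping missing values and showing them as their own category can be sketched in base R as follows. The `attend` vector is hypothetical, and the "(Missing)" label simply mirrors the default label mentioned in this example; the sketch does not reproduce the function's internals.

```r
# Hypothetical attendance responses with two non-responses
attend <- c("weekly", NA, "never", "weekly", NA)

# Implicit handling: NA values are silently dropped from the tabulation
implicit_counts <- table(attend)

# Explicit handling: recode NA to a labelled level placed last
attend_explicit <- factor(
  ifelse(is.na(attend), "(Missing)", attend),
  levels = c("never", "weekly", "(Missing)")
)
explicit_counts <- table(attend_explicit)

print(implicit_counts)
print(explicit_counts)
```

With percentage stacking, the choice matters: the explicit version computes shares out of all five respondents, while the implicit version computes them out of the three who answered.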
# Create chart including NA values (using default "(Missing)" labels)
plot4 <- viz_stackedbar(
  data = gss_clean,
  x_var = "race_1a",
  stack_var = "attend_1a",
  title = "Religious Attendance by Race/Ethnicity",
  subtitle = "Including non-responses as explicit categories",
  x_label = "Race/Ethnicity",
  stack_label = "Religious Attendance Frequency",
  include_na = TRUE,
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#8e0152", "#c51b7d", "#de77ae", "#f1b6da", "#fde0ef",
                    "#e6f5d0", "#b8e186", "#7fbc41", "#4d9221", "#276419")
)
plot4
Example 5: Custom Value Mapping
Let’s demonstrate comprehensive value mapping for cleaner labels.
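Conceptually, a value map is a lookup from the labels found in the data to the labels you want displayed, followed by an explicit ordering. The base R sketch below shows that idea on a made-up vector; `raw_class` and `toy_map` are hypothetical, and the sketch is not a description of how the function applies x_map_values or stack_map_values internally.

```r
# Hypothetical raw labels as they might appear in the data
raw_class <- c("middle class", "working class", "upper class",
               "middle class", "lower class")

# Named list: names are the original labels, values are the display labels
toy_map <- list(
  "lower class"   = "Lower",
  "working class" = "Working",
  "middle class"  = "Middle",
  "upper class"   = "Upper"
)

# Look each raw label up by name, then fix the display order with factor levels
mapped <- unname(unlist(toy_map)[raw_class])
mapped <- factor(mapped, levels = c("Lower", "Working", "Middle", "Upper"))

print(table(mapped))
```

Because the lookup is by name, a key that does not match the data exactly (for example a different capitalisation) would yield NA in this sketch, which is why it helps to inspect the original labels first.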
# Create mappings for cleaner display
sex_map <- list("male" = "Men", "female" = "Women")
class_map <- list(
  "lower class" = "Lower",
  "working class" = "Working",
  "middle class" = "Middle",
  "upper class" = "Upper"
)
# Create chart with custom mappings
plot5 <- viz_stackedbar(
  data = gss_panel20,
  x_var = "class_1a",
  stack_var = "sex_1a",
  title = "Gender Distribution Across Social Classes",
  subtitle = "With custom labels and ordering",
  x_label = "Self-Reported Social Class",
  stack_label = "Gender",
  x_map_values = class_map,
  stack_map_values = sex_map,
  x_order = c("Lower", "Working", "Middle", "Upper"),
  stack_order = c("Women", "Men"),
  stacked_type = "counts",
  tooltip_prefix = "Count: ",
  color_palette = c("#E07A5F", "#3D5A80")
)
plot5
🔬 Complex Analysis Examples
Example 6: Regional Patterns in Trust
Let’s examine how trust levels vary across US regions.
# Recode labels to fix the mistake
trust_map <- list(
  "can't trust" = "Can Trust",
  "can't be too careful" = "Can't Be Too Careful",
  "depends" = "It Depends"
)
# Create regional trust analysis
plot6 <- viz_stackedbar(
  data = gss_panel20,
  x_var = "region_1a",
  stack_var = "trust_1a",
  stack_map_values = trust_map,
  title = "Do You Trust Strangers?",
  subtitle = "Regional variation in interpersonal trust",
  x_label = "US Region",
  stack_label = "Trust Level",
  stack_order = c("Can Trust", "Can't Be Too Careful", "It Depends"),
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#2E8B57", "#CD5C5C", "#DAA520")
)
plot6
📊 Multi-Variable Mode: Comparing Survey Questions
The viz_stackedbar function also supports comparing
multiple survey questions side-by-side. This is particularly useful for
visualizing survey batteries (sets of questions with the same response
scale).
Example 7: Basic Multi-Variable Comparison
When you have multiple columns representing different questions with
the same response categories, use x_vars to compare
them:
# Define the questions to compare
social_questions <- c("trust_1a", "fair_1a", "helpful_1a")
social_labels <- c(
  "Interpersonal Trust",
  "Fairness of Others",
  "Helpfulness of Others"
)
# Create multi-variable comparison chart
plot7 <- viz_stackedbar(
  data = gss_clean,
  x_vars = social_questions,
  x_var_labels = social_labels,
  title = "Social Attitudes and Trust",
  subtitle = "Distribution of responses across social attitude questions",
  x_label = "Social Attitude Dimension",
  stack_label = "Response Level",
  stacked_type = "percent",
  tooltip_suffix = "%"
)
plot7
Example 8: Multi-Variable with Response Mapping
You can standardize response labels across questions and customize the display. It’s helpful to first check what the actual response values are:
# First, examine what the actual response values are
cat("Unique trust responses:\n")
#> Unique trust responses:
print(unique(as.character(gss_clean$trust_1a)))
#> [1] NA "3" "1" "2"
cat("\nUnique fair responses:\n")
#>
#> Unique fair responses:
print(unique(as.character(gss_clean$fair_1a)))
#> [1] NA "1" "2" "3"
Now create a mapping to standardize the labels:
# Create response mapping for cleaner labels
response_map <- list(
  "can't trust" = "High Trust/Positive",
  "can't be too careful" = "Low Trust/Negative",
  "depends" = "Situational/Neutral",
  "would try to be fair" = "High Trust/Positive",
  "would take advantage of you" = "Low Trust/Negative",
  "try to be helpful" = "High Trust/Positive",
  "looking out for themselves" = "Low Trust/Negative"
)
# Define response order (from negative to positive)
response_order <- c("Low Trust/Negative", "Situational/Neutral", "High Trust/Positive")
# Create chart with custom mapping and ordering
plot8 <- viz_stackedbar(
  data = gss_clean,
  x_vars = social_questions,
  x_var_labels = social_labels,
  title = "Social Trust Dimensions with Standardized Responses",
  subtitle = "Responses mapped to consistent positive/negative categories",
  x_label = "Trust Dimension",
  stack_label = "Trust Level",
  stack_map_values = response_map,
  stack_order = response_order,
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#d62728", "#ffbb78", "#2ca02c"),
  include_na = TRUE,
  na_label_stack = "No Answer"
)
plot8
Example 9: Single Variable with x_vars (Compact Display)
The x_vars parameter also works with a single variable.
This is useful when you want the compact styling of multi-variable mode
for a single question:
# Single variable with x_vars - great for compact horizontal displays
plot9a <- viz_stackedbar(
  data = gss_clean,
  x_vars = "happy_1a",
  x_var_labels = "General Happiness",
  title = "Happiness Distribution",
  x_label = "Well-being Measure",
  stack_label = "Happiness Level",
  stacked_type = "percent",
  horizontal = TRUE,
  tooltip_suffix = "%",
  color_palette = c("#2E8B57", "#FFD700", "#CD5C5C", "grey")
)
plot9a
Example 9b: Horizontal Multi-Variable Chart
For better readability with long labels, use horizontal orientation:
# Horizontal chart for survey battery
plot9b <- viz_stackedbar(
  data = gss_clean,
  x_vars = c("trust_1a", "fair_1a", "helpful_1a"),
  x_var_labels = c(
    "Can people be trusted?",
    "Are people generally fair?",
    "Are people generally helpful?"
  ),
  title = "Social Capital Dimensions",
  subtitle = "GSS Panel 2016",
  stacked_type = "percent",
  horizontal = TRUE,
  tooltip_suffix = "%",
  color_palette = c("#8c510a", "#d8b365", "#f6e8c3", "grey"),
  include_na = TRUE,
  na_label_stack = "No response"
)
plot9b
Example 10: Survey Battery Analysis
Survey batteries are sets of related questions with the same response scale. Here’s how to create a comprehensive battery analysis:
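Since a battery only makes sense when its items share a response scale, a quick check before plotting can be useful. The helper below is a hypothetical base R sketch (`shares_scale` is not part of dashboardr) that compares the sets of non-missing values across columns; the data frames are made up.

```r
# Hypothetical helper: TRUE when every column has the same set of observed values
shares_scale <- function(df) {
  scales <- lapply(df, function(col) sort(unique(col[!is.na(col)])))
  all(vapply(scales, identical, logical(1), scales[[1]]))
}

# Three items coded 1-3, as in a typical three-point battery
battery <- data.frame(
  trust   = c(1, 2, 3, NA),
  fair    = c(1, 2, 3, 2),
  helpful = c(3, 1, 2, 1)
)

# Adding a variable on a different scale breaks the battery
mixed <- cbind(battery, income = c(10, 20, 30, 40))

print(shares_scale(battery))
print(shares_scale(mixed))
```

This check only looks at observed values, so an item in which one category was never chosen would be flagged even though its scale is nominally the same; treat a FALSE result as a prompt to inspect the items rather than as proof of a problem.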
# Create a social trust battery
trust_battery <- c("trust_1a", "fair_1a", "helpful_1a")
trust_battery_labels <- c(
  "Interpersonal Trust",
  "Perceived Fairness",
  "Perceived Helpfulness"
)
# Create a comprehensive battery analysis
plot10 <- viz_stackedbar(
  data = gss_clean,
  x_vars = trust_battery,
  x_var_labels = trust_battery_labels,
  title = "Social Trust Battery - Complete Analysis",
  subtitle = "Comprehensive view of social trust dimensions with enhanced tooltips",
  x_label = "Trust Dimension",
  stack_label = "Response Category",
  stacked_type = "percent",
  tooltip_prefix = "Percentage: ",
  tooltip_suffix = "% of respondents",
  show_var_tooltip = TRUE,
  include_na = TRUE,
  na_label_stack = "No answer",
  color_palette = c("#8c510a", "#d8b365", "#f6e8c3", "darkgrey")
)
plot10
Example 11: Publication-Ready Chart
Let’s create a fully customized, publication-ready chart:
# Create the most polished example
plot11 <- viz_stackedbar(
  data = gss_clean,
  x_vars = social_questions,
  x_var_labels = c(
    "Interpersonal Trust\n('Can most people be trusted?')",
    "Perceived Fairness\n('Do people try to be fair?')",
    "Perceived Helpfulness\n('Are people helpful?')"
  ),
  title = "Social Capital Dimensions in American Society",
  subtitle = "General Social Survey Panel 2016 (N = 2,867 respondents)\nPercentage distribution of responses across social trust measures",
  x_label = "Social Trust Dimension",
  stack_label = "Response Category",
  stacked_type = "percent",
  tooltip_prefix = "",
  tooltip_suffix = "% of respondents",
  x_tooltip_suffix = "",
  include_na = TRUE,
  na_label_stack = "No response",
  color_palette = c("#b2182b", "#ef8a62", "#fddbc7", "darkgrey"),
  show_var_tooltip = TRUE
)
plot11
🏷️ Labels and Tooltips Reference
Summary of Label and Tooltip Options
The viz_stackedbar() function offers extensive
customization for labels and tooltips:
| Parameter | Description | Example |
|---|---|---|
| x_label | X-axis title | "Question" |
| y_label | Y-axis title (auto-set based on stacked_type) | "Percentage" |
| stack_label | Legend title | "Response Category" |
| x_var_labels | Custom labels for each question (multi-variable mode) | c("Trust", "Fairness") |
| tooltip_prefix | Text before value in tooltip | "Score: " |
| tooltip_suffix | Text after value in tooltip | "%", " respondents" |
| x_tooltip_suffix | Text after category name in tooltip | " question" |
| show_var_tooltip | Show question name in tooltip (multi-variable mode) | TRUE |
# Example with all label/tooltip options
viz_stackedbar(
  data = gss_clean,
  x_vars = c("trust_1a", "fair_1a"),
  x_var_labels = c("Trust", "Fairness"),
  title = "Social Attitudes",
  x_label = "Attitude Measure",
  y_label = "Percent of Respondents",
  stack_label = "Response Level",
  stacked_type = "percent",
  tooltip_prefix = "",
  tooltip_suffix = "% responded",
  show_var_tooltip = TRUE
)
When to Use Each Mode
| Mode | Use Case | Parameters |
|---|---|---|
| Grouped/Crosstab | One variable broken down by another | x_var + stack_var |
| Multi-Variable | Compare multiple questions side-by-side | x_vars |
Use Grouped Mode when:
- You want to show how education levels differ by gender
- You’re creating a cross-tabulation visualization
- Your data is already in long/tidy format
Use Multi-Variable Mode when:
- You’re comparing multiple survey questions
- Your questions share the same response categories
- You want to visualize a Likert scale battery
💡 Summary and Best Practices
✅ Key Features Demonstrated
- Two flexible modes: Grouped/crosstab (x_var + stack_var) and multi-variable (x_vars)
- Basic stacked bars with both count and percentage displays
- Age binning for continuous variables
- Value mapping for cleaner, more descriptive labels
- Custom ordering for logical presentation of categories
- Missing value handling with explicit NA categories
- Multi-variable comparisons for survey batteries and Likert scales
- Custom color palettes for different data types and branding
- Comprehensive tooltips with prefixes, suffixes, and formatting
- Horizontal orientation for better readability with long labels
🎯 Best Practices for Stacked Bar Charts
General Guidelines
- Choose appropriate stacking type
  - Use “normal” or “counts” for comparing absolute counts across groups
  - Use “percent” for comparing proportions within groups
- Order categories logically
  - When remapping values, refer to the values exactly as they appear in the data frame
  - Use natural ordering for ordinal variables (e.g., Likert scales)
  - Consider frequency-based ordering for nominal categories
  - Place “Other” or “Missing” categories at the end
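The difference between the two stacking types comes down to whether each bar shows raw counts or shares that sum to 100 within the bar. A minimal base R sketch, using a made-up count matrix, illustrates the arithmetic; it is not the function's implementation.

```r
# Hypothetical counts: rows are bars (groups), columns are stacks (responses)
counts <- matrix(
  c(30, 10,
    20, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), response = c("yes", "no"))
)

# Percent stacking: divide each row by its own total (margin = 1)
pct <- prop.table(counts, margin = 1) * 100

print(counts)
print(round(pct, 1))
```

Group B has more "yes" responses than group A has "no" responses in absolute terms, yet its "yes" share is lower than A's; the count view and the percent view answer different questions.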
Multi-Variable Mode Best Practices
- Choose questions with similar response scales
  - Use questions that have the same or compatible response categories
  - Consider mapping different scales to common categories when appropriate
- Order questions logically
  - Group related concepts together
  - Consider ordering by typical response patterns (most positive to least positive)
  - Place most important questions first
- Use appropriate stacking type
  - Use “percent” for comparing response patterns across questions
  - Use “counts” when absolute counts matter more than proportions
- Handle missing data thoughtfully
  - Decide whether to include or exclude missing categories
  - Use include_na = TRUE when missing patterns are meaningful
  - Provide clear labels for missing categories
- Use appropriate colors
  - Use diverging palettes for scales with meaningful center points
  - Use qualitative palettes for nominal categories
  - Ensure sufficient contrast between adjacent categories
  - Consider colorblind accessibility
- Customize tooltips for clarity
  - Include units and context in tooltips
  - Use prefixes/suffixes to clarify meaning
  - Format numbers appropriately for your audience
- Consider your audience
  - Use descriptive labels rather than codes
  - Provide clear titles and subtitles
  - Include sample sizes in subtitles when relevant
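For the color guidance above, base R can generate hex vectors of the kind passed to color_palette in this vignette. The sketch below is one possible approach, assuming R 4.0 or later (for `palette.colors()`); the specific palette choices are illustrative, not recommendations from the package.

```r
# Diverging palette for a 7-point scale with a meaningful midpoint
# (hcl.colors() ships with base R's grDevices)
diverging <- hcl.colors(7, palette = "Blue-Red 3")

# Qualitative palette for nominal categories; Okabe-Ito was designed
# with colorblind accessibility in mind (requires R >= 4.0)
qualitative <- unname(palette.colors(4, palette = "Okabe-Ito"))

print(diverging)
print(qualitative)
```

Either vector could then be supplied wherever the examples use a hand-written vector of hex codes, provided its length matches the number of stack categories (plus one if a missing-value category is shown).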
🌍 Common Use Cases
The viz_stackedbar function is particularly useful
for:
- Survey response analysis: Displaying Likert scale responses across demographics
- Demographic breakdowns: Showing composition of groups by various characteristics
- Attitude research: Comparing opinions across different populations
- Market research: Analyzing customer segments and preferences
- Educational research: Examining outcomes across different groups
- Health surveys: Displaying health behaviors or outcomes by demographics
📚 Conclusion
The viz_stackedbar() function provides a unified,
comprehensive solution for creating publication-ready stacked bar charts
from survey data. Its two flexible modes handle the most common
visualization needs:
- Grouped/Crosstab Mode (x_var + stack_var): Show how one variable breaks down by another
- Multi-Variable Mode (x_vars): Compare response distributions across multiple survey questions
Key advantages include:
- Unified interface - one function for both crosstabs and survey batteries
- Automatic data preparation for common survey data formats
- Smart mode detection based on the parameters you provide
- Flexible binning and mapping for continuous and coded variables
- Comprehensive missing data handling options
- Interactive tooltips for enhanced data exploration
- Publication-ready styling with extensive customization options
Note: If you were previously using
viz_stackedbars() for multi-variable comparisons, you can
now use viz_stackedbar() with the same parameters. The old
function still works but viz_stackedbar() is now the
recommended approach for all stacked bar chart needs.