
Introduction

This vignette is a guide to the create_stackedbar() function from the dashboardr package. The function creates customizable, interactive stacked bar charts from survey data, which makes it useful for communication science researchers and other social scientists working with categorical data.

Stacked bar charts are excellent for visualizing the distribution of categorical responses across different groups or demographics. They allow you to see both the overall patterns and the composition within each category, making them ideal for displaying survey responses, demographic breakdowns, and attitude distributions.

The create_stackedbar function handles many common data preparation tasks automatically, including:

  • Converting haven_labelled columns (from SPSS imports) to R factors
  • Mapping raw values to descriptive labels
  • Binning continuous variables into meaningful categories
  • Handling missing values explicitly or implicitly
  • Creating both count-based and percentage-based visualizations
  • Customizing colors, ordering, and interactive tooltips
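
As a rough illustration of the first two items, the sketch below converts numeric survey codes into a factor with descriptive labels using base R only. The codes and labels are made-up toy values, and this is not the package's internal implementation; for real haven_labelled columns, haven::as_factor() does a comparable job.

```r
# Toy survey codes (hypothetical values, not taken from the GSS)
codes <- c(1, 2, 2, 1, NA, 2)

# Hypothetical code-to-label lookup, similar in spirit to SPSS value labels
value_labels <- c("1" = "Male", "2" = "Female")

# Convert the codes to a factor whose levels are the descriptive labels
sex_factor <- factor(codes,
                     levels = as.numeric(names(value_labels)),
                     labels = unname(value_labels))

print(levels(sex_factor))
print(table(sex_factor, useNA = "ifany"))
```

The missing code stays NA after conversion, so it can still be handled explicitly or implicitly later on.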

This vignette demonstrates the function’s capabilities using the General Social Survey (GSS) Panel 2020 dataset, focusing on the 2016 wave (_1a variables).

Getting Started

First, let’s load the necessary libraries and examine our dataset.

library(gssr)
library(dplyr)
library(highcharter)
library(tidyr)
library(dashboardr)

# Load GSS Panel 2020 data
data(gss_panel20)

Data Preparation

Let’s prepare our working dataset using the 2016 wave (_1a) variables.

# Create a working dataset with key _1a variables from the 2016 wave
gss_clean <- gss_panel20 %>%
  select(
    # Demographics
    age_1a, sex_1a, race_1a, degree_1a, region_1a,
    # Attitudes and behaviors
    happy_1a, trust_1a, fair_1a, helpful_1a,
    polviews_1a, partyid_1a, attend_1a,
    # Economic
    income_1a, class_1a
  ) %>%
  # Remove completely empty rows
  filter(if_any(everything(), ~ !is.na(.)))

# Check the data structure
glimpse(gss_clean)
#> Rows: 2,867
#> Columns: 14
#> $ age_1a      <dbl+lbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60…
#> $ sex_1a      <dbl+lbl> 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2, 
#> $ race_1a     <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1, 2, 2, 1, 1, 1, 3, 
#> $ degree_1a   <dbl+lbl> 3, 1, 3, 1, 4, 2, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 
#> $ region_1a   <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 
#> $ happy_1a    <dbl+lbl>     2,     2,     1,     2,     1,     1,     2,     1…
#> $ trust_1a    <dbl+lbl> NA(i),     3,     1, NA(i),     1,     1, NA(i),     2…
#> $ fair_1a     <dbl+lbl> NA(i),     1,     1, NA(i),     2,     2, NA(i),     1…
#> $ helpful_1a  <dbl+lbl> NA(i),     2,     2, NA(i),     3,     1, NA(i),     1…
#> $ polviews_1a <dbl+lbl>     4,     2,     6,     4,     3,     3,     3,     5…
#> $ partyid_1a  <dbl+lbl>     3,     2,     5,     5,     1,     1,     5,     2…
#> $ attend_1a   <dbl+lbl> 0, 0, 7, 6, 0, 0, 1, 5, 6, 0, 5, 3, 5, 6, 1, 8, 8, 8, 
#> $ income_1a   <dbl+lbl> NA(n),    12,    12, NA(n), NA(n),    12, NA(n),    12…
#> $ class_1a    <dbl+lbl>     3, NA(d),     3,     3,     3,     3,     3,     2…

# Examine some key variables
table(gss_clean$degree_1a, useNA = "always")
#> 
#>    0    1    2    3    4 <NA> 
#>  328 1461  216  536  318    8
table(gss_clean$happy_1a, useNA = "always")
#> 
#>    1    2    3 <NA> 
#>  806 1601  452    8
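
The row filter in the preparation step keeps any row that has at least one non-missing value. A minimal base R sketch of the same idea, on an invented toy data frame, might look like this:

```r
# Toy data frame (hypothetical) with one completely empty row
df <- data.frame(a = c(1, NA, 3), b = c("x", NA, NA))

# Keep rows where at least one column is not NA
keep <- rowSums(!is.na(df)) > 0
df_clean <- df[keep, , drop = FALSE]

print(df_clean)
```

Only the row where every column is NA is dropped; partially complete rows are kept.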

Basic Stacked Bar Charts

Example 1: Education by Gender (Count-based)

Let’s start with a basic stacked bar chart showing educational attainment by gender.

## TODO: there is an odd 5 chart
# Create basic stacked bar chart
plot1 <- create_stackedbar(
  data = gss_clean,
  x_var = "degree_1a",
  stack_var = "sex_1a",
  title = "Educational Attainment by Gender",
  subtitle = "GSS Panel 2016 - Raw counts",
  x_label = "Highest Degree Completed",
  y_label = "Number of Respondents",
  stack_label = "Gender",
  stacked_type = "counts"
)

plot1
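
For intuition about what a count-based stacked bar displays, the following base R sketch cross-tabulates two toy categorical vectors. Each row of the table corresponds to one bar and each cell to one stacked segment. The data are invented, and the package may compute its counts differently.

```r
# Toy data (hypothetical): one element per respondent
degree <- c("High School", "Bachelor", "High School", "Graduate",
            "Bachelor", "High School")
sex    <- c("Male", "Female", "Female", "Male", "Female", "Male")

# Cross-tabulate: rows are bars (x_var), columns are stack segments (stack_var)
counts <- table(degree, sex)
print(counts)

# The total height of each bar is the row total
bar_heights <- rowSums(counts)
print(bar_heights)
```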

Example 2: Happiness Distribution (Percentage-based)

Now let’s create a percentage-based stacked bar chart to show happiness distribution across education levels.

## TODO: there is also an odd 5 chart

# Define education order for logical display
education_order <- c("Lt High School", "High School", "Junior College", "Bachelor", "Graduate")

# Create percentage stacked bar chart
plot2 <- create_stackedbar(
  data = gss_clean,
  x_var = "degree_1a",
  stack_var = "happy_1a",
  title = "Happiness Distribution Across Education Levels",
  subtitle = "Percentage breakdown within each education category",
  x_label = "Education Level",
  y_label = "Percentage of Respondents",
  stack_label = "Happiness Level",
  stacked_type = "percent",
  x_order = education_order,
  stack_order = c("Very Happy", "Pretty Happy", "Not Too Happy"),
  tooltip_suffix = "%",
  color_palette = c("#2E86AB", "#A23B72", "#F18F01")
)

plot2
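
A percentage-based chart rescales each bar so that its segments sum to 100. The sketch below shows that calculation in base R on an invented count table; it illustrates the arithmetic only and is not the package's own code.

```r
# Toy counts (hypothetical): rows are education levels, columns are happiness levels
counts <- matrix(c(10, 30, 10,
                   40, 40, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("High School", "Bachelor"),
                                 c("Very Happy", "Pretty Happy", "Not Too Happy")))

# Row-wise percentages: each row is divided by its own total
percent <- prop.table(counts, margin = 1) * 100
print(round(percent, 1))

# Every bar sums to 100, regardless of group size
print(rowSums(percent))
```

Because each row is normalized separately, groups of very different sizes become directly comparable, at the cost of hiding the absolute counts.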

Advanced Features

Example 3: Age Binning with Political Views

Let’s demonstrate binning continuous variables by creating age groups and examining political views.

# First, let's clean and prepare the age variable
gss_clean_age <- gss_clean %>%
  # Drop rows with missing age or political views for this analysis
  filter(!is.na(age_1a), !is.na(polviews_1a)) %>%
  mutate(
    # Convert the labelled age column to a plain numeric vector
    age_numeric = as.numeric(age_1a)
  )

# Check the cleaned data
cat("Cleaned age summary:\n")
#> Cleaned age summary:
summary(gss_clean_age$age_numeric)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   18.00   34.00   50.00   49.26   62.00   89.00

# Define age breaks and labels (adjusted if needed based on actual data range)
age_range <- range(gss_clean_age$age_numeric, na.rm = TRUE)
cat("Age range in data:", age_range[1], "to", age_range[2], "\n")
#> Age range in data: 18 to 89
# Adjust breaks to match actual data range
age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")

# Map political views to shorter labels
polviews_map <- list(
  "Extremely Liberal" = "Ext Liberal",
  "Liberal" = "Liberal", 
  "Slightly Liberal" = "Sl Liberal",
  "Moderate" = "Moderate",
  "Slightly Conservative" = "Sl Conservative",
  "Conservative" = "Conservative",
  "Extremely Conservative" = "Ext Conservative"
)

# Create chart with age binning and value mapping using the numeric age
plot3 <- create_stackedbar(
  data = gss_clean_age,
  x_var = "age_numeric",  # Use the numeric version
  stack_var = "polviews_1a",
  title = "Political Views by Age Group",
  subtitle = "Distribution of political ideology across age cohorts",
  x_label = "Age Group",
  stack_label = "Political Views",
  x_breaks = age_breaks,
  x_bin_labels = age_labels,
  stack_map_values = polviews_map,
  stacked_type = "percent",
  tooltip_suffix = "%",
  x_tooltip_suffix = " years",
  color_palette = c("#d7191c", "#fdae61", "#fee08b", "#e6f598", "#abdda4", "#66c2a5", "#2b83ba")
)

plot3
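
To see how breaks and bin labels line up, the sketch below applies the same breaks and labels with base R's cut() to a few toy ages placed on the bin boundaries. The left-closed boundary rule (right = FALSE) is an assumption inferred from labels such as "18-29"; the function may treat boundaries differently.

```r
# Breaks and labels as defined in the example above
age_breaks <- c(18, 30, 45, 60, 75, Inf)
age_labels <- c("18-29", "30-44", "45-59", "60-74", "75+")

# Toy ages (hypothetical), chosen to sit on the bin boundaries
ages <- c(18, 29, 30, 44, 45, 74, 75, 89)

# right = FALSE makes intervals left-closed: [18, 30), [30, 45), ...
# This boundary rule is an assumption; the package may bin differently.
age_group <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)

print(data.frame(age = ages, group = age_group))
```

Checking a handful of boundary values this way is a quick guard against off-by-one bin labels.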

Example 4: Including Missing Values

Let’s create a chart that explicitly shows missing data patterns.

# Create chart including NA values (using default "(Missing)" labels)
plot4 <- create_stackedbar(
  data = gss_clean,
  x_var = "race_1a",
  stack_var = "attend_1a",
  title = "Religious Attendance by Race/Ethnicity",
  subtitle = "Including non-responses as explicit categories",
  x_label = "Race/Ethnicity",
  stack_label = "Religious Attendance Frequency",
  include_na = TRUE,
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#8e0152", "#c51b7d", "#de77ae", "#f1b6da", "#fde0ef", 
                   "#e6f5d0", "#b8e186", "#7fbc41", "#4d9221", "#276419")
)

plot4
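
The difference between implicit and explicit handling of missing values can be sketched in base R as follows. The responses are invented, and the "(Missing)" label simply mirrors the default label mentioned above.

```r
# Toy responses (hypothetical) with some missing values
attend <- c("Never", "Weekly", NA, "Never", NA, "Monthly")

# Implicit handling: NA values are silently dropped from the tabulation
print(table(attend))

# Explicit handling: recode NA to its own category so it gets a segment
attend_explicit <- ifelse(is.na(attend), "(Missing)", attend)
print(table(attend_explicit))
```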

Example 5: Custom Value Mapping

Let’s demonstrate comprehensive value mapping for cleaner labels.


## TODO: there is also an odd 4 chart

# Create mappings for cleaner display
sex_map <- list("Male" = "Men", "Female" = "Women")
class_map <- list(
  "Lower Class" = "Lower",
  "Working Class" = "Working", 
  "Middle Class" = "Middle",
  "Upper Class" = "Upper"
)

# Create chart with custom mappings
plot5 <- create_stackedbar(
  data = gss_clean,
  x_var = "class_1a",
  stack_var = "sex_1a",
  title = "Gender Distribution Across Social Classes",
  subtitle = "With custom labels and ordering",
  x_label = "Self-Reported Social Class",
  stack_label = "Gender",
  x_map_values = class_map,
  stack_map_values = sex_map,
  x_order = c("Lower", "Working", "Middle", "Upper"),
  stack_order = c("Women", "Men"),
  stacked_type = "counts",
  tooltip_prefix = "Count: ",
  color_palette = c("#E07A5F", "#3D5A80")
)

plot5
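
Value mapping followed by custom ordering amounts to a lookup plus a factor with explicit levels. The base R sketch below reuses class_map from the example with a few invented responses; it shows the general idea rather than how the function implements it.

```r
# Mapping from the example above: original label -> display label
class_map <- list(
  "Lower Class" = "Lower",
  "Working Class" = "Working",
  "Middle Class" = "Middle",
  "Upper Class" = "Upper"
)

# Toy responses (hypothetical)
responses <- c("Middle Class", "Working Class", "Upper Class", "Middle Class")

# Look up each response in the map, then fix the display order via factor levels
mapped <- unlist(class_map)[responses]
mapped <- factor(unname(mapped), levels = c("Lower", "Working", "Middle", "Upper"))

print(table(mapped))
```

Levels with no observations (here "Lower") still appear in the table with a count of zero, which keeps the category order stable across charts.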

Complex Analysis Examples

Example 6: Regional Patterns in Trust

Let’s examine how trust levels vary across US regions.


## TODO: there is also an odd Series 4 chart

# Create regional trust analysis
plot6 <- create_stackedbar(
  data = gss_clean,
  x_var = "region_1a",
  stack_var = "trust_1a",
  title = "Trust Levels by US Region",
  subtitle = "Regional variation in interpersonal trust",
  x_label = "US Region",
  stack_label = "Trust Level",
  stack_order = c("Can Trust", "Can't Be Too Careful", "Depends"),
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#2E8B57", "#DAA520", "#CD5C5C")
)

plot6

Example 7: Multi-level Analysis with Income Binning

Let’s create income groups and examine their relationship with happiness.



## TODO: there is also an odd 12 chart, + the order is off?

# First, let's examine the income variable
table(gss_clean$income_1a, useNA = "always")
#> 
#>    1    2    3    4    5    6    7    8    9   10   11   12 <NA> 
#>   39   40   21   15   19   15   22   47  160  129  198 1728  434

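# Create income groups based on the GSS income categories
# Note: GSS income is coded as categories, not continuous.
# The table above shows only codes 1-12 for income_1a, so keys "13"-"25"
# below match no rows in this variable. This detailed map is kept for
# reference only; the chart below uses income_simple_map.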
income_map <- list(
  "1" = "Under $1,000",
  "2" = "$1,000-2,999", 
  "3" = "$3,000-3,999",
  "4" = "$4,000-4,999",
  "5" = "$5,000-5,999",
  "6" = "$6,000-6,999",
  "7" = "$7,000-7,999",
  "8" = "$8,000-9,999",
  "9" = "$10,000-12,499",
  "10" = "$12,500-14,999",
  "11" = "$15,000-17,499",
  "12" = "$17,500-19,999",
  "13" = "$20,000-22,499",
  "14" = "$22,500-24,999",
  "15" = "$25,000-29,999",
  "16" = "$30,000-34,999",
  "17" = "$35,000-39,999",
  "18" = "$40,000-49,999",
  "19" = "$50,000-59,999",
  "20" = "$60,000-74,999",
  "21" = "$75,000-89,999",
  "22" = "$90,000-109,999",
  "23" = "$110,000-129,999",
  "24" = "$130,000-149,999",
  "25" = "$150,000+"
)

# Create simplified income groups
income_simple_map <- list(
  "1" = "Low", "2" = "Low", "3" = "Low", "4" = "Low", "5" = "Low",
  "6" = "Low", "7" = "Low", "8" = "Low", "9" = "Low-Mid",
  "10" = "Low-Mid", "11" = "Low-Mid", "12" = "Low-Mid", "13" = "Low-Mid",
  "14" = "Low-Mid", "15" = "Middle", "16" = "Middle", "17" = "Middle",
  "18" = "Middle", "19" = "Mid-High", "20" = "Mid-High", "21" = "High",
  "22" = "High", "23" = "High", "24" = "High", "25" = "High"
)

# Create income-happiness analysis
plot7 <- create_stackedbar(
  data = gss_clean,
  x_var = "income_1a",
  stack_var = "happy_1a",
  title = "Happiness Distribution by Income Level",
  subtitle = "Simplified income categories",
  x_label = "Income Level",
  stack_label = "Happiness",
  x_map_values = income_simple_map,
  x_order = c("Low", "Low-Mid", "Middle", "Mid-High", "High"),
  stack_order = c("Very Happy", "Pretty Happy", "Not Too Happy"),
  stacked_type = "percent",
  tooltip_suffix = "%",
  color_palette = c("#1f77b4", "#ff7f0e", "#d62728"),
  include_na = FALSE
)

plot7
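
Before passing a value map to the function, it can help to check which observed values the map actually covers. The sketch below uses the codes 1 to 12 seen in the table() output above and a shortened stand-in for income_simple_map. Whether the function matches map keys against raw codes or against value labels is not stated here, so the key format is an assumption.

```r
# Observed codes, as shown in the table() output above (1 through 12)
observed <- as.character(1:12)

# Shortened stand-in for income_simple_map (keys are assumed to be raw codes)
simple_map <- c(setNames(rep("Low", 8), 1:8),
                setNames(rep("Low-Mid", 6), 9:14),
                setNames(rep("Middle", 4), 15:18))

# Observed values with no entry in the map
unmapped <- setdiff(observed, names(simple_map))

# Groups that actually receive data, and map keys that match nothing
used_groups <- unique(unname(simple_map[observed]))
unused_keys <- setdiff(names(simple_map), observed)

print(unmapped)
print(used_groups)
print(unused_keys)
```

With only codes 1 to 12 observed, just the "Low" and "Low-Mid" groups would receive data under this mapping, which is worth knowing before interpreting the chart.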

Summary and Best Practices

Key Features Demonstrated

  1. Basic stacked bars with both count and percentage displays
  2. Binning of continuous variables, such as age groups
  3. Value mapping for cleaner, more descriptive labels
  4. Custom ordering for logical presentation of categories
  5. Missing value handling with explicit NA categories
  6. Pre-aggregated data support for existing summary tables (not shown in the examples above)
  7. Custom color palettes for different data types and branding
  8. Comprehensive tooltips with prefixes, suffixes, and formatting
  9. Flexible styling for different analytical needs

Best Practices for Stacked Bar Charts

# 1. Choose appropriate stacking type
# - Use stacked_type = "counts" for comparing absolute counts across groups
# - Use "percent" for comparing proportions within groups

# 2. Order categories logically
# - Use natural ordering for ordinal variables (e.g., Likert scales)
# - Consider frequency-based ordering for nominal categories
# - Place "Other" or "Missing" categories at the end

# 3. Handle missing data thoughtfully
# - Decide whether to include or exclude missing categories
# - Use include_na = TRUE when missing patterns are meaningful
# - Provide clear labels for missing categories

# 4. Use appropriate colors
# - Use diverging palettes for scales with meaningful center points
# - Use qualitative palettes for nominal categories
# - Ensure sufficient contrast between adjacent categories
# - Consider colorblind accessibility

# 5. Customize tooltips for clarity
# - Include units and context in tooltips
# - Use prefixes/suffixes to clarify meaning
# - Format numbers appropriately for your audience

# 6. Consider your audience
# - Use descriptive labels rather than codes
# - Provide clear titles and subtitles
# - Include sample sizes in subtitles when relevant
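
Two of these practices, frequency-based ordering with the missing category last and reporting the sample size in the subtitle, can be sketched in base R. The data and the subtitle wording are invented for illustration; the resulting vectors could then be supplied to arguments such as x_order and subtitle.

```r
# Toy categorical responses (hypothetical)
region <- c("South", "West", "South", "(Missing)", "Midwest", "South",
            "West", "(Missing)")

# Frequency-based ordering, most common category first
freq_order <- names(sort(table(region), decreasing = TRUE))

# Move the "(Missing)" category to the end
final_order <- c(setdiff(freq_order, "(Missing)"), "(Missing)")
print(final_order)

# Build a subtitle that reports the sample size
subtitle <- paste0("Toy data (n = ", format(length(region), big.mark = ","), ")")
print(subtitle)
```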

Common Use Cases

The create_stackedbar function is particularly useful for:

  • Survey response analysis: Displaying Likert scale responses across demographics
  • Demographic breakdowns: Showing composition of groups by various characteristics
  • Attitude research: Comparing opinions across different populations
  • Market research: Analyzing customer segments and preferences
  • Educational research: Examining outcomes across different groups
  • Health surveys: Displaying health behaviors or outcomes by demographics

Conclusion

The create_stackedbar() function provides a comprehensive solution for creating publication-ready stacked bar charts from survey data. Its extensive customization options, automatic data handling capabilities, and interactive features make it an invaluable tool for social science researchers.

Key advantages include:

  • Automatic data preparation for common survey data formats
  • Flexible binning and mapping for continuous and coded variables
  • Comprehensive missing data handling options
  • Interactive tooltips for enhanced data exploration
  • Publication-ready styling with extensive customization options
  • Support for both raw and pre-aggregated data