
📖 Introduction

The viz_scatter() function visualizes the relationship between two numeric variables. Each point represents one observation, positioned by its x and y values. This makes it well suited to exploring correlations, clusters, and outliers.

library(dashboardr)
library(dplyr)
library(gssr)
library(haven)

# Load GSS data - we need numeric variables for scatter plots
data(gss_all)
gss <- gss_all %>%
  select(year, age, sex, race, degree, happy, polviews, educ, realinc) %>%
  filter(year == 2022,  # Use 2022 which has realinc data
         !is.na(age), !is.na(educ), !is.na(realinc),
         realinc > 0, educ > 0) %>%
  mutate(
    sex = droplevels(as_factor(sex)),
    degree = droplevels(as_factor(degree))
  )

📊 Basic Scatter Plots

Create a simple scatter plot showing the relationship between education (years) and income:

plot <- viz_scatter(
  data = gss,
  x_var = "educ",
  y_var = "realinc",
  title = "Education vs Income",
  x_label = "Years of Education",
  y_label = "Real Income ($)"
)

plot

📈 Adding Trend Lines

Use show_trend = TRUE to add a regression line:

plot <- viz_scatter(
  data = gss,
  x_var = "educ",
  y_var = "realinc",
  show_trend = TRUE,
  title = "Education vs Income (with trend)"
)

plot
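
To see what a trend line summarizes, the sketch below fits an ordinary least-squares line with base R's lm() on simulated data. The simulated columns and the straight-line model are assumptions for illustration only; this page does not state which model show_trend fits.

```r
# Simulated stand-in for the GSS columns (illustrative values, not real data)
set.seed(42)
toy <- data.frame(educ = sample(8:20, 200, replace = TRUE))
toy$realinc <- 5000 + 2500 * toy$educ + rnorm(200, sd = 8000)

# Ordinary least-squares fit: realinc = intercept + slope * educ
fit <- lm(realinc ~ educ, data = toy)
slope <- unname(coef(fit)["educ"])

# A positive slope means income tends to rise with each extra year of education
slope > 0
```

The slope is the quantity a straight trend line makes visible: the average change in y for a one-unit change in x.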

🎨 Coloring by Groups

Use color_var to color points by a categorical variable:

plot <- viz_scatter(
  data = gss,
  x_var = "educ",
  y_var = "realinc",
  color_var = "sex",
  title = "Education vs Income by Sex",
  color_palette = c("#3498DB", "#E74C3C")
)

plot
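
Before passing a palette, it can help to confirm that it has at least one color per group. The check below is a minimal base R sketch on a toy factor. The factor levels are assumptions, and pairing colors with levels in level order is also an assumption that this page does not confirm for color_palette.

```r
# Toy factor standing in for gss$sex (the levels are an assumption)
sex <- factor(c("male", "female", "female", "male"),
              levels = c("male", "female"))
group_colors <- c("#3498DB", "#E74C3C")

# Every group needs its own color to be distinguishable
stopifnot(length(group_colors) >= nlevels(sex))

# Pair levels with colors in level order, for inspection only
level_colors <- setNames(group_colors[seq_len(nlevels(sex))], levels(sex))
level_colors
```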

📊 Age vs Education

Another relationship to explore is the one between age and years of education:

plot <- viz_scatter(
  data = gss,
  x_var = "age",
  y_var = "educ",
  color_var = "degree",
  title = "Age vs Education by Degree",
  x_label = "Age (years)",
  y_label = "Years of Education",
  alpha = 0.5,
  color_palette = c("#E74C3C", "#F39C12", "#27AE60", "#3498DB", "#9B59B6")
)

plot

👁️ Handling Overlap with Transparency

For dense data, set alpha below 1 so that points are semi-transparent and heavily overlapped regions stand out:

plot <- viz_scatter(
  data = gss,
  x_var = "age",
  y_var = "realinc",
  alpha = 0.3,
  point_size = 3,
  title = "Age vs Income (with transparency)"
)

plot
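
One way to judge whether transparency is needed is to count how many observations share exactly the same coordinates, which is common when both variables are recorded as whole numbers. The following is a minimal base R sketch on simulated data; the toy data frame is an assumption, not the GSS extract.

```r
# Simulated integer-valued variables (illustrative values only)
set.seed(1)
toy <- data.frame(
  age  = sample(18:89, 1000, replace = TRUE),
  educ = sample(8:20, 1000, replace = TRUE)
)

# Count observations at each distinct (age, educ) position
counts <- as.data.frame(table(age = toy$age, educ = toy$educ))
counts <- counts[counts$Freq > 0, ]

# Share of observations that sit on top of at least one other point
overlap_share <- sum(counts$Freq[counts$Freq > 1]) / nrow(toy)
overlap_share
```

A high share suggests that fully opaque points would hide much of the data.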

🏷️ Labels and Tooltips

Customize axis labels and tooltip formatting for better readability:

plot <- viz_scatter(
  data = gss,
  x_var = "educ",
  y_var = "realinc",
  title = "Education vs Income",
  x_label = "Years of Education",
  y_label = "Annual Income (USD)",
  tooltip_format = "Education: {x} years, Income: ${y}"
)

plot

Parameter | Description | Example
--- | --- | ---
x_label | Custom x-axis label | "Years of Education"
y_label | Custom y-axis label | "Income (USD)"
tooltip_format | Custom tooltip template | "x: {x}, y: {y}"

The tooltip_format parameter supports placeholders: {x} for x-value, {y} for y-value, and {color} for the color group.
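
To illustrate how placeholder substitution of this kind works, here is a small base R sketch. The helper fill_tooltip() is hypothetical and is not part of dashboardr; the package's own implementation may differ.

```r
# Hypothetical helper (not part of dashboardr): replace each {name}
# placeholder with the matching value, treating both as literal text
fill_tooltip <- function(template, values) {
  for (name in names(values)) {
    template <- gsub(
      paste0("{", name, "}"),
      as.character(values[[name]]),
      template,
      fixed = TRUE
    )
  }
  template
}

example_tooltip <- fill_tooltip(
  "Education: {x} years, Income: ${y}",
  list(x = 16, y = 52000)
)
example_tooltip
```

Because matching is literal (fixed = TRUE), the dollar sign in front of {y} is left as plain text and only the placeholder itself is replaced.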

Using with create_content()

Integrate scatter plots into dashboards:

content <- create_content(data = gss, type = "scatter") %>%
  add_viz(
    x_var = "educ",
    y_var = "realinc",
    show_trend = TRUE,
    title = "Education vs Income"
  )

content %>% preview()

With Filters

Compare relationships across groups:

content <- create_content(data = gss, type = "scatter", alpha = 0.5) %>%
  add_viz(
    x_var = "educ",
    y_var = "realinc",
    title = "Male",
    filter = ~ sex == "male",
    tabgroup = "By Sex"
  ) %>%
  add_viz(
    x_var = "educ",
    y_var = "realinc",
    title = "Female",
    filter = ~ sex == "female",
    tabgroup = "By Sex"
  )

content %>% preview()

Multiple Relationships

content <- create_content(data = gss, type = "scatter", alpha = 0.4, show_trend = TRUE) %>%
  add_viz(
    x_var = "educ",
    y_var = "realinc",
    title = "Education → Income",
    tabgroup = "Relationships"
  ) %>%
  add_viz(
    x_var = "age",
    y_var = "realinc",
    title = "Age → Income",
    tabgroup = "Relationships"
  )

content %>% preview()

Interpreting Scatter Plots

Correlation Patterns

Pattern | Meaning
--- | ---
Upward slope | Positive relationship
Downward slope | Negative relationship
No pattern | No linear relationship
Tight cluster | Strong relationship
Wide scatter | Weak relationship

What to Look For

  1. Direction - Positive, negative, or none?
  2. Strength - How tightly clustered?
  3. Linearity - Linear or curved?
  4. Outliers - Points far from the pattern?
  5. Clusters - Distinct groups?
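
Direction and strength can also be checked numerically. The sketch below uses base R's cor() on simulated data (the variables are illustrative assumptions) to show how the sign and magnitude of the Pearson correlation correspond to the visual patterns described above. Note that Pearson correlation only captures linear association, so a curved relationship can still yield a value near zero.

```r
set.seed(7)
x <- rnorm(300)

# Tight upward cloud versus wide downward cloud
strong_pos <- 2 * x + rnorm(300, sd = 0.5)
weak_neg   <- -0.5 * x + rnorm(300, sd = 1.5)

# Pearson correlation: the sign gives direction, the magnitude gives strength
r_strong <- cor(x, strong_pos)
r_weak   <- cor(x, weak_neg)

c(strong = r_strong, weak = r_weak)
```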

💡 When to Use Scatter Plots

Use viz_scatter() when:

  - Exploring the relationship between two numeric variables
  - Looking for correlations
  - Identifying outliers
  - Showing individual-level data

Use viz_histogram() when:

  - Showing the distribution of a single variable

Use viz_heatmap() when:

  - Data is aggregated (means, counts)
  - Many points overlap
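
When individual points overlap too heavily to read, the data can be summarized before plotting. The sketch below uses base R's aggregate() on simulated data to compute a mean and a count per group, which is the kind of aggregated table that suits a heatmap. The toy columns are assumptions, and no dashboardr call is shown because the arguments of viz_heatmap() are not covered on this page.

```r
# Simulated stand-in for the GSS columns (illustrative values only)
set.seed(3)
toy <- data.frame(
  educ = sample(8:20, 500, replace = TRUE),
  sex  = sample(c("male", "female"), 500, replace = TRUE)
)
toy$realinc <- 5000 + 2500 * toy$educ + rnorm(500, sd = 8000)

# One row per (educ, sex) cell: mean income and number of observations
cell_means <- aggregate(realinc ~ educ + sex, data = toy, FUN = mean)
cell_sizes <- aggregate(realinc ~ educ + sex, data = toy, FUN = length)
names(cell_sizes)[3] <- "n"

summary_tbl <- merge(cell_means, cell_sizes, by = c("educ", "sex"))
head(summary_tbl)
```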

📚 See Also