Example of Visualizing Categorical Variables in R

R Markdown




## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':


## filter, lag

## The following objects are masked from 'package:base':


## intersect, setdiff, setequal, union




The Hair x Eye table comes from a survey of students at the University of Delaware reported by Snee (1974). The split by Sex was added by Friendly (1992a) for didactic purposes.

This data set is useful for illustrating various techniques for the analysis of contingency tables, such as the standard chi-squared test or, more generally, log-linear modelling, and graphical methods such as mosaic plots, sieve diagrams or association plots.
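As a minimal sketch of two of the techniques named above, the following collapses the table over Sex and applies base R's chisq.test() and mosaicplot() to the resulting Hair x Eye table. The object names hair_eye and test are introduced here for illustration and are not part of the original analysis.

```r
# HairEyeColor ships with base R (datasets package)
hair_eye <- margin.table(HairEyeColor, c(1, 2))  # sum over Sex -> Hair x Eye

# Pearson chi-squared test of independence between hair and eye color
test <- chisq.test(hair_eye)
print(test)

# Mosaic plot: tile area is proportional to the cell count;
# shade = TRUE colors tiles by their residuals under independence
mosaicplot(hair_eye, main = "Hair and eye color", shade = TRUE)
```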


Exploring the data with the str() and summary() functions


## 'table' num [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...

## - attr(*, "dimnames")=List of 3

## ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"

## ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"

## ..$ Sex : chr [1:2] "Male" "Female"


## Number of cases in table: 592

## Number of factors: 3

## Test for independence of all factors:

## Chisq = 164.92, df = 24, p-value = 5.321e-23

## Chi-squared approximation may be incorrect


## , , Sex = Male
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
##
## , , Sex = Female
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8



## 'data.frame': 32 obs. of 4 variables:

## $ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...

## $ Eye : Factor w/ 4 levels "Brown","Blue",..: 1 1 1 1 2 2 2 2 3 3 ...

## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...

## $ Freq: num 32 53 10 3 11 50 10 30 10 25 ...
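The code that produced the output above is not shown. The output is consistent with calls like the following, which is a sketch under the assumption that the table is R's built-in HairEyeColor dataset. The name data is hypothetical; data.df is the name used by the later code.

```r
# Assumption: the table explored above is the built-in HairEyeColor dataset
data <- HairEyeColor

str(data)      # 4 x 4 x 2 table with dimensions Hair, Eye, Sex
summary(data)  # number of cases and a chi-squared test of independence
data           # the counts, one Hair x Eye slice per Sex

# Long format: one row per Hair/Eye/Sex combination, with its count in Freq
data.df <- as.data.frame(data)
str(data.df)
```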

Contingency Tables

With categorical variables, we usually want the frequency of each category, and contingency tables show these frequencies. For example, we may want the total counts of female and male participants.

To collapse the data to sex by eye color, we can build a two-way table containing both variables (gendereyemix) and then compute proportion tables from it.
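The definition of gendereyemix is not shown. One way to obtain a table with the same layout (Sex in rows, Eye in columns) is margin.table(), sketched here under the assumption that the source is the built-in HairEyeColor table:

```r
# Total count of male and female participants (Sex is the third dimension)
margin.table(HairEyeColor, 3)

# Assumed construction of gendereyemix: sum over Hair,
# keeping Sex (dimension 3) as rows and Eye (dimension 2) as columns
gendereyemix <- margin.table(HairEyeColor, c(3, 2))
gendereyemix
```

An equivalent table can be built from the long-format data frame with xtabs(Freq ~ Sex + Eye, data = data.df).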


# Row proportions: share of each eye color within each sex

prop.table(gendereyemix, 1)

##         Eye
## Sex          Brown       Blue      Hazel      Green
##   Male   0.35125448 0.36200717 0.16845878 0.11827957
##   Female 0.38977636 0.36421725 0.14696486 0.09904153

# Column proportions: share of men and women within each eye color

prop.table(gendereyemix, 2)

##         Eye
## Sex          Brown      Blue     Hazel     Green
##   Male   0.4454545 0.4697674 0.5053763 0.5156250
##   Female 0.5545455 0.5302326 0.4946237 0.4843750

# Number of men and women in the mix

margin.table(gendereyemix, 1)

## Sex

## Male Female

## 279 313

# Number of participants per eye color (men and women combined)

margin.table(gendereyemix, 2)

## Eye

## Brown Blue Hazel Green

## 220 215 93 64

qplot(data = data.df, Eye, Freq, geom="boxplot", color=Sex)

Blue and brown are the most common eye colors for both males and females.

qplot(data = data.df, Hair, Freq, geom="boxplot", color=Sex)

Brown is the most common hair color for both males and females.
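Each box in these plots summarizes only the four cell counts within one group, so totals give a more direct check of the readings. A minimal sketch, assuming data.df is the long-format data frame obtained with as.data.frame(HairEyeColor); the names eye_totals and hair_totals are illustrative:

```r
data.df <- as.data.frame(HairEyeColor)  # assumed origin of data.df

# Total participants per Sex and eye color
eye_totals <- xtabs(Freq ~ Sex + Eye, data = data.df)
eye_totals

# Total participants per Sex and hair color
hair_totals <- xtabs(Freq ~ Sex + Hair, data = data.df)
hair_totals

# Most common category within each sex (row-wise maximum)
colnames(eye_totals)[apply(eye_totals, 1, which.max)]
colnames(hair_totals)[apply(hair_totals, 1, which.max)]
```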

Suppose we want the percentage of all participants who are blue-eyed males and the percentage who are blue-eyed females.

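# Number of blue-eyed males
B_M <- data.df %>% select(Eye, Sex, Freq) %>% filter(Sex == "Male" & Eye == "Blue") %>% summarise(Male_Blue = sum(Freq))

# Number of blue-eyed females
B_F <- data.df %>% select(Eye, Sex, Freq) %>% filter(Sex == "Female" & Eye == "Blue") %>% summarise(Female_Blue = sum(Freq))

# Total number of participants
TOT <- data.df %>% summarise(TotH = sum(Freq))

# Percentages relative to all participants
male_blue <- B_M / TOT * 100

female_blue <- B_F / TOT * 100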


## Male_Blue

## 1 17.06081


## Female_Blue

## 1 19.25676
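The same percentages can be read from a Sex by Eye contingency table, because prop.table() with no margin argument divides every cell by the grand total. This is a sketch that assumes gendereyemix was built by summing HairEyeColor over Hair; pct is an illustrative name.

```r
# Assumed construction of gendereyemix: Sex (rows) x Eye (columns)
gendereyemix <- margin.table(HairEyeColor, c(3, 2))

# With no margin argument, every cell is divided by the grand total
pct <- 100 * prop.table(gendereyemix)

pct["Male", "Blue"]    # blue-eyed males as a percentage of all participants
pct["Female", "Blue"]  # blue-eyed females as a percentage of all participants
```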

Density plot of different eye colors

qplot(data = data.df, Eye, geom = "density", fill = Eye, alpha = I(0.6))
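A density estimate is designed for continuous variables, and the call above counts each row of data.df once instead of weighting it by Freq. As an alternative sketch (not the original author's plot), a bar chart of the eye color totals can be drawn with base R; eye_counts is an illustrative name.

```r
# Total participants per eye color, summed over Hair and Sex
eye_counts <- margin.table(HairEyeColor, 2)

# One bar per eye color; bar height is the number of participants
barplot(eye_counts,
        xlab = "Eye color",
        ylab = "Number of participants",
        main = "Eye color counts")
```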

