Eda Coşkun

Example of Visualizing Categorical Variables in R


R Markdown

library(ggplot2)

library(dplyr)

##

## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':

##

## filter, lag

## The following objects are masked from 'package:base':

##

## intersect, setdiff, setequal, union

library(tidyr)

library(scales)

data("HairEyeColor")

The Hair x Eye table comes from a survey of students at the University of Delaware reported by Snee (1974). The split by Sex was added by Friendly (1992a) for didactic purposes.


This data set is useful for illustrating various techniques for the analysis of contingency tables, such as the standard chi-squared test or, more generally, log-linear modelling, and graphical methods such as mosaic plots, sieve diagrams or association plots.
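As a minimal sketch of two of the techniques mentioned above, the snippet below collapses the table over Sex and runs a chi-squared test of independence between hair and eye color, then draws a mosaic plot. It uses only base R; the object names `hair_eye` and `hair_eye_test` are chosen here for illustration and are not part of the original analysis.

```r
# Collapse the 3-way table over Sex, keeping Hair (dim 1) and Eye (dim 2)
hair_eye <- margin.table(HairEyeColor, c(1, 2))

# Chi-squared test of independence between hair color and eye color
hair_eye_test <- chisq.test(hair_eye)
hair_eye_test

# Mosaic plot; shade = TRUE colors the tiles by their residuals
mosaicplot(hair_eye, main = "Hair and eye color", shade = TRUE)
```

With a 4 x 4 table the test has (4 - 1) x (4 - 1) = 9 degrees of freedom.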


Source: http://euclid.psych.yorku.ca/ftp/sas/vcd/catdata/haireye.sas


Exploring the data with the str() and summary() functions


str(HairEyeColor)

## 'table' num [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...

## - attr(*, "dimnames")=List of 3

## ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"

## ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"

## ..$ Sex : chr [1:2] "Male" "Female"

summary(HairEyeColor)

## Number of cases in table: 592

## Number of factors: 3

## Test for independence of all factors:

## Chisq = 164.92, df = 24, p-value = 5.321e-23

## Chi-squared approximation may be incorrect

head(HairEyeColor)

## , , Sex = Male

##

## Eye

## Hair Brown Blue Hazel Green

## Black 32 11 10 3

## Brown 53 50 25 15

## Red 10 10 7 7

## Blond 3 30 5 8

##

## , , Sex = Female

##

## Eye

## Hair Brown Blue Hazel Green

## Black 36 9 5 2

## Brown 66 34 29 14

## Red 16 7 7 7

## Blond 4 64 5 8

data.df<- as.data.frame(HairEyeColor)

str(data.df)

## 'data.frame': 32 obs. of 4 variables:

## $ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...

## $ Eye : Factor w/ 4 levels "Brown","Blue",..: 1 1 1 1 2 2 2 2 3 3 ...

## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...

## $ Freq: num 32 53 10 3 11 50 10 30 10 25 ...
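The data frame above holds one row per Hair/Eye/Sex combination, with the count in Freq. Some functions expect one row per person instead. A possible way to get that shape, sketched here with `tidyr::uncount()` (the name `people` is an illustrative choice, not from the original post):

```r
library(tidyr)

data.df <- as.data.frame(HairEyeColor)

# uncount() repeats each row Freq times and drops the Freq column,
# giving one row per surveyed student
people <- uncount(data.df, Freq)

nrow(people)        # total number of students
table(people$Sex)   # counts per sex, computed from individual rows
```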

Contingency Tables


With categorical variables, we usually want to calculate the frequency of each category. Contingency tables show these frequencies. For example, we may want the total count of female and male participants.


To collapse the data to sex and eye color only, we can build a two-way table of those variables and then compute proportion tables from it.


gendereyemix<-xtabs(Freq~Sex+Eye,data.frame(HairEyeColor))

prop.table(gendereyemix, 1) # proportion of each eye color within each sex (rows sum to 1)

## Eye

## Sex Brown Blue Hazel Green

## Male 0.35125448 0.36200717 0.16845878 0.11827957

## Female 0.38977636 0.36421725 0.14696486 0.09904153

# Proportion of men and women within each eye color (columns sum to 1)


prop.table(gendereyemix, 2)

## Eye

## Sex Brown Blue Hazel Green

## Male 0.4454545 0.4697674 0.5053763 0.5156250

## Female 0.5545455 0.5302326 0.4946237 0.4843750

# Number of men and women in the mix


margin.table(gendereyemix, 1)

## Sex

## Male Female

## 279 313

# Number of participants per eye color


margin.table(gendereyemix, 2)

## Eye

## Brown Blue Hazel Green

## 220 215 93 64
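The row and column totals computed above can also be shown alongside the counts in a single table. A small sketch using base R's `addmargins()`; the name `with_totals` is an illustrative choice:

```r
# Rebuild the Sex x Eye table so this snippet runs on its own
gendereyemix <- xtabs(Freq ~ Sex + Eye, data = as.data.frame(HairEyeColor))

# addmargins() appends a "Sum" row and a "Sum" column to the table
with_totals <- addmargins(gendereyemix)
with_totals
```

The "Sum" column repeats the result of margin.table(gendereyemix, 1), and the "Sum" row repeats margin.table(gendereyemix, 2).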



qplot(data = data.df, Eye, Freq, geom="boxplot", color=Sex)


Brown and blue are the most common eye colors for both males and females.



qplot(data = data.df, Hair, Freq, geom="boxplot", color=Sex)




Most males and females have brown hair.
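qplot() is deprecated as of ggplot2 3.4.0, so the two boxplots may produce a warning in recent versions. A sketch of the same plots written with ggplot() follows; the names `p_eye` and `p_hair` are illustrative. Note that each box summarizes only four values of Freq (one per level of the other variable), so the boxes describe the spread of cell counts rather than individual students.

```r
library(ggplot2)

data.df <- as.data.frame(HairEyeColor)

# Boxplot of cell counts by eye color, one box per sex
p_eye <- ggplot(data.df, aes(x = Eye, y = Freq, color = Sex)) +
  geom_boxplot()

# Boxplot of cell counts by hair color, one box per sex
p_hair <- ggplot(data.df, aes(x = Hair, y = Freq, color = Sex)) +
  geom_boxplot()

p_eye
p_hair
```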



Let’s assume we are interested in the males and females with blue eyes, each as a percentage of all participants.


B_M<-data.df %>% select(Eye, Sex, Freq) %>%filter(Sex=="Male" & Eye=="Blue") %>% summarise(Male_Blue=sum(Freq))


B_F<-data.df %>% select(Eye, Sex, Freq) %>%filter(Sex=="Female" & Eye=="Blue") %>% summarise(Female_Blue=sum(Freq))


TOT<-data.df %>% summarise(TotH=sum(Freq))


male_blue <-B_M/TOT*100


female_blue<- B_F/TOT*100


male_blue

## Male_Blue

## 1 17.06081

female_blue

## Female_Blue

## 1 19.25676
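The same percentages can be read off a proportion table. This is a shorter alternative sketch, not the approach used above; `sex_eye` and `blue_share` are illustrative names, and `percent()` comes from the scales package.

```r
library(scales)

data.df <- as.data.frame(HairEyeColor)
sex_eye <- xtabs(Freq ~ Sex + Eye, data = data.df)

# With no margin argument, prop.table() divides every cell by the grand total
blue_share <- prop.table(sex_eye)[, "Blue"]

# Format the proportions as percentages with two decimal places
percent(blue_share, accuracy = 0.01)
```

The denominator matters here: these figures are shares of all 592 participants. Dividing within each sex instead, as prop.table(gendereyemix, 1) does earlier, gives about 36.2% of males and 36.4% of females with blue eyes.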

Density plot of different eye colors


qplot(data=data.df, Eye, geom="density", fill=Eye, alpha=I(0.6))
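A density estimate is designed for continuous variables, so for a categorical variable such as eye color a bar chart of counts is often easier to read. The sketch below is one possible alternative, assuming the counts are first summed over hair color; `eye_counts` and `p_bar` are illustrative names.

```r
library(ggplot2)
library(dplyr)

data.df <- as.data.frame(HairEyeColor)

# Sum Freq over hair color to get one count per Eye/Sex combination
eye_counts <- data.df %>%
  count(Eye, Sex, wt = Freq, name = "n")

# Side-by-side bars: one bar per sex within each eye color
p_bar <- ggplot(eye_counts, aes(x = Eye, y = n, fill = Sex)) +
  geom_col(position = "dodge") +
  labs(x = "Eye color", y = "Number of students")

p_bar
```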


You can find the HTML file on RPubs.
