Example of Visualizing Categorical Variables in R

Eda Coşkun
Sep 23, 2020

R Markdown

library(ggplot2)

library(dplyr)

## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':

## filter, lag

## The following objects are masked from 'package:base':

## intersect, setdiff, setequal, union

library(tidyr)

library(scales)

data("HairEyeColor")

The Hair x Eye table comes from a survey of students at the University of Delaware reported by Snee (1974). The split by Sex was added by Friendly (1992a) for didactic purposes.

This data set is useful for illustrating various techniques for the analysis of contingency tables, such as the standard chi-squared test or, more generally, log-linear modelling, and graphical methods such as mosaic plots, sieve diagrams or association plots.

Source: http://euclid.psych.yorku.ca/ftp/sas/vcd/catdata/haireye.sas
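As a minimal sketch of the chi-squared test and mosaic plot mentioned above, the following collapses the table over Sex and tests whether hair and eye color are independent. The names `hair_eye` and `test_result` are chosen here for illustration and are not part of the original analysis.

```r
data("HairEyeColor")

# Collapse the 3-way table over Sex, keeping Hair (dim 1) and Eye (dim 2)
hair_eye <- margin.table(HairEyeColor, c(1, 2))

# Pearson chi-squared test of independence between hair and eye color
test_result <- chisq.test(hair_eye)
test_result

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(hair_eye, main = "Hair and eye color", color = TRUE)
```

A very small p-value here indicates that hair and eye color are associated in this sample.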

Exploring the data with the str() and summary() functions

str(HairEyeColor)

## 'table' num [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...

## - attr(*, "dimnames")=List of 3

## ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"

## ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"

## ..$ Sex : chr [1:2] "Male" "Female"

summary(HairEyeColor)

## Number of cases in table: 592

## Number of factors: 3

## Test for independence of all factors:

## Chisq = 164.92, df = 24, p-value = 5.321e-23

## Chi-squared approximation may be incorrect

head(HairEyeColor)

## , , Sex = Male

## Eye

## Hair Brown Blue Hazel Green

## Black 32 11 10 3

## Brown 53 50 25 15

## Red 10 10 7 7

## Blond 3 30 5 8

## , , Sex = Female

## Eye

## Hair Brown Blue Hazel Green

## Black 36 9 5 2

## Brown 66 34 29 14

## Red 16 7 7 7

## Blond 4 64 5 8

data.df<- as.data.frame(HairEyeColor)

str(data.df)

## 'data.frame': 32 obs. of 4 variables:

## $ Hair: Factor w/ 4 levels "Black","Brown",..: 1 2 3 4 1 2 3 4 1 2 ...

## $ Eye : Factor w/ 4 levels "Brown","Blue",..: 1 1 1 1 2 2 2 2 3 3 ...

## $ Sex : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...

## $ Freq: num 32 53 10 3 11 50 10 30 10 25 ...

Contingency Tables

With categorical variables, we usually want to calculate the frequency of each category. Contingency tables show these frequencies. For example, we may want the total count of female and male participants.
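A minimal sketch of that count, assuming the `data.df` data frame created earlier from `HairEyeColor`. The name `sex_counts` is illustrative.

```r
data("HairEyeColor")
data.df <- as.data.frame(HairEyeColor)

# Sum the Freq column within each level of Sex
sex_counts <- xtabs(Freq ~ Sex, data = data.df)
sex_counts

# The same totals returned as a data frame
aggregate(Freq ~ Sex, data = data.df, FUN = sum)
```

Both calls report 279 male and 313 female participants.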

To collapse the data to sex and eye color, we can build a table containing both variables and then compute proportion tables from it.

gendereyemix<-xtabs(Freq~Sex+Eye,data.frame(HairEyeColor))

prop.table(gendereyemix, 1) # Row proportions: share of each eye color within men and within women

## Eye

## Sex Brown Blue Hazel Green

## Male 0.35125448 0.36200717 0.16845878 0.11827957

## Female 0.38977636 0.36421725 0.14696486 0.09904153

# Column proportions: share of men and women within each eye color

prop.table(gendereyemix, 2)

## Eye

## Sex Brown Blue Hazel Green

## Male 0.4454545 0.4697674 0.5053763 0.5156250

## Female 0.5545455 0.5302326 0.4946237 0.4843750

# Number of men and women in the mix

margin.table(gendereyemix, 1)

## Sex

## Male Female

## 279 313

# Number of participants per eye color (men and women combined)

margin.table(gendereyemix, 2)

## Eye

## Brown Blue Hazel Green

## 220 215 93 64
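The counts and margins above can also be shown in a single table. A minimal sketch using base R's `addmargins()`, which appends row and column totals labelled `Sum`. The name `with_totals` is illustrative.

```r
data("HairEyeColor")
gendereyemix <- xtabs(Freq ~ Sex + Eye, data.frame(HairEyeColor))

# Append a "Sum" row and column holding the marginal totals
with_totals <- addmargins(gendereyemix)
with_totals

# Row percentages rounded to one decimal place, for easier reading
round(100 * prop.table(gendereyemix, 1), 1)
```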

qplot(data = data.df, Eye, Freq, geom="boxplot", color=Sex)

Brown and blue are the most common eye colors among both males and females.

qplot(data = data.df, Hair, Freq, geom="boxplot", color=Sex)

Most males and females have brown hair.
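Each box above summarizes only four counts (one per level of the other variable), so a bar chart of the summed counts may show the totals more directly. This is an alternative sketch, not part of the original analysis. It uses `ggplot()` rather than `qplot()`, which is deprecated as of ggplot2 3.4.0. The names `eye_totals` and `p` are illustrative.

```r
library(ggplot2)

data("HairEyeColor")
data.df <- as.data.frame(HairEyeColor)

# Total count per eye color and sex, summed over hair color
eye_totals <- aggregate(Freq ~ Eye + Sex, data = data.df, FUN = sum)

# Grouped bars: one bar per sex within each eye color
p <- ggplot(eye_totals, aes(x = Eye, y = Freq, fill = Sex)) +
  geom_col(position = "dodge") +
  labs(x = "Eye color", y = "Number of students")
p
```

Replacing `Eye` with `Hair` in both the formula and the aesthetics gives the corresponding chart for hair color.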

Let's assume we are interested in the percentage of males and females with blue eyes, expressed as a share of all 592 participants.

B_M <- data.df %>% select(Eye, Sex, Freq) %>% filter(Sex == "Male" & Eye == "Blue") %>% summarise(Male_Blue = sum(Freq))

B_F <- data.df %>% select(Eye, Sex, Freq) %>% filter(Sex == "Female" & Eye == "Blue") %>% summarise(Female_Blue = sum(Freq))

TOT <- data.df %>% summarise(TotH = sum(Freq))

male_blue <- B_M / TOT * 100

female_blue <- B_F / TOT * 100

male_blue

## Male_Blue

## 1 17.06081

female_blue

## Female_Blue

## 1 19.25676
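The same percentages can be read from a proportion table. A sketch, assuming the `gendereyemix` table built earlier: calling `prop.table()` without a margin divides every cell by the grand total, while margin 1 gives the share within each sex. The names `pct_of_all` and `pct_within_sex` are illustrative.

```r
data("HairEyeColor")
gendereyemix <- xtabs(Freq ~ Sex + Eye, data.frame(HairEyeColor))

# Each cell as a percentage of all participants
pct_of_all <- 100 * prop.table(gendereyemix)
pct_of_all[, "Blue"]

# Each cell as a percentage of its own sex (each row sums to 100)
pct_within_sex <- 100 * prop.table(gendereyemix, 1)
pct_within_sex[, "Blue"]
```

The first result is about 17.06 for males and 19.26 for females, matching the values above. The second is about 36.2 and 36.4, the share of blue-eyed students within each sex.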

Density plot of the different eye colors

qplot(data=data.df, Eye, geom="density", fill=Eye, alpha=I(0.6))

You can find the HTML file on RPubs:

https://rpubs.com/edacoskun/664927

