Data Visualization 2#
This unit covers the layered framework for data visualization, i.e., the Grammar of Graphics.
The codes implement the ideas of layered graph building using the module,
plotnine
, which is essentially the Python version of R ggplot2.
import numpy as np
import pandas as pd
from plotnine import *
from plotnine.data import mpg, mtcars
%matplotlib inline
type(mpg)
pandas.core.frame.DataFrame
Data Frame Preparation#
pd.DataFrame.head(mpg)
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
1 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
2 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
3 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
4 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
pd.DataFrame.head(mtcars)
name | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
1 | Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
2 | Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
3 | Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
4 | Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Basic Graphs#
(ggplot(mpg) +
aes(x = 'class', fill = 'class') +
geom_bar(size=20))
<ggplot: (8789024295544)>
(
ggplot(mpg) +
aes(x = 'class', y = 'hwy', fill = 'class') +
geom_boxplot() +
labs(x = 'Car Classes', y = 'Highway Milage')
)
<ggplot: (-9223363247830482144)>
(
ggplot(mpg) +
aes(x = 'cty', y = 'hwy', fill = 'class') +
geom_point(alpha = .7)
)
<ggplot: (8789024293647)>
(
ggplot(mpg) +
aes(x = 'cyl', y = 'hwy', fill = 'class') +
geom_boxplot()
)
<ggplot: (-9223363247830482172)>
More Complex Graphs#
Visualize four-dimensions (4-D)
(
ggplot(mtcars, aes('wt','mpg', color='factor(gear)')) +
geom_point() +
facet_wrap('~cyl')+
theme_bw()+
labs(x='Weight', y='Miles/gallon')+
scale_color_discrete(name='Forward Gear Number (Factor)')
)
<ggplot: (8789039975232)>
(
ggplot(mtcars, aes('wt','mpg',color='factor(gear)'))+
geom_point()+
geom_smooth(method="lm")+
theme_bw()
)
<ggplot: (8789024521094)>
Visualize 5-D
x
y
facet
color
size
(
ggplot(mtcars, aes('wt','mpg',color='factor(gear)', size='cyl'))+
geom_point()+
facet_wrap('~am')+
theme_bw()
)
<ggplot: (8789056543229)>
Visualize 6-D
x
y
facet (2 dimensions)
color
size
(
ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)', size='cyl'))+
geom_point()+
facet_grid('am~carb')+
theme_bw()
)
<ggplot: (-9223363247814487292)>
More?#
Check Hans Rosling’s famous visualization of global population (a dynamic graphing)