Saturday, September 8, 2012

R - Graphics for Statistics - figures with ggplot2


Graphics from the book Graphics for Statistics and Data Analysis with R by Kevin Keen (book home page)

dot chart of prevalence of allergy in endoscopic sinus surgery (figure 1.1)

  • first create the data frame (ggplot() needs its data in a data frame)

library(ggplot2)
names<-factor(1:6,labels=c("Epidermals","Dust Mites","Weeds","Grasses","Molds","Trees"))
prevs<-c(38.2,37.8,31.1,31.1,29.3,26.7)
df <- data.frame(names=names,prevs=prevs)

       names prevs
1 Epidermals  38.2
2 Dust Mites  37.8
3      Weeds  31.1
4    Grasses  31.1
5      Molds  29.3
6      Trees  26.7
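The order of the factor levels is what ggplot2 later uses to order the categories on the discrete axis, so it is worth checking before plotting. A minimal base R sketch, using the values printed above (the names allergens and df_check are only for illustration):

```r
# values copied from the printed data frame above
allergens <- c("Epidermals", "Dust Mites", "Weeds", "Grasses", "Molds", "Trees")

# df_check is a hypothetical name, used here so the post's df is not overwritten
df_check <- data.frame(names = factor(1:6, labels = allergens),
                       prevs = c(38.2, 37.8, 31.1, 31.1, 29.3, 26.7))

# the levels keep the order in which the labels were given
levels(df_check$names)

# one factor column and one numeric column
str(df_check)
```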

  • now we can create the dot chart using geom_segment() (lines) and geom_point()
  • we map x to prevs and y to names for all layers
  • in geom_segment() we map additionally yend to names and set xend to zero and linetype to 3 (dotted)
  • in geom_point() we set shape to 19 (small filled circle)
  • then we set the limits of the x axis to c(0,50) to match the book chart, set the title to Percent and get rid of the title of the y axis

ggplot(df,aes(x=prevs,y=names)) + 
   geom_segment(aes(yend=names),xend=0,linetype=3) + 
   geom_point(shape=19) +
   scale_x_continuous("Percent",limits=c(0,50)) +
   ylab("")   # drop the y axis title

bar chart of prevalence of allergy in endoscopic sinus surgery (figure 1.1)

  • now we map x to names and y to prevs
  • we use geom_bar(); we have to change the stat to "identity" because we use presummarised data (the default stat of the geom is "bin")
  • then we change the appearance of the axes as above

ggplot(df,aes(x=names,y=prevs)) + 
   geom_bar(stat="identity") +
   scale_y_continuous("Percent",limits=c(0,50)) +
   xlab("")   # drop the x axis title

  • this looks fine for now, but in the book graph the labels are rotated and the bars look a bit narrower
  • the width of the bars is changed through the width argument in geom_bar(); in this case it is a bit tricky, because using the identity stat resets width, so we have to put width into the aes() argument (further information)
  • rotating the labels is done via opts() and theme_text() (angle)
  • I also resize the labels (size)
  • and get rid of the axis ticks (axis.ticks=theme_blank())

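ggplot(df,aes(x=names,y=prevs)) + 
       geom_bar(aes(width=0.7),stat="identity") +
       scale_y_continuous("Percent",limits=c(0,50)) +
       xlab("") +
       # angle, hjust and size are illustrative values, not taken from the book
       opts(axis.text.x=theme_text(angle=90,hjust=1,size=12),axis.ticks=theme_blank())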

  • unfortunately the ticks on the y axis are now gone as well; furthermore, in the current version of ggplot2 there is no equivalent to axis.ticks.x, so if you want to get rid of the ticks of just one axis you must use this hack (link)
  • another consequence is that ggsave() does not pick up the grid.remove() edit, so we have to save the chart the old-fashioned way, through a graphics device

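library(grid)
png("fig1_2c.png",height=500, width=500)
ggplot(df,aes(x=names,y=prevs)) + 
       geom_bar(aes(width=0.5),stat="identity") +
       scale_y_continuous("Percent",limits=c(0,50)) +
       xlab("")
# look up the tick segments of the bottom axis in the drawn plot and remove them
g <- grid.gget(gPath("axis-b", "", "", "", "axis.ticks.segments"))
grid.remove(g$name)
dev.off()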


ggplot2 0.9.2 is out, and everything is much easier:

  • you do not need to manipulate the grid elements directly, axis.ticks.x and axis.ticks.y are now available
  • axis.line does a good job of customizing the axes
  • some functions have also been renamed: use theme() instead of opts() and the element_*() functions instead of theme_*() (for example element_text() instead of theme_text())

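ggplot(df,aes(x=names,y=prevs)) + 
  geom_bar(aes(width=0.5),stat="identity") +
  scale_y_continuous("Percent",limits=c(0,50),expand=c(0,0)) +
  xlab("") +
  # angle and hjust are illustrative values, not taken from the book
  theme(axis.text.x=element_text(angle=90,hjust=1),axis.ticks.x=element_blank(),axis.line=element_line())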

