- first we have to get the data which are birth and death rates of the year 1990, therefore we use the world bank data (and of course, there is a package
WDI
providing direct access)
- so load the package and download a list of indicators (
WDIcache
)
- the resulting data frame contains indicator and name, and additional information (data description, source)
require(WDI)
indicators <- as.data.frame(WDIcache()$series)
names(indicators)
[1] "indicator" "name" "description"
[4] "sourceDatabase" "sourceOrganization"
- then we have to find our two parameters of interest: the crude birth and the crude death rate:
- we use
grep
on the column name to find them
grep("crude",indicators$name)
[1] 6553 6554
- so we know that there are two lines containing the string crude in the name variable, so let's show them
indicators[grep("crude",indicators$name),]
indicator name
6553 SP.DYN.CBRT.IN Birth rate, crude (per 1,000 people)
6554 SP.DYN.CDRT.IN Death rate, crude (per 1,000 people)
description
6553 Crude birth rate indicates the number of live births occurring during the year, per 1,000 population estimated at midyear. Subtracting the crude death rate from the crude birth rate provides the rate of natural increase, which is equal to the rate of population change in the absence of migration.
6554 Crude death rate indicates the number of deaths occurring during the year, per 1,000 population estimated at midyear. Subtracting the crude death rate from the crude birth rate provides the rate of natural increase, which is equal to the rate of population change in the absence of migration.
sourceDatabase
6553 World Development Indicators
6554 World Development Indicators
sourceOrganization
6553 (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Reprot (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
6554 (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Reprot (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
- now we now the indicators - and we can download the desired data (and just in case: we download them for the years 1980–2012)
rate.data <- WDI(country="all",indicator=indicators[grep("crude",indicators$name),1],start=1980,end=2012)
rate.data.1990 <- rate.data[rate.data$year==1990,]
head(rate.data.1990)
iso2c country year SP.DYN.CBRT.IN
11 1A Arab World 1990 34.71824
44 1W World 1990 25.85034
77 4E East Asia & Pacific (developing only) 1990 22.84601
110 7E Europe & Central Asia (developing only) 1990 17.90782
143 8S South Asia 1990 32.91213
176 AD Andorra 1990 NA
SP.DYN.CDRT.IN
11 8.262026
44 9.266023
77 6.997506
110 10.065765
143 10.703840
176 NA
- now we have to remove aggregations from these data (such as the European Union, Middle East, etc)
- we use
grepl
on the iso2c
column (which does the same as grep
but returns logical values (of course you can also use grep
again))
- the
iso2c
is a two character, standardized country code
- we eliminate all rows with
iso2c
starting with X, equals EU, containing a number, or equals ZQ, ZJ, ZG, or ZF
rate.data.1990 <- rate.data.1990[!grepl("^X|EU|\\d|Z[QJGF]",rate.data.1990$iso2c,perl=T),]
head(rate.data.1990)
iso2c country year SP.DYN.CBRT.IN SP.DYN.CDRT.IN
176 AD Andorra 1990 NA NA
209 AE United Arab Emirates 1990 25.916 2.794
242 AF Afghanistan 1990 52.449 22.062
275 AG Antigua and Barbuda 1990 20.100 6.800
308 AL Albania 1990 24.610 5.909
341 AM Armenia 1990 21.215 7.738
- so that's the data frame
- now the graphics
require(ggplot2)
require(gridExtra)
require(scales)
rate.data.1990$lab <- ifelse(rbinom(size=1,n=nrow(rate.data.1990),prob=0.2)==1,rate.data.1990$country,"")
ggplot(rate.data.1990,aes(x=SP.DYN.CBRT.IN,y=SP.DYN.CDRT.IN)) +
stat_density2d(aes(colour= ..level..),bins=6,h=c(11,9),geom="density2d") +
scale_x_continuous("Birth Rate", limits = c(0,60),breaks=seq(0,60,by=10),expand=c(0,1)) +
scale_y_continuous("Death Rate", limits = c(0,30),breaks=seq(0,30,by=10),expand=c(0,1)) +
scale_colour_gradientn(colours=c("blue","green","lightgreen","red"),guide="none") +
geom_text(aes(label=lab),size=3,position = "jitter") +
geom_abline(intersect=0,slope=1) +
coord_fixed() +
annotate("text",x=20,y=20,angle=45, label="Zero Population Growth",vjust=-0.5,size=4) +
theme(
panel.background=element_blank(),
panel.border=element_rect(colour="black",fill="transparent"),
axis.text = element_text(colour="black"),
axis.ticks = element_line(colour="black"),
axis.line = element_line(colour="black")
)
- note:
- the plot in chapter 1.4 of the book uses an epanechnikov kernel, whereas ggplot uses a normal one
- I choose randomly about 20% of the country names as labels (because I was to lazy to pick up exactly those used in the book)
No comments :
Post a Comment