-
first we create a data frame with the following columns: id, date, val
-
to create the id column we use rep(): the first argument, 1:5 (the vector 1, 2, 3, 4, 5), is repeated five times, which gives a vector of length 25
-
the vector dates is generated in the following way:
-
draw 25 random integers between 1 and 1500 with sample.int(), sampling with replacement, so the same number can be drawn more than once
-
use as.Date() with the origin argument; this converts the integers into Dates, where the origin is day 0 (in this case 0 is converted to the origin, i.e. 2000-01-01, 1 is mapped to 2000-01-02, and so on), so we get a vector of dates between 2000-01-02 and 2004-02-09
-
vals are generated by the random number function rnorm() (normally distributed with mean 5 and standard deviation 1)
-
last but not least, put the three together in one data frame (and show the first rows)
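id <- rep(1:5, 5)  # ids 1..5, cycled five times
dates <- as.Date(sample.int(1500, 25, replace = TRUE), origin = "2000-01-01")  # random day offsets from the origin
vals <- rnorm(25, mean = 5, sd = 1)  # 25 draws from N(5, 1)
df <- data.frame(id = id, date = dates, val = vals)
head(df)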
  id       date      val
1  1 2001-12-31 5.778680
2  2 2002-08-02 6.982799
3  3 2002-04-23 5.925903
4  4 2000-08-03 3.527375
5  5 2002-08-29 5.239211
6  1 2003-01-28 5.118337
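a small sketch of the building blocks used above, in case the mapping from integers to dates or the behaviour of rep() is not obvious. The variable names and the set.seed() call are additions for illustration only; the original data were generated without a fixed seed, so the numbers shown in this post will not be reproduced exactly.

```r
# Integers are read as day offsets from the origin date
offsets <- as.Date(c(0, 1, 1500), origin = "2000-01-01")
format(offsets)  # "2000-01-01" "2000-01-02" "2004-02-09"

# times cycles through the whole vector; each repeats every element in place
cycled  <- rep(1:3, times = 2)  # 1 2 3 1 2 3
blocked <- rep(1:3, each = 2)   # 1 1 2 2 3 3

# Hypothetical addition: fixing the seed makes the random draws repeatable
set.seed(42)
first_draw <- sample.int(1500, 25, replace = TRUE)
set.seed(42)
second_draw <- sample.int(1500, 25, replace = TRUE)
identical(first_draw, second_draw)  # TRUE
```

rep(1:5, 5) in the code above uses the times form, which is why the ids cycle 1, 2, 3, 4, 5, 1, ... instead of coming in blocks.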
-
id identifies a person, date is the day of the measurement and val is the measured value
-
now we want to add a column that contains the date of each person's first measurement
-
for this we need to load plyr and use the function ddply() (the first two letters stand for data frame in, data frame out)
-
the first argument of the function is the data frame we pass to ddply
-
the second argument defines the groups (we want the min of each person so our grouping variable is id)
-
transform says we want to modify the existing data frame, for example recode a variable or add a new one (another choice would be summarise, if we wanted to aggregate the data); here we add a variable named start, which is the min() of date per person
library(plyr)  # provides ddply() and .()
res <- ddply(df, .(id), transform, start = min(date))
res
id | date | val | start |
1 | 2001-12-31 | 5.77867959202519 | 2001-03-07 |
1 | 2003-01-28 | 5.11833655328897 | 2001-03-07 |
1 | 2003-11-14 | 3.75075463437505 | 2001-03-07 |
1 | 2001-03-07 | 3.36305093773468 | 2001-03-07 |
1 | 2001-03-07 | 6.6141583789233 | 2001-03-07 |
2 | 2002-08-02 | 6.98279884579167 | 2000-05-29 |
2 | 2003-12-30 | 5.4450019138646 | 2000-05-29 |
2 | 2000-05-29 | 5.92567982716667 | 2000-05-29 |
2 | 2000-11-26 | 4.60169956597544 | 2000-05-29 |
2 | 2002-08-04 | 5.02395812458 | 2000-05-29 |
3 | 2002-04-23 | 5.92590324593659 | 2000-02-02 |
3 | 2003-04-28 | 6.59246056959167 | 2000-02-02 |
3 | 2002-10-20 | 5.89780464300662 | 2000-02-02 |
3 | 2000-02-02 | 4.0548618816269 | 2000-02-02 |
3 | 2000-05-30 | 5.4722633402378 | 2000-02-02 |
4 | 2000-08-03 | 3.52737458048037 | 2000-08-03 |
4 | 2002-11-02 | 3.98853894162285 | 2000-08-03 |
4 | 2002-11-13 | 4.54373088519655 | 2000-08-03 |
4 | 2003-08-02 | 4.2144245184346 | 2000-08-03 |
4 | 2003-12-19 | 4.98767478298279 | 2000-08-03 |
5 | 2002-08-29 | 5.23921121209821 | 2000-04-27 |
5 | 2000-04-27 | 4.20400241411961 | 2000-04-27 |
5 | 2003-07-26 | 6.1821957272646 | 2000-04-27 |
5 | 2003-05-18 | 4.53515404638362 | 2000-04-27 |
5 | 2001-07-23 | 3.06695274940501 | 2000-04-27 |
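for comparison, here is a minimal sketch of the summarise alternative mentioned above. It uses a small hand-made data frame (toy, with made-up val entries) rather than the random df, so the result is easy to check by eye; the column name n is an illustrative choice as well.

```r
library(plyr)

# Hypothetical toy data: two persons with two measurements each
toy <- data.frame(
  id   = c(1, 1, 2, 2),
  date = as.Date(c("2001-12-31", "2001-03-07", "2002-08-02", "2000-05-29")),
  val  = c(5.8, 3.4, 7.0, 5.9)
)

# summarise collapses each group to a single row,
# whereas transform keeps every row and appends the new column
first_dates <- ddply(toy, .(id), summarise,
                     start = min(date),  # earliest measurement per person
                     n     = length(val)) # number of measurements per person
first_dates
```

the result has one row per id, which is handy when only the per-person start date is needed and not the individual measurements.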
-
if we change the function call just a little we get the days elapsed since the first measurement:
res <- ddply(df, .(id), transform, dayselapsed=date-min(date))
res
id | date | val | dayselapsed |
1 | 2001-12-31 | 5.77867959202519 | 299 |
1 | 2003-01-28 | 5.11833655328897 | 692 |
1 | 2003-11-14 | 3.75075463437505 | 982 |
1 | 2001-03-07 | 3.36305093773468 | 0 |
1 | 2001-03-07 | 6.6141583789233 | 0 |
2 | 2002-08-02 | 6.98279884579167 | 795 |
2 | 2003-12-30 | 5.4450019138646 | 1310 |
2 | 2000-05-29 | 5.92567982716667 | 0 |
2 | 2000-11-26 | 4.60169956597544 | 181 |
2 | 2002-08-04 | 5.02395812458 | 797 |
3 | 2002-04-23 | 5.92590324593659 | 811 |
3 | 2003-04-28 | 6.59246056959167 | 1181 |
3 | 2002-10-20 | 5.89780464300662 | 991 |
3 | 2000-02-02 | 4.0548618816269 | 0 |
3 | 2000-05-30 | 5.4722633402378 | 118 |
4 | 2000-08-03 | 3.52737458048037 | 0 |
4 | 2002-11-02 | 3.98853894162285 | 821 |
4 | 2002-11-13 | 4.54373088519655 | 832 |
4 | 2003-08-02 | 4.2144245184346 | 1094 |
4 | 2003-12-19 | 4.98767478298279 | 1233 |
5 | 2002-08-29 | 5.23921121209821 | 854 |
5 | 2000-04-27 | 4.20400241411961 | 0 |
5 | 2003-07-26 | 6.1821957272646 | 1185 |
5 | 2003-05-18 | 4.53515404638362 | 1116 |
5 | 2001-07-23 | 3.06695274940501 | 452 |
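one detail worth knowing: subtracting two Dates in R yields a difftime object, not a plain number. The sketch below shows this on three of the dates of person 1 from the table and converts the result to ordinary numbers; the variable names are illustrative only.

```r
# Three of the measurement dates of person 1
d <- as.Date(c("2001-12-31", "2003-01-28", "2001-03-07"))

elapsed <- d - min(d)  # Date minus Date gives a difftime
class(elapsed)         # "difftime"
units(elapsed)         # "days"

# Convert to plain numbers, e.g. for arithmetic or plotting
days <- as.numeric(elapsed, units = "days")
days                   # 299 692 0
```

the same wrapping could be used inside the ddply() call, e.g. dayselapsed = as.numeric(date - min(date), units = "days"), if a numeric column is preferred.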