-
first we create a vector we can work with:
x <- sample(LETTERS[1:10], 20, replace = TRUE)
x
[1] "J" "C" "J" "C" "F" "J" "E" "J" "H" "A" "C" "G" "I" "A" "F" "H" "J" "C" "C"
[20] "D"
-
unique() gives us a vector containing each distinct element of x exactly once, in order of first appearance; repeated elements are dropped
unique(x)
[1] "J" "C" "F" "E" "H" "A" "G" "I" "D"
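-
a small sketch of the ordering behaviour, using a short fixed vector v (hypothetical data, not from the example above) so the result is reproducible: unique() keeps the order of first appearance and does not sort

```r
# hypothetical fixed vector, chosen so the result does not depend on sample()
v <- c("b", "a", "b", "c", "a")

# distinct values in order of first appearance
u <- unique(v)
u          # "b" "a" "c"

# unique() does not sort; use sort() if alphabetical order is wanted
sort(u)    # "a" "b" "c"
```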
-
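duplicated() gives a logical vector of the same length as x, which is TRUE wherever an element has already occurred earlier in x and FALSE for each first occurrence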
duplicated(x)
[1] FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
[13] FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
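-
duplicated() also has a fromLast argument that scans the vector from the end instead of from the start, so the last occurrence of each element is the one marked FALSE. a minimal sketch on a short fixed vector v (hypothetical data):

```r
# hypothetical fixed vector
v <- c("b", "a", "b", "c", "a")

# default: scan left to right, repeats of earlier elements are TRUE
from_first <- duplicated(v)
from_first   # FALSE FALSE TRUE FALSE TRUE

# fromLast = TRUE: scan right to left, so the last occurrence is the "original"
from_last <- duplicated(v, fromLast = TRUE)
from_last    # TRUE TRUE FALSE FALSE FALSE
```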
-
so if we want to get the same result as from unique(), we index x with the negated logical vector:
x[!duplicated(x)]
[1] "J" "C" "F" "E" "H" "A" "G" "I" "D"
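-
the equivalence can be checked directly. the sketch below rebuilds a random vector (the seed value 1 is an arbitrary assumption, so the letters will differ from the ones printed above) and compares both approaches with identical()

```r
# arbitrary seed, only so that the run is repeatable
set.seed(1)
x <- sample(LETTERS[1:10], 20, replace = TRUE)

# both expressions keep the first occurrence of every element
same_result <- identical(unique(x), x[!duplicated(x)])
same_result

# the number of repeats equals the total length minus the number of distinct values
n_repeats <- sum(duplicated(x))
n_repeats == length(x) - length(unique(x))
```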
-
if we want to get just the repeated occurrences (i.e. a vector without the first occurrence of each element) we use the following line
x[duplicated(x)]
[1] "J" "C" "J" "J" "C" "A" "F" "H" "J" "C" "C"
-
we can use these commands in the same way on data frames. let our x code persons, and let us add a numeric value y, which could be a measurement; the order of the rows describes the order in which the measurements were taken
y <- rnorm(20, mean=10)
df <- data.frame(person=x, meas=y)
df
person meas
1 J 11.180452
2 C 10.235697
3 J 10.908622
4 C 10.677399
5 F 8.564007
6 J 10.070557
7 E 10.144191
8 J 10.872314
9 H 11.635032
10 A 10.448090
11 C 10.642052
12 G 8.689660
13 I 10.007930
14 A 8.321125
15 F 10.610739
16 H 9.060412
17 J 10.678726
18 C 8.513766
19 C 8.851564
20 D 12.793154
-
we extract the first measurement of each person with the following line (the comma after the logical index is important: it says we want whole rows)
df[!duplicated(df$person), ]
person meas
1 J 11.180452
2 C 10.235697
5 F 8.564007
7 E 10.144191
9 H 11.635032
10 A 10.448090
12 G 8.689660
13 I 10.007930
20 D 12.793154
-
and with the following command we can extract the follow-up measurements
df[duplicated(df$person), ]
person meas
3 J 10.908622
4 C 10.677399
6 J 10.070557
8 J 10.872314
11 C 10.642052
14 A 8.321125
15 F 10.610739
16 H 9.060412
17 J 10.678726
18 C 8.513766
19 C 8.851564
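-
in the same way, the last measurement of each person could be extracted by combining the negated index with fromLast = TRUE. a minimal sketch on a small data frame meas_df (hypothetical values, not the data printed above):

```r
# hypothetical measurements, rows in the order they were taken
meas_df <- data.frame(person = c("J", "C", "J", "C", "F"),
                      meas   = c(11.2, 10.2, 10.9, 10.7, 8.6))

# scanning from the end, the last row of each person is the one not marked as duplicated
last_meas <- meas_df[!duplicated(meas_df$person, fromLast = TRUE), ]
last_meas   # rows 3 (J), 4 (C) and 5 (F)
```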
-
we can also apply duplicated() repeatedly; the result of the following function is a list of vectors, where the first contains the first occurrences, the second the second occurrences, etc. you can change it easily so that it gives back logical vectors which can be used to index a data frame, or so that it works on a data frame itself (both of which would be more useful)
sep.meas <- function(dupl){
  res <- list()
  # each pass peels off the first remaining occurrence of every element
  while(length(dupl) > 0){
    res[[length(res) + 1]] <- dupl[!duplicated(dupl)]
    # keep only the repeats for the next pass
    dupl <- dupl[duplicated(dupl)]
  }
  res
}
-
if we use it on x we get the following result
sep.meas(x)
[[1]]
[1] "J" "C" "F" "E" "H" "A" "G" "I" "D"
[[2]]
[1] "J" "C" "A" "F" "H"
[[3]]
[1] "J" "C"
[[4]]
[1] "J" "C"
[[5]]
[1] "J" "C"
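-
one possible way to get the data frame variant mentioned above is sketched below. it does not reuse the loop from sep.meas but swaps in a different technique: ave() with seq_along numbers the occurrences within each person, and split() then groups the rows by that number. the data frame meas_df and the names occ and by_occ are hypothetical

```r
# hypothetical measurements, rows in the order they were taken
meas_df <- data.frame(person = c("J", "C", "J", "C", "F", "J"),
                      meas   = c(11.2, 10.2, 10.9, 10.7, 8.6, 10.1))

# occurrence number of each row within its person: 1 = first, 2 = second, ...
occ <- ave(seq_along(meas_df$person), meas_df$person, FUN = seq_along)
occ   # 1 1 2 2 1 3

# list of data frames: all first measurements, all second measurements, ...
by_occ <- split(meas_df, occ)
by_occ[["1"]]   # rows 1, 2 and 5
```

occ == 1 gives the same logical index as !duplicated(meas_df$person), so the first element of by_occ matches the first-measurement extraction shown earlier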