Remove rows based on two factor levels
Problem description
I have a problem that is very similar to this question, but my data is grouped by two levels.
str(dt)
'data.frame': 202206 obs. of 4 variables:
$ cros : int -205 -200 -195 -190 -185 -180 -175 -170 -165 -160 ...
$ along: Factor w/ 113 levels "100","101","102",..: 1 1 1 1 1 1 1 1 1 1 ...
$ alti : num 1.61 1.6 1.6 1.6 1.6 1.59 1.59 1.59 1.59 1.58 ...
$ year : Factor w/ 6 levels "1979","1983",..: 1 1 1 1 1 1 1 1 1 1 ...
head(dt)
cros along alti year
-205 100 1.61 1979
-200 100 1.60 1979
-195 100 1.60 1979
-190 100 1.60 1979
-185 100 1.60 1979
-180 100 1.59 1979
This data is information from different transects, identified by the variable along. Along each transect, the altitude (the variable alti) was measured every 5 meters (the variable cros). This was done over multiple years, but in some years the transect was longer. So I want to remove the rows with cros points that were not measured in all years.
In my data set I have one factor (along) with 113 levels, and within that factor I have the factor year with 6 levels. Within these two factors I have x (cros) and y (alti), which I want to analyze across the years; for that, the x values have to be the same in every year. For each level of along, I want to remove the cros values that do not occur in all the years.
The code I used is:
require(data.table)
dt <- as.data.table(total)
tt <- dt[,length(unique(along,year)),by=cros]
tt <- tt[V1==max(V1)]
test <-dt[cros %in% tt$cros]
But I do not get the right result. I imagine that unique(along,year) is not the right way to work with grouped data, but I do not know how to do it correctly.
Here is a small example to make this clearer.
> df <- data.frame(along = c(10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,12), year = c(20,20,20,25,25,25,21,21,20,20,25,25,25,21,21,21,20,20,20,20,25,25,25,25,25,21,21,21,21), cros = c(11,12,13,11,12,13,11,12,11,12,11,12,13,11,12,13,14,15,16,17,14,15,16,17,18,12,13,14,15), value = ceiling(rnorm(29)*10))
> df
along year cros value
10 20 11 -3
10 20 12 5
10 20 13 -22
10 25 11 -9
10 25 12 -3
10 25 13 -8
10 21 11 -8
10 21 12 -8
11 20 11 7
11 20 12 -4
11 25 11 -6
11 25 12 9
11 25 13 -5
11 21 11 6
11 21 12 17
11 21 13 -5
12 20 14 -16
12 20 15 -17
12 20 16 -18
12 20 17 -3
12 25 14 -18
12 25 15 -11
12 25 16 -1
12 25 17 6
12 25 18 14
12 21 12 -3
12 21 13 19
12 21 14 16
12 21 15 7
And this is how I want it to look, so that the cros (x) values that do not occur in all the years for a given transect are removed.
along year cros value
10 20 11 -3
10 20 12 5
10 25 11 -9
10 25 12 -3
10 21 11 -8
10 21 12 -8
11 20 11 7
11 20 12 -4
11 25 11 -6
11 25 12 9
11 21 11 6
11 21 12 17
12 20 14 -16
12 20 15 -17
12 25 14 -18
12 25 15 -11
12 21 14 16
12 21 15 7
Recommended answer
Here's one way of doing it. Find all the along, cros entries that you want to keep and then merge them back:
dt = data.table(df)
# find the intersections; run in pieces to see what's going on here
to.keep = dt[, list(list(unique(cros))), by = list(along, year)][,
list(cros = Reduce(intersect, V1)), by = along]
# set the keys to merge together
setkey(to.keep, along, cros)
setkey(dt, along, cros)
# final result
res = to.keep[dt, nomatch = 0]
# optionally, you can order and rearrange columns
setkey(res, along, year, cros)[, names(dt), with = FALSE]
# along year cros value
# 1: 10 20 11 11
# 2: 10 20 12 7
# 3: 10 21 11 -4
# 4: 10 21 12 9
# 5: 10 25 11 -16
# 6: 10 25 12 8
# 7: 11 20 11 17
# 8: 11 20 12 1
# 9: 11 21 11 8
#10: 11 21 12 -13
#11: 11 25 11 -7
#12: 11 25 12 17
#13: 12 20 14 12
#14: 12 20 15 -7
#15: 12 21 14 3
#16: 12 21 15 9
#17: 12 25 14 6
#18: 12 25 15 -2
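For comparison, the same filter can be sketched in base R by counting years instead of intersecting sets. This is not part of the original answer: the helper count_distinct and the vectors years_per_along and years_per_point are names introduced here for illustration, and the random value column from the question is left out. The idea is that a cros point is kept only if it was measured in as many distinct years as its transect was.

```r
# Example data from the question (the random value column is omitted).
df <- data.frame(
  along = c(rep(10, 8), rep(11, 8), rep(12, 13)),
  year  = c(20, 20, 20, 25, 25, 25, 21, 21,
            20, 20, 25, 25, 25, 21, 21, 21,
            20, 20, 20, 20, 25, 25, 25, 25, 25, 21, 21, 21, 21),
  cros  = c(11, 12, 13, 11, 12, 13, 11, 12,
            11, 12, 11, 12, 13, 11, 12, 13,
            14, 15, 16, 17, 14, 15, 16, 17, 18, 12, 13, 14, 15)
)

# Hypothetical helper: number of distinct values in a vector.
count_distinct <- function(v) length(unique(v))

# For every row: in how many distinct years was its transect measured?
years_per_along <- ave(df$year, df$along, FUN = count_distinct)

# For every row: in how many distinct years was its (along, cros) point measured?
years_per_point <- ave(df$year, df$along, df$cros, FUN = count_distinct)

# Keep only the points that were measured in every year of their transect.
res <- df[years_per_point == years_per_along, ]
res <- res[order(res$along, res$year, res$cros), ]

nrow(res)  # 18 rows remain, 6 per transect
```

Compared with the intersection approach, this variant avoids the merge step, but it relies on the assumption that "all years" means all years in which that particular transect appears in the data.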