Remove row based on two factor levels


Question

I have a problem that is very similar to this question; however, my data is grouped by two levels.

str(dt)
'data.frame':   202206 obs. of  4 variables:
$ cros : int  -205 -200 -195 -190 -185 -180 -175 -170 -165 -160 ...
$ along: Factor w/ 113 levels "100","101","102",..: 1 1 1 1 1 1 1 1 1 1 ...
$ alti : num  1.61 1.6 1.6 1.6 1.6 1.59 1.59 1.59 1.59 1.58 ...
$ year : Factor w/ 6 levels "1979","1983",..: 1 1 1 1 1 1 1 1 1 1 ...

head(dt)
cros along alti year
-205   100 1.61 1979
-200   100 1.60 1979
-195   100 1.60 1979
-190   100 1.60 1979
-185   100 1.60 1979
-180   100 1.59 1979

This data comes from different transects, identified by the variable along. Along each transect, the altitude (the variable alti) was measured every 5 meters (the variable cros). This was done over multiple years; however, the transect was sometimes longer in a particular year. So I want to remove the rows whose cros points were not measured in all years.

For my data set I have one factor (along) with 113 levels, and within that factor I have the factor year with 6 levels. Within these two groupings I have x (cros) and y (alti), which I want to analyze across the years; for that, the x values have to be the same in every year. So, for each level of along, I want to remove the cros values that do not occur in all the years.

The code I used is:

require(data.table)
dt <- as.data.table(total)
tt <- dt[,length(unique(along,year)),by=cros]
tt <- tt[V1==max(V1)]
test <-dt[cros %in% tt$cros]

But I do not get the right result. I can imagine that unique(along, year) is not the right way to work with grouped data. However, I do not know how to do it right.
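One likely reason the attempt miscounts: base R's unique() works on a single vector or data frame, and its second positional argument is incomparables, so unique(along, year) does not treat year as a second grouping column. A minimal sketch of counting distinct (along, year) combinations instead, using made-up toy vectors that are not part of the real data set:

```r
# Toy vectors (hypothetical, for illustration only).
along <- c(10, 10, 11, 11)
year  <- c(20, 21, 20, 20)

# Distinct values of a single column.
n_along <- length(unique(along))

# Distinct (along, year) combinations: deduplicate the rows of a
# two-column data frame rather than passing two vectors to unique().
n_pairs <- nrow(unique(data.frame(along, year)))
```

Here n_along counts transects only, while n_pairs counts transect/year combinations.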

Here is an example to make this clearer.

df <- data.frame(
  along = c(10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,12),
  year  = c(20,20,20,25,25,25,21,21,20,20,25,25,25,21,21,21,20,20,20,20,25,25,25,25,25,21,21,21,21),
  cros  = c(11,12,13,11,12,13,11,12,11,12,11,12,13,11,12,13,14,15,16,17,14,15,16,17,18,12,13,14,15),
  value = ceiling(rnorm(29) * 10)
)
df
   along year cros value
   10    20   11    -3
   10    20   12     5
   10    20   13   -22
   10    25   11    -9
   10    25   12    -3
   10    25   13    -8
   10    21   11    -8
   10    21   12    -8
   11    20   11     7
   11    20   12    -4
   11    25   11    -6
   11    25   12     9
   11    25   13    -5
   11    21   11     6
   11    21   12    17
   11    21   13    -5
   12    20   14   -16
   12    20   15   -17
   12    20   16   -18
   12    20   17    -3
   12    25   14   -18
   12    25   15   -11
   12    25   16    -1
   12    25   17     6
   12    25   18    14
   12    21   12    -3
   12    21   13    19
   12    21   14    16
   12    21   15     7

And this is how I want it to look, so that the cros (x) values that do not occur in all the years for a given transect are removed.

   along year cros value
    10    20   11    -3
    10    20   12     5
    10    25   11    -9
    10    25   12    -3
    10    21   11    -8
    10    21   12    -8
    11    20   11     7
    11    20   12    -4
    11    25   11    -6
    11    25   12     9
    11    21   11     6
    11    21   12    17
    12    20   14   -16
    12    20   15   -17
    12    25   14   -18
    12    25   15   -11
    12    21   14    16
    12    21   15     7


Recommended answer

Here's one way of doing it. Find all the along, cros entries that you want to keep and then merge them back:

dt = data.table(df)

# find the intersections; run in pieces to see what's going on here
to.keep = dt[, list(list(unique(cros))), by = list(along, year)][,
               list(cros = Reduce(intersect, V1)), by = along]

# set the keys to merge together
setkey(to.keep, along, cros)
setkey(dt, along, cros)

# final result
res = to.keep[dt, nomatch = 0]

# optionally, you can order and rearrange columns
# (the value column differs from the question because it is drawn with rnorm)
setkey(res, along, year, cros)[, names(dt), with = FALSE]
#    along year cros value
# 1:    10   20   11    11
# 2:    10   20   12     7
# 3:    10   21   11    -4
# 4:    10   21   12     9
# 5:    10   25   11   -16
# 6:    10   25   12     8
# 7:    11   20   11    17
# 8:    11   20   12     1
# 9:    11   21   11     8
#10:    11   21   12   -13
#11:    11   25   11    -7
#12:    11   25   12    17
#13:    12   20   14    12
#14:    12   20   15    -7
#15:    12   21   14     3
#16:    12   21   15     9
#17:    12   25   14     6
#18:    12   25   15    -2
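
To see what the intersection step computes without data.table, the same idea can be sketched in base R. This is only an illustration under assumptions: the value column is a placeholder (seq_len(29) instead of the rnorm draw), and the names common and keep are made up for this example.

```r
# Same layout as the example data; `value` is a placeholder column.
df <- data.frame(
  along = rep(c(10, 11, 12), times = c(8, 8, 13)),
  year  = c(20,20,20,25,25,25,21,21, 20,20,25,25,25,21,21,21,
            20,20,20,20,25,25,25,25,25,21,21,21,21),
  cros  = c(11,12,13,11,12,13,11,12, 11,12,11,12,13,11,12,13,
            14,15,16,17,14,15,16,17,18,12,13,14,15),
  value = seq_len(29)
)

# For each transect, split cros by year and intersect the per-year sets.
# The result is a list named by transect ("10", "11", "12").
common <- lapply(split(df, df$along), function(d) {
  Reduce(intersect, split(d$cros, d$year))
})

# Keep a row only if its cros is in the common set of its own transect.
keep <- mapply(function(a, x) x %in% common[[as.character(a)]],
               df$along, df$cros)
res <- df[keep, ]
```

Inspecting common shows which cros values survive for each transect; on the full data set the data.table version above is likely the faster choice.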
