How do I select the first row in an R data frame that meets certain criteria?


Problem description

How do I select the first row of an R data frame that meets certain criteria?

Here is the context:

I have a data frame with five columns:

"pixel", "year", "propvar", "component", "cumsum".

There are 1,225 combinations of pixel and year, because the data was computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed propvar, the proportion of total variance explained by a given component of the fast Fourier transform for the time series of a given pixel-year. I then computed cumsum, which is the cumulative sum of propvar for each frequency component within a pixel-year. The component column just gives you an index for the Fourier series component (plus 1) from which propvar was calculated.

I want to determine the number of components required to explain greater than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where cumsum > 0.99, and create a data frame from it with three columns, pixel, year, and numbercomps, where numbercomps is the number of components required within a given pixel-year to explain greater than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?

Solution

Sure. Something like this should do the trick:

# CREATE A REPRODUCIBLE EXAMPLE!
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"), 
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps=1:5)
df
#   year pixel cumsum numbercomps
# 1 2001     a     99           1
# 2 2003     b     99           2 
# 3 2001     a     98           3
# 4 2003     b     99           4
# 5 2003     a     99           5

# EXTRACT THE SUBSET YOU'D LIKE.
res <- subset(df, cumsum>=99)
res <- subset(res, 
              subset = !duplicated(res[c("year", "pixel")]),
              select = c("pixel", "year", "numbercomps"))
#   pixel year numbercomps
# 1     a 2001           1
# 2     b 2003           2
# 5     a 2003           5
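
The toy example above uses a 0-100 scale and relies on the rows already being in component order within each group. As a minimal sketch closer to the question's setup, assuming cumsum is a proportion between 0 and 1, the same idea can be written with an explicit sort first. The names `fft_df`, `above`, and `first_hit` and the sample values are hypothetical and do not come from the original post:

```r
# Hypothetical data: two pixel-years, deliberately out of component order.
fft_df <- data.frame(
  pixel     = c("a", "a", "a", "b", "b", "b"),
  year      = c(2001, 2001, 2001, 2001, 2001, 2001),
  component = c(3, 1, 2, 2, 3, 1),
  cumsum    = c(1.000, 0.900, 0.995, 0.950, 1.000, 0.700)
)

# Sort so that components appear in increasing order within each pixel-year.
fft_df <- fft_df[order(fft_df$pixel, fft_df$year, fft_df$component), ]

# Keep rows past the 99% threshold, then the first such row per pixel-year.
above <- fft_df[fft_df$cumsum > 0.99, ]
first_hit <- above[!duplicated(above[c("pixel", "year")]),
                   c("pixel", "year", "component")]
names(first_hit)[3] <- "numbercomps"
first_hit
```

Here numbercomps is simply the component index of the first row above the threshold. Since the question says the component column is the Fourier index plus 1, whether to subtract 1 is left to the reader.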

EDIT: For those interested in data.table, there is also this:

library(data.table)
dt <- data.table(df, key = c("pixel", "year"))
dt[cumsum>=99, .SD[1], by=key(dt)]
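
Both approaches above take the first matching row, so they depend on row order within each group. A possible order-independent variant in base R, sketched under the assumption that cumsum is a proportion between 0 and 1 and never decreases as component grows, is to take the smallest component index above the threshold with `aggregate()`. The names `fft_df` and `res_min` and the sample values are hypothetical:

```r
# Hypothetical data on a 0-1 scale; row order does not matter here.
fft_df <- data.frame(
  pixel     = c("a", "a", "a", "b", "b", "b"),
  year      = c(2001, 2001, 2001, 2001, 2001, 2001),
  component = c(3, 1, 2, 2, 3, 1),
  cumsum    = c(1.000, 0.900, 0.995, 0.950, 1.000, 0.700)
)

# Smallest component index whose cumulative proportion exceeds 0.99,
# computed separately for each pixel-year.
res_min <- aggregate(component ~ pixel + year,
                     data = fft_df[fft_df$cumsum > 0.99, ],
                     FUN = min)
names(res_min)[names(res_min) == "component"] <- "numbercomps"
res_min
```

Pixel-years that never exceed the threshold are dropped from the result, which matches the behaviour of the `subset()` version.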
