How do I select the first row in an R data frame that meets certain criteria?
How do I select the first row of an R data frame that meets certain criteria?
Here is the context:
I have a data frame with five columns: "pixel", "year", "propvar", "component", and "cumsum".
There are 1,225 combinations of pixel and year, because the data was computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed propvar, the proportion of total variance explained by a given component of the fast Fourier transform for the time series of a given pixel-year. I then computed cumsum, which is the cumulative sum of propvar for each frequency component within a pixel-year. The component column just gives you an index for the Fourier series component (plus 1) from which propvar was calculated.
I want to determine the number of components required to explain greater than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where cumsum > 0.99, and create a data frame from it with three columns, pixel, year, and numbercomps, where numbercomps is the number of components required within a given pixel-year to explain greater than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?
Sure. Something like this should do the trick:
# CREATE A REPRODUCIBLE EXAMPLE!
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"),
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps = 1:5)
df
#   year pixel cumsum numbercomps
# 1 2001     a     99           1
# 2 2003     b     99           2
# 3 2001     a     98           3
# 4 2003     b     99           4
# 5 2003     a     99           5
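# EXTRACT THE SUBSET YOU'D LIKE.
# Step 1: keep only the rows that meet the criterion.
# (This toy example stores cumsum on a 0-100 scale; use 0.99 if yours is a proportion.)
res <- subset(df, cumsum >= 99)
# Step 2: within the remaining rows, keep the first row of each year/pixel pair.
res <- subset(res,
              subset = !duplicated(res[c("year", "pixel")]),
              select = c("pixel", "year", "numbercomps"))
res
#   pixel year numbercomps
# 1     a 2001           1
# 2     b 2003           2
# 5     a 2003           5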
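To connect this back to the column layout described in the question, here is a minimal base R sketch of the same filter-then-`duplicated()` pattern. The toy values are made up for illustration, and it assumes that `cumsum` is stored as a proportion (0 to 1), that `component` counts from 1, and that rows are already sorted by `component` within each pixel-year.

```r
# Toy data (hypothetical values): two pixel-years with four components each.
df <- data.frame(
  pixel     = rep(c("a", "b"), each = 4),
  year      = 2001,
  component = rep(1:4, times = 2),
  propvar   = c(0.90, 0.08, 0.015, 0.005,
                0.995, 0.003, 0.002, 0.000)
)

# Cumulative proportion of variance within each pixel-year.
df$cumsum <- ave(df$propvar, df$pixel, df$year, FUN = cumsum)

# Keep rows past the 99% threshold, then the first such row per pixel-year.
# Assumption: rows are ordered by component within each pixel-year.
hits  <- df[df$cumsum > 0.99, ]
first <- hits[!duplicated(hits[c("pixel", "year")]),
              c("pixel", "year", "component")]

# Rename to match the column name requested in the question.
names(first)[names(first) == "component"] <- "numbercomps"
first
```

With these toy values, pixel "a" needs 3 components and pixel "b" needs 1.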
EDIT: Also, for those interested in data.table, there is this:
library(data.table)
dt <- data.table(df, key = c("pixel", "year"))
dt[cumsum>=99, .SD[1], by=key(dt)]
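Both approaches above take the first qualifying row, so they rely on the rows being ordered by component within each group. If that ordering is not guaranteed, one possible alternative is to take the smallest qualifying component index per group instead. The sketch below uses base R's `aggregate()`; the data values are hypothetical, and it assumes the question's `component` and `cumsum` columns with `cumsum` stored as a proportion.

```r
# Toy data (hypothetical values), deliberately NOT sorted by component.
df <- data.frame(
  pixel     = c("a", "a", "a", "a", "b", "b"),
  year      = 2001,
  component = c(3, 1, 2, 4, 2, 1),
  cumsum    = c(0.995, 0.90, 0.98, 1.00, 1.00, 0.995)
)

# Keep only the rows past the 99% threshold.
hits <- df[df$cumsum > 0.99, ]

# Smallest qualifying component per pixel-year; independent of row order.
res <- aggregate(component ~ pixel + year, data = hits, FUN = min)
names(res)[names(res) == "component"] <- "numbercomps"
res
```

This trades the "first row" idea for an explicit minimum, which gives the same answer on sorted data and still works on shuffled data.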