Scale relative to a value in each group (via dplyr)
Question
I have a set of time series, and I want to scale each of them relative to its value at a specific time point. That way, each series will be exactly 1.0 at that time and change proportionally around it.
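Written as a formula, the scaling is shown below. The notation is introduced here for illustration and is not from the original post. For series g with value v_{g,t} at time t and base time t_0, the scaled series is:

$$\tilde v_{g,t} = \frac{v_{g,t}}{v_{g,t_0}}, \qquad \text{so that } \tilde v_{g,t_0} = 1.$$

Each series is divided by a single nonzero constant. As a result, the ratio between any two time points within a series is unchanged.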
I can't figure out how to do that with dplyr.
Here's a working example using a for loop:
library(dplyr)

data = expand.grid(
  category = LETTERS[1:3],
  year = 2000:2005)
data$value = runif(nrow(data))

# the base time point that every series is scaled against
baseYear = 2002

# for each category, divide all the values by the category's value in the base year
for (category in levels(factor(data$category))) {
  rows = data$category == category
  baseValue = data[rows & data$year == baseYear, ]$value[[1]]
  data[rows, ]$value = data[rows, ]$value / baseValue
}
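For comparison, the same computation can be vectorized in base R without a loop. This is a sketch, not the asker's code. It assumes one base-year row per category; if there are duplicates, match() uses the first one, just as the loop's [[1]] does. The set.seed(123) call and the scaled column name are additions for illustration.

```r
set.seed(123)  # added for reproducibility (assumption, not in the question)
data = expand.grid(
  category = LETTERS[1:3],
  year = 2000:2005)
data$value = runif(nrow(data))
baseYear = 2002

# one row per category: that category's value in the base year
base = data[data$year == baseYear, c("category", "value")]

# match() finds each row's category in the base table (first match wins),
# so every row is divided by its own category's base-year value
data$scaled = data$value / base$value[match(data$category, base$category)]

data[data$year == baseYear, ]
```

Because match() compares factors by their labels, the lookup does not depend on row order.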
Edit: Modified the question such that the base time point is not indexable. Sometimes the "time" column is actually a factor, which isn't necessarily ordinal.
Answer
This solution is very similar to @thelatemail's, but I think it's different enough to merit its own answer because it selects the base value with a condition:
data %>%
  group_by(category) %>%
  mutate(value = value / value[year == baseYear])
#    category year      value
#    ...      ...       ...
#  7        A 2002 1.00000000
#  8        B 2002 1.00000000
#  9        C 2002 1.00000000
# 10        A 2003 0.86462789
# 11        B 2003 1.07217943
# 12        C 2003 0.82209897
(Data output has been truncated. To replicate these results, call set.seed(123) before creating data.)
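The condition-based indexing in this answer also covers the case from the question's edit, where the time column is a factor with no meaningful order. The sketch below makes two assumptions. First, it uses a hypothetical factor column, period, whose base level is labeled "baseline". Second, it adds a hypothetical helper, scale_to_base(), that stops with an error unless each group has exactly one base row. Without that check, value[period == "baseline"] would be recycled against the group when there are several matches, or be empty when there is none.

```r
library(dplyr)

# Hypothetical helper: divide by the single base value, failing loudly
# if a group has zero or several base rows
scale_to_base = function(value, time, base) {
  idx = which(time == base)
  if (length(idx) != 1) {
    stop("expected exactly one base row per group, found ", length(idx))
  }
  value / value[idx]
}

set.seed(123)
# Hypothetical data: "period" is a factor whose levels are labels, not an order
data = expand.grid(
  category = LETTERS[1:3],
  period = factor(c("pilot", "baseline", "followup")))
data$value = runif(nrow(data))

scaled = data %>%
  group_by(category) %>%
  mutate(value = scale_to_base(value, period, "baseline")) %>%
  ungroup()

scaled
```

Inside a grouped mutate(), value and period are the slices for the current category. The base row is therefore found by its label within each group, never by its position.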