Scale relative to a value in each group (via dplyr)

Question

I have a set of time series, and I want to scale each of them relative to its own value at a specific time point. That way, each series will be 1.0 at that time point, and every other value is expressed as a proportion of it.

I can't figure out how to do that with dplyr.

Here's a working example using a for loop:

library(dplyr)

data = expand.grid(
  category = LETTERS[1:3],
  year = 2000:2005)
data$value = runif(nrow(data))

# the reference time point (not necessarily the first one in the series)
baseYear = 2002

# for each category, divide all the values by the category's value in the base year
for (category in levels(factor(data$category))) {
  # rows belonging to this category
  inCategory = data$category == category
  # this category's value in the base year
  baseValue = data$value[inCategory & data$year == baseYear][[1]]
  data$value[inCategory] = data$value[inCategory] / baseValue
}

Edit: I modified the question so that the base time point cannot be selected by position. Sometimes the "time" column is actually a factor, which isn't necessarily ordered.

Recommended answer

This solution is very similar to @thelatemail's, but I think it's different enough to merit its own answer because it chooses the index based on a condition:

data %>% group_by(category) %>% mutate(value = value/value[year == baseYear])
#   category  year      value
#...     ...   ...       ...
#7         A  2002 1.00000000
#8         B  2002 1.00000000
#9         C  2002 1.00000000
#10        A  2003 0.86462789
#11        B  2003 1.07217943
#12        C  2003 0.82209897

(Data output has been truncated. To replicate these results, call set.seed(123) before creating data.)
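
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same grouped division. It assumes each category has exactly one row for the base year. The second half illustrates the case from the question's edit, where the time column is a factor matched by label; the column name period and the objects scaled and scaled_f are made up for this illustration and are not part of the original question or answer.

```r
library(dplyr)

set.seed(123)  # the seed the answer mentions for replicating its output
data <- expand.grid(category = LETTERS[1:3], year = 2000:2005)
data$value <- runif(nrow(data))
baseYear <- 2002

# Divide each category's values by that category's value in the base year.
# Assumption: exactly one row per category satisfies year == baseYear.
scaled <- data %>%
  group_by(category) %>%
  mutate(value = value / value[year == baseYear]) %>%
  ungroup()

# Each category should now have value 1 in the base year.
filter(scaled, year == baseYear)

# Variant: the time column is an unordered factor, so the base point is
# picked by comparing labels instead of by position.
# "period" is a hypothetical column name used only for this sketch.
data_f <- data
data_f$period <- factor(as.character(data_f$year))

scaled_f <- data_f %>%
  group_by(category) %>%
  mutate(value = value / value[period == "2002"]) %>%
  ungroup()
```

Because the base row is selected with a logical condition rather than an index, the same expression works whether the time column is numeric or a factor. If a category had no row for the base point, the condition would select nothing, so it is worth checking the data for that case first.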
