Arithmetic operation based on value from another column

Question

I have a data frame with a value column (PPT) for multiple years. The years are not necessarily consecutive, and the year five years later may be missing. Here is an example data frame:

df = data.frame(code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT", "AUT", "AUT", "AUT", "ABW", "AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ARM"),
            PPT = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56, 54, 645, 6, 4,53, 656, 65, 5563, 646, 6, 66, 54), 
            Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000, 2001, 2002, 2014, 2004, 2005, 2006, 2007, 1960, 2009, NA, 2011, 2012, 2013, 2014))

I want to add a column containing the difference between the PPT value for a year and the PPT value for year + 5. For example, 1960 is in the data but there is no PPT value for 1965, so new_col for 1960 should be NA. Similarly, new_col should be 119 (123 - 4) for 1990, NA for 2000 (the PPT value for 2000 itself is missing), 19 for 1991, -2 for 1992, and so on.

I have a very convoluted way of doing this in Excel, but I am looking for a simpler solution in R.

Recommended answer

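We can sort by 'Year' with arrange() and subtract lead(PPT, n = 5), the PPT value five rows further down, from 'PPT'. A five-row offset equals a five-year offset only when there is exactly one row per year and no gaps, as in the data used for this answer (listed at the end).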

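library(dplyr)
df %>%
    arrange(Year) %>%
    # lead(PPT, n = 5) is the PPT value five rows below; the last five rows
    # have nothing five rows below, so newcol is NA there
    mutate(newcol = PPT - lead(PPT, n = 5))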
#    code  PPT Year newcol
#1   AFG  123 1990    119
#2   AGO   42 1991     19
#3   ALB   23 1992     -2
#4   AND    5 1993     -1
#5   ARB   23 1994   -611
#6   ARE    4 1995     -1
#7   ARG   23 1996  -5540
#8   ARM   25 1997    -31
#9   ASM    6 1998    -50
#10  ATG  634 1999    -11
#...
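To make the row shift concrete, here is a minimal base R sketch of what lead(PPT, n = 5) computes. shift_up() is a hypothetical helper written only for this illustration (it is not part of dplyr); it pads the end with NA, which matches lead()'s default. Run on the answer's data (consecutive years 1990 to 2014), it should reproduce the first ten newcol values listed above.

```r
# PPT and Year from the answer's data (one row per year, 1990 to 2014)
PPT  <- c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56,
          56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54)
Year <- 1990:2014

# Hypothetical helper: element i of the result is x[i + n]; the last n
# positions have nothing n places ahead, so they are filled with NA
shift_up <- function(x, n) {
  c(x[-seq_len(n)], rep(NA, n))
}

# Because Year[i] + 5 == Year[i + 5] here, a five-row shift is a five-year shift
newcol <- PPT - shift_up(PPT, 5)
head(data.frame(Year, PPT, newcol), 10)
```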

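If some years are missing, we can first expand the data with complete() so that every year in the range has a row, and then apply the same mutate(). This still assumes at most one row per year; repeated years, such as the three 1991 rows in the question's example, break the row-to-year correspondence.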

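library(tidyr)
df %>%
    # rows without a Year cannot be placed in the sequence of years
    filter(!is.na(Year)) %>%
    # add a row (code and PPT set to NA) for every year missing from the range
    complete(Year = seq(min(Year), max(Year))) %>%
    arrange(Year) %>%
    mutate(newcol = PPT - lead(PPT, n = 5)) %>%
    # keep the original rows only, dropping the ones added by complete()
    filter(!is.na(code))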
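The idea behind complete() can be sketched in base R with merge(): build a data frame holding every year in the range, merge the observed rows into it (all.x = TRUE leaves PPT as NA for a missing year), and only then shift by five rows. The small data set below is made up for illustration; it has one row per year but no 1993.

```r
# Made-up example data: one row per year, but 1993 is missing
obs <- data.frame(Year = c(1990, 1991, 1992, 1994, 1995, 1996, 1997, 1998),
                  PPT  = c(12, 25, 31, 47, 58, 66, 73, 89))

# Shifting the raw rows pairs 1990 with 1996, not 1995, because of the gap
naive <- obs$PPT - c(obs$PPT[-(1:5)], rep(NA, 5))

# Add the missing years: merge() inserts a 1993 row with PPT = NA and
# returns the rows sorted by Year
full <- merge(data.frame(Year = seq(min(obs$Year), max(obs$Year))),
              obs, by = "Year", all.x = TRUE)

# Now row i + 5 always holds year + 5, so a five-row shift is a five-year shift
full$newcol <- full$PPT - c(full$PPT[-(1:5)], rep(NA, 5))

# Keep only the years that were observed originally
result <- full[full$Year %in% obs$Year, ]
result
```

For 1990 this gives 12 - 58 = -46 (1990 against 1995), whereas the naive shift gives 12 - 66 = -54 (1990 against 1996).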

Or using base R:

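# assumes df is already sorted by Year with one row per year; the last
# five years have no year + 5 available, so they get NA
df$newcol <- with(df, c(head(PPT, -5) - tail(PPT, -5), rep(NA, 5)))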
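Both versions above pair rows that are five positions apart. A different base R sketch, not taken from the original answer, looks up year + 5 by value with match(), so it needs no sorting or gap filling and works directly on the question's example data. Where a year appears more than once (the two 2014 rows, for instance), match() uses the first row for year + 5; whether that is the right choice for duplicated years is an assumption you may need to revisit.

```r
# Question's example data (copied from the question)
df <- data.frame(
  code = c("AFG", "AGO", "ALB", "AND", "ARB", "ARE", "ARG", "ARM", "ASM", "ATG",
           "AUS", "AUT", "AUT", "AUT", "AUT", "ABW", "AFG", "AGO", "ALB", "AND",
           "ARB", "ARE", "ARG", "ARM", "ARM"),
  PPT  = c(123, 42, 23, 5, 42, 4, 23, 25, 42, 23, NA, 5563, 56, 54, 645, 6, 4,
           53, 656, 65, 5563, 646, 6, 66, 54),
  Year = c(1990, 1991, 1992, 1993, 1991, 1995, 1996, 1997, 1991, 1992, 2000,
           2001, 2002, 2014, 2004, 2005, 2006, 2007, 1960, 2009, NA, 2011,
           2012, 2013, 2014)
)

# For each row, the index of the first row whose Year equals this Year + 5;
# incomparables = NA stops a missing Year from matching another missing Year
idx <- match(df$Year + 5, df$Year, incomparables = NA)

# PPT for the year minus PPT for year + 5; NA when year + 5 is absent
# or either PPT value is missing
df$new_col <- df$PPT - df$PPT[idx]

df[order(df$Year), c("code", "Year", "PPT", "new_col")]
```

This yields 119 for 1990, 19 for each 1991 row, -2 for each 1992 row, and NA for 1960 and 2000, in line with the values described in the question.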

Data (consecutive years 1990 to 2014, one row per year; this differs from the question's example):

df <- structure(list(code = structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 13L, 13L, 13L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 9L), .Label = c("ABW", "AFG", "AGO", "ALB", "AND", 
"ARB", "ARE", "ARG", "ARM", "ASM", "ATG", "AUS", "AUT"), class = "factor"), 
    PPT = c(123, 42, 23, 5, 23, 4, 23, 25, 6, 634, 5, 5563, 56, 
    56, 645, 6, 4, 656, 645, 65, 5563, 646, 6, 66, 54),
    Year = 1990:2014), class = "data.frame", row.names = c(NA, 
-25L))
