How to mutate_at multiple columns on a condition on each value?

Problem description

I have a dataframe of over 1 million rows, with one column for each hour of the day. I want to mutate each value in those columns, but the modification depends on the sign of the value. How can I do that efficiently?

I could do a gather on those hourly values (then spread), but gather seems to be pretty slow on big dataframes. I could also write the same mutate for all 24 columns, but that does not seem like a great solution when mutate_at looks able to do exactly that.

I'll probably have to do this kind of mutate again in the near future, and I hope to find something better than repetitive code that is boring to read.

library(data.table)

df = data.table(
    "ID" = c(1,1,1,2,2), # not needed for the computation
    "Date" = c(1,2,3,1,2), # not needed for the computation
    "total_neg" = c(1,1,0,0,2),
    "total_pos" = c(4,5,2,4,5),
    "H1" = c(5,4,0,5,-5),
    "H2" = c(5,-10,5,5,-5),
    "H3" = c(-10,6,5,0,10)
)

I would like to apply something like:

df%>%
  mutate_at(c("H1", "H2", "H3"), FUN(ifelse( Hour < 0, Hour*total_neg/10, Hour*total_pos/10)))

Here Hour stands for the value in each column. This obviously doesn't work as written, and neither does ".", but I'm looking for something that would mean "any value in the columns we select in our mutate_at".

If it helps, I'm currently denormalizing some values using the sums of the actual positive values and of the actual negative values, which are stored in two columns.

In my example, this would be the expected result:

df = data.table(
    "ID" = c(1,1,1,2,2),
    "Date" = c(1,2,3,1,2),
    "total_neg" = c(1,1,0,0,2),
    "total_pos" = c(4,5,2,4,5),
    "H1" = c(2,2,0,2,-1),
    "H2" = c(2,-1,1,2,-1),
    "H3" = c(-1,3,1,0,5)
)
df

Thanks in advance for any help you may provide. I apologize for any mistakes; I am not a native speaker, but I assure you that I do my best!

Recommended answer

FUN is not an argument of mutate_at. In recent versions of dplyr, the previously used funs() is deprecated in favor of list(~ ...) or simply ~. Also, wrap the columns to select in vars(). The column names can also be passed unquoted, or selected with vars(starts_with("H")) or vars(matches("^H\\d+$")). Finally, replace 'Hour' with . (the dot), which stands for the current column inside the formula.
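The selection variants mentioned above can be sketched as follows. This is a minimal illustration, assuming the toy data from the question rebuilt as a plain data.frame; the names by_name and by_pattern are hypothetical and only serve to compare the two forms.

```r
library(dplyr)

# Toy data copied from the question, as a plain data.frame (assumption:
# the question builds it with data.table instead)
df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  Date = c(1, 2, 3, 1, 2),
  total_neg = c(1, 1, 0, 0, 2),
  total_pos = c(4, 5, 2, 4, 5),
  H1 = c(5, 4, 0, 5, -5),
  H2 = c(5, -10, 5, 5, -5),
  H3 = c(-10, 6, 5, 0, 10)
)

# Unquoted column names inside vars()
by_name <- df %>%
  mutate_at(vars(H1, H2, H3),
            ~ ifelse(. < 0, . * total_neg / 10, . * total_pos / 10))

# Regular-expression selection: every column named "H" followed by digits
by_pattern <- df %>%
  mutate_at(vars(matches("^H\\d+$")),
            ~ ifelse(. < 0, . * total_neg / 10, . * total_pos / 10))
```

With 24 hourly columns, the pattern form avoids listing each name, provided no unrelated column also matches the pattern.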

library(dplyr)
df %>%
    mutate_at(vars(c("H1", "H2", "H3")), ~ifelse( . < 0, 
           .*total_neg/10, .*total_pos/10))
#   ID Date total_neg total_pos H1 H2 H3
#1  1    1         1         4  2  2 -1
#2  1    2         1         5  2 -1  3
#3  1    3         0         2  0  1  1
#4  2    1         0         4  2  2  0
#5  2    2         2         5 -1 -1  5
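
In dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(). The sketch below is not part of the original answer; it shows how the same rescaling might be written in that style, assuming the question's toy data as a plain data.frame. The name result is hypothetical.

```r
library(dplyr)

# Toy data copied from the question, as a plain data.frame
df <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  Date = c(1, 2, 3, 1, 2),
  total_neg = c(1, 1, 0, 0, 2),
  total_pos = c(4, 5, 2, 4, 5),
  H1 = c(5, 4, 0, 5, -5),
  H2 = c(5, -10, 5, 5, -5),
  H3 = c(-10, 6, 5, 0, 10)
)

# across() needs dplyr >= 1.0.0; .x is the current column, while
# total_neg and total_pos are looked up in the same data frame
result <- df %>%
  mutate(across(
    matches("^H\\d+$"),
    ~ ifelse(.x < 0, .x * total_neg / 10, .x * total_pos / 10)
  ))

result
```

On the toy data this should give the same H1 to H3 values as the mutate_at() call above.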
