How to mutate multiple columns with a dynamic variable using the purrr::map function?


Question

I have the following data frame:

df <- data.frame(
  id  = c(1:5),
  a   = c(3,10,4,0,15),
  b   = c(2,1,1,0,3),
  c   = c(12,3,0,3,1),
  d   = c(9,7,8,0,0),
  e   = c(1,2,0,2,2)
  )

I need to add multiple columns whose names are given by a combination of a:c (usa, canada, and nz in the example below) and 3:5. The values 3:5 are also used inside the sum function:

df %>% mutate(
  usa_3 = sum(1+3),
  usa_4 = sum(1+4),
  usa_5 = sum(1+5),
  canada_3 = sum(1+3),
  canada_4 = sum(1+4),
  canada_5 = sum(1+5),
  nz_3 = sum(1+3),
  nz_4 = sum(1+4),
  nz_5 = sum(1+5)
  )

The result is really simple, but I do not want to write similar code repeatedly.

  id  a b  c d e usa_3 usa_4 usa_5 canada_3 canada_4 canada_5 nz_3 nz_4 nz_5
1  1  3 2 12 9 1     4     5     6        4        5        6    4    5    6
2  2 10 1  3 7 2     4     5     6        4        5        6    4    5    6
3  3  4 1  0 8 0     4     5     6        4        5        6    4    5    6
4  4  0 0  3 0 2     4     5     6        4        5        6    4    5    6
5  5 15 3  1 0 2     4     5     6        4        5        6    4    5    6

The variable names consist of an alphabetical prefix and an integer from a range as the postfix. The postfix also feeds into the sum function as 1 + postfix. In this case there are 3 values of each, so the result has 9 additional columns.

I would prefer not to define a function outside this block of code, and I suppose the map function in purrr may help here.

Do you know how to make it work? In particular, it is difficult to supply dynamic column names inside a pipe.

I found some similar questions, but they do not match my needs:

Multivariate mutate

How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

===== ADDITIONAL INFO =====

Let me clarify some conditions of this issue. In practice, the sum(1+3), sum(1+4), ... part is replaced by as.factor(cutree(X, k = Y)), where X is the result of a cluster analysis and Y is the variable defined as 3:5 in the example. cutree() is a function that defines where we cut the dendrogram stored in the cluster analysis result.
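As a minimal sketch of what cutree() returns, assuming a small made-up data set (toy_df, clst_toy, labels_3, and groups_3 are hypothetical names for illustration, not objects from the question):

```r
# Toy data: six points forming three well-separated pairs (illustrative values only)
toy_df <- data.frame(
  x = c(0, 0.1, 5, 5.1, 10, 10.1),
  y = c(0, 0.1, 5, 5.1, 10, 10.1)
)

# Hierarchical clustering on Euclidean distances between rows
clst_toy <- hclust(dist(toy_df), method = "ward.D2")

# Cut the dendrogram into k = 3 groups: one integer label per row
labels_3 <- cutree(clst_toy, k = 3)

# Convert the labels to a factor, as done in the question
groups_3 <- as.factor(labels_3)
```

Here the first argument plays the role of X (the clustering result) and k plays the role of Y.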

As for the column names usa_3, usa_4, ..., nz_5: the country name is replaced by the cluster analysis method, such as Ward, McQuitty, median, etc. (seven methods), and the integers 3, 4, 5 are the parameters that define where I need to cut the dendrogram, as explained above.

As for X in as.factor(cutree(X, k = Y)), the cluster analysis results are stored in several data frames, one for each method. I realized that another issue is how to apply the function to each of them.

The actual script I am currently using looks something like this:

cluste_number <- original_df %>% mutate(
    ## Ward
    ward_3=as.factor(cutree(clst.ward,k=3)),
    ward_4=as.factor(cutree(clst.ward,k=4)),
    ward_5=as.factor(cutree(clst.ward,k=5)),
    ward_6=as.factor(cutree(clst.ward,k=6)),
    ## Single
    sing_3=as.factor(cutree(clst.sing,k=3)),
    sing_4=as.factor(cutree(clst.sing,k=4)),
    sing_5=as.factor(cutree(clst.sing,k=5)),
    sing_6=as.factor(cutree(clst.sing,k=6)))

I am sorry for not clarifying the actual issue earlier. For the reason above, however, the number of prefixes (usa, canada, nz in the example) and the number of parameters (3:5 in the example) do not necessarily match. Also, the suggestions that use i + . do not fit my case, because the actual operation uses the function as.factor(cutree(X, k = Y)).

Thank you for your support.

Recommended answer

I'm not sure I understand the spirit of the problem, but here is one way to generate a data frame with the column names and values you want.

You can change ~ function(i) i + . to whatever function of i (the column being mutated) you want, and change either of the n's in setNames(n, n): the first n to pass a different value into the function you are creating, or the second n to change the names of the resulting columns.

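library(dplyr)
library(purrr)

countries <- c('usa', 'canada', 'nz')
n <- 3:5

# Start from one column of 1s per country (so the width follows countries, not n)
as.data.frame(matrix(1, nrow(df), length(countries))) %>%
  rename_all(~ countries) %>%
  # Named list of functions: each column gains <country>_<n> = column + n
  mutate_all(map(setNames(n, n), ~ function(i) i + .)) %>%
  # Drop the helper columns of 1s, then attach the original data
  select(-all_of(countries)) %>%
  bind_cols(df)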

#   usa_3 canada_3 nz_3 usa_4 canada_4 nz_4 usa_5 canada_5 nz_5 id  a b  c d e
# 1     4        4    4     5        5    5     6        6    6  1  3 2 12 9 1
# 2     4        4    4     5        5    5     6        6    6  2 10 1  3 7 2
# 3     4        4    4     5        5    5     6        6    6  3  4 1  0 8 0
# 4     4        4    4     5        5    5     6        6    6  4  0 0  3 0 2
# 5     4        4    4     5        5    5     6        6    6  5 15 3  1 0 2
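
For the use case described in the question's additional info (one column per clustering method and per k), the same idea can be sketched with purrr::map2 over every method/k pair. This is a hedged sketch, not the answerer's method: the toy data, the list clusterings, and the names ks, grid, and new_cols are assumptions for illustration. The asker's objects clst.ward and clst.sing would take the place of the list elements.

```r
library(dplyr)
library(purrr)

# Toy stand-in for original_df (assumption: the real data are not shown)
set.seed(1)
original_df <- data.frame(x = rnorm(10), y = rnorm(10))
d <- dist(original_df)

# Named list of clustering results; the names become the column prefixes
clusterings <- list(
  ward = hclust(d, method = "ward.D2"),
  sing = hclust(d, method = "single")
)
ks <- 3:6

# Every (method, k) pair; k varies fastest, so the columns come out as
# ward_3, ward_4, ..., sing_5, sing_6
grid <- expand.grid(k = ks, method = names(clusterings),
                    stringsAsFactors = FALSE)

# One factor of cluster labels per pair
new_cols <- map2(grid$method, grid$k,
                 ~ as.factor(cutree(clusterings[[.x]], k = .y)))
names(new_cols) <- paste(grid$method, grid$k, sep = "_")

# Attach the new columns to the original data
cluster_number <- bind_cols(original_df, as_tibble(new_cols))
```

Because the pairs come from expand.grid, the number of methods and the number of k values do not have to match; adding a method only means adding an entry to the list.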
