Pass variable as column name to dplyr?

Views: 79  Posted: 2017/7/13 21:27:11  Tags: r dplyr

Question

I have a very ugly dataset that is a flat file of a relational database. A minimal reproducible example is here:

df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20)

I need to identify the '.p' value for whichever 'col#' contains "c". My previous question on SO, "In R, find the column that contains a string in for each row", solved the first part; I include that code here for context.

tmp <- which(projectdata=='Transmission and Distribution of Electricity', arr.ind=TRUE)
cnt <- ave(tmp[,"row"], tmp[,"row"], FUN=seq_along)
maxnames <- paste0("max",sequence(max(cnt)))
projectdata[maxnames] <- NA
projectdata[maxnames][cbind(tmp[,"row"],cnt)] <- names(projectdata)[tmp[,"col"]]
rm(tmp, cnt, maxnames)
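
The snippet above refers to projectdata and a search string from the real dataset. As a sketch only, the same steps applied to the example df and the value "c" might look like the following. The stringsAsFactors = FALSE argument is an assumption added here so that the string columns stay character.

```r
# Example data from the question; col3.p has 10 values, so the
# shorter columns are recycled to 10 rows.
df <- data.frame(col1 = c(letters[1:4], "c"),
                 col1.p = 1:5,
                 col2 = c("a", "c", "l", "c", "l"),
                 col2.p = 6:10,
                 col3 = letters[3:7],
                 col3.p = 11:20,
                 stringsAsFactors = FALSE)  # assumption, not in the original

# Row/column positions of every cell equal to "c".
tmp <- which(df == "c", arr.ind = TRUE)

# Running count of matches within each row (1 for the first match, and so on).
cnt <- ave(tmp[, "row"], tmp[, "row"], FUN = seq_along)

# One new column per possible match: max1, max2, ...
maxnames <- paste0("max", sequence(max(cnt)))
df[maxnames] <- NA

# Fill in the name of the matching column for each (row, match number) pair.
df[maxnames][cbind(tmp[, "row"], cnt)] <- names(df)[tmp[, "col"]]
rm(tmp, cnt, maxnames)
```

In this example every row contains exactly one "c", so only max1 is created.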

This results in a dataframe that looks like this:

df
   col1 col1.p col2 col2.p col3 col3.p max1
1     a      1    a      6    c     11 col3
2     b      2    c      7    d     12 col2
3     c      3    l      8    e     13 col1
4     d      4    c      9    f     14 col2
5     c      5    l     10    g     15 col1
6     a      1    a      6    c     16 col3
7     b      2    c      7    d     17 col2
8     c      3    l      8    e     18 col1
9     d      4    c      9    f     19 col2
10    c      5    l     10    g     20 col1

When I tried to get the ".p" that matched the value in "max1", I kept getting errors. I thought the approach would be:

df %>%
   mutate(my.p = eval(as.name(paste0(max1,'.p'))))
Error: object 'col3.p' not found
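
One possible reading of why the message names 'col3.p', sketched in base R and not taken from the original post: paste0(max1, '.p') produces one name per row, but as.name() keeps only the first element of a character vector, so the expression collapses to a single symbol. The short max1 vector below is an illustrative stand-in for the column.

```r
# Stand-in for the first three values of the max1 column (illustrative only).
max1 <- c("col3", "col2", "col1")

# paste0() is vectorised: one candidate column name per row.
p_names <- paste0(max1, ".p")

# as.name() uses only the first element of a character vector,
# so the per-row information is lost at this step.
sym <- as.name(p_names)
print(sym)
```

This suggests that any fix has to look up the column name row by row and not once for the whole column.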

Clearly, this did not work, so I thought maybe this was similar to passing a column name in a function, where I need to use 'get'. That also didn't work.

df %>%
   mutate(my.p = get(as.name(paste0(max1,'.p'))))
Error: invalid first argument
df %>%
   mutate(my.p = get(paste0(max1,'.p')))
Error: object 'col3.p' not found

I found something that gets rid of this error, using data.table, in an answer to a different but related problem, here: http://codereply.com/answer/7y2ra3/dplyr-error-object-found-using-rle-mutate.html. However, it gives me "col3.p" for every row, which is the column named by max1 in the first row (df$max1[1]).

library('dplyr')
library('data.table') # must have the data.table package
df %>%
  tbl_dt(df) %>% 
  mutate(my.p = get(paste0(max1,'.p')))

Source: local data table [10 x 8]

   col1 col1.p col2 col2.p col3 col3.p max1 my.p
1     a      1    a      6    c     11 col3   11
2     b      2    c      7    d     12 col2   12
3     c      3    l      8    e     13 col1   13
4     d      4    c      9    f     14 col2   14
5     c      5    l     10    g     15 col1   15
6     a      1    a      6    c     16 col3   16
7     b      2    c      7    d     17 col2   17
8     c      3    l      8    e     18 col1   18
9     d      4    c      9    f     19 col2   19
10    c      5    l     10    g     20 col1   20

Using the lazyeval interp approach (from this SO question: How to pass dynamic column names in dplyr into custom function?) doesn't work for me. Perhaps I am implementing it incorrectly?

library(lazyeval)
library(dplyr)
df %>%
  mutate_(my.p = interp(~colp, colp = as.name(paste0(max1,'.p'))))

I get the error:

Error in paste0(max1, ".p") : object 'max1' not found

Ideally, I will have the new column my.p equal the appropriate p based on the column identified in max1.

I can do this all with ifelse, but I am trying to do it with less code and to make it applicable to the next ugly flat table.

Recommended answer

We can do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by the row sequence, get the value of the column named by the paste0 output, and assign (:=) it to a new column ('my.p').

library(data.table)
setDT(df)[, my.p := get(paste0(max1, '.p')), 1:nrow(df)]
df
#    col1 col1.p col2 col2.p col3 col3.p max1 my.p
# 1:    a      1    a      6    c     11 col3   11
# 2:    b      2    c      7    d     12 col2    7
# 3:    c      3    l      8    e     13 col1    3
# 4:    d      4    c      9    f     14 col2    9
# 5:    c      5    l     10    g     15 col1    5
# 6:    a      1    a      6    c     16 col3   16
# 7:    b      2    c      7    d     17 col2    7
# 8:    c      3    l      8    e     18 col1    3
# 9:    d      4    c      9    f     19 col2    9
#10:    c      5    l     10    g     20 col1    5
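
The same per-row lookup can also be sketched without data.table. The code below is not part of the original answer. It shows a base R version using mapply(), followed by an optional dplyr version that assumes dplyr 1.0 or later (rowwise() together with the .data pronoun). The names p_cols, df2 and my.p2 are hypothetical and introduced only for illustration.

```r
# Rebuild the example data, including the max1 column from the question.
df <- data.frame(
  col1   = c(letters[1:4], "c"),
  col1.p = 1:5,
  col2   = c("a", "c", "l", "c", "l"),
  col2.p = 6:10,
  col3   = letters[3:7],
  col3.p = 11:20,   # length 10, so the shorter columns are recycled
  max1   = c("col3", "col2", "col1", "col2", "col1"),
  stringsAsFactors = FALSE
)

# Base R: for row i, read element i of the column named max1[i] plus ".p".
p_cols <- paste0(df$max1, ".p")
df$my.p <- mapply(function(col, i) df[[col]][i],
                  p_cols, seq_len(nrow(df)), USE.NAMES = FALSE)

# Optional dplyr version (assumes dplyr >= 1.0 is installed): rowwise()
# makes mutate() work one row at a time, and .data[[name]] looks a
# column up by a name held in a string.
if (requireNamespace("dplyr", quietly = TRUE)) {
  df2 <- dplyr::ungroup(
    dplyr::mutate(dplyr::rowwise(df),
                  my.p2 = .data[[paste0(max1, ".p")]])
  )
}
```

Both versions avoid listing the 'col#' names in a chain of ifelse() calls, so they should carry over to another flat table as long as the '.p' naming pattern holds.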
