Row-wise sort then concatenate across specific columns of data frame


Problem description

(A related question does not cover sorting. When you don't need to sort, it's easy to just use paste.)

I have a less-than-ideally structured table whose character columns have generic names such as "item1", "item2", and so on. I would like to create a new character variable that is the alphabetized, comma-separated concatenation of these columns. For example, if row 5 has item1 = "milk", item2 = "eggs", and item3 = "butter", the new variable in row 5 should be "butter, eggs, milk".

I wrote a function f() below that works on two character variables. However, I am having trouble with:

  • Using mapply or other "vectorization" (I know it's really just a for loop)
  • Generalizing the function to an arbitrary number of columns

Any help is much appreciated.

df <- data.frame(a =c("foo","bar"), 
                 b= c("baz","qux"))   
paste(df$a,df$b, sep=", ")
# returns [1] "foo, baz" "bar, qux" ... but I want [1] "baz, foo" "bar, qux"

f <- function(a,b) paste(c(a,b)[order(c(a,b))],collapse=", ")
f("foo","baz") 
# returns [1] "baz, foo" ... which is what I want ... how to vectorize?

df$new_var <- mapply(f, df$a, df$b)
df 
#     a   b new_var      <- new_var is not what I want
# 1 foo baz    1, 2
# 2 bar qux    1, 2

# Interestingly, data.table is smart enough to fix my bad mapply
library(data.table)
dt <- data.table(a =c("foo","bar"), 
                 b= c("baz","qux"))  
dt[,new_var:=mapply(f, a, b)]
dt
#     a    b  new_var    <- new var IS what I want
# 1: foo baz baz, foo
# 2: bar qux bar, qux

Recommended answer

My first thought would have been to do this:

dt[, new_var := paste(sort(.SD), collapse = ", "), by = 1:nrow(dt)]
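The same row-wise idea can also be sketched in base R without data.table. This is a minimal sketch, not part of the original answer: it assumes the columns are character rather than factor, and the `cols` vector is an illustrative name for the columns to combine.

```r
# Example data from the question, with strings kept as character.
df <- data.frame(a = c("foo", "bar"),
                 b = c("baz", "qux"),
                 stringsAsFactors = FALSE)

# Columns to combine (illustrative; here, both columns of df).
cols <- c("a", "b")

# apply() with MARGIN = 1 visits each row as a character vector:
# sort the row's values alphabetically, then join them with ", ".
df$new_var <- unname(apply(df[cols], 1,
                           function(row) paste(sort(row), collapse = ", ")))

df$new_var
# [1] "baz, foo" "bar, qux"
```

Note that sort() drops NA values by default, so a row with missing items would be joined without them.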

But you could make your function work with a couple of simple modifications:

f = function(...) paste(c(...)[order(c(...))],collapse=", ")

dt[, new_var := do.call(function(...) mapply(f, ...), .SD)]
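A likely reason the data.frame attempt in the question printed "1, 2": before R 4.0.0, data.frame() converted strings to factors by default, and in those versions c() on factors returned their integer codes, whereas data.table keeps strings as character. The sketch below assumes character columns (stringsAsFactors = FALSE) and applies the variadic function to a plain data.frame with three columns. The item1 to item3 data and the `cols` vector are illustrative, not from the original post.

```r
# Variadic version of f: accepts any number of values, sorts, and joins them.
f <- function(...) {
  x <- c(...)
  paste(x[order(x)], collapse = ", ")
}

# Illustrative data; strings are kept as character, not factor.
df <- data.frame(item1 = c("milk", "foo"),
                 item2 = c("eggs", "bar"),
                 item3 = c("butter", "baz"),
                 stringsAsFactors = FALSE)

# Columns to combine (illustrative name).
cols <- c("item1", "item2", "item3")

# do.call() hands each selected column to mapply() as a separate argument,
# so f is called once per row with one value from each column.
# unname() avoids clashes between column names and mapply's own arguments.
args <- c(list(FUN = f), unname(as.list(df[cols])), list(USE.NAMES = FALSE))
df$new_var <- do.call(mapply, args)

df$new_var
# [1] "butter, eggs, milk" "bar, baz, foo"
```

Changing `cols` is enough to include more or fewer columns, since f takes an arbitrary number of arguments.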
