R combine duplicate rows by appending columns

Problem description

I have a large data set with text comments and their ratings on different variables, like so:

df <- data.frame(
  comment = c("commentA","commentB","commentB","commentA","commentA","commentC"),
  sentiment=c(1,2,1,4,1,2), 
  tone=c(1,5,3,2,6,1)
)

Every comment appears one to three times, since multiple people are sometimes asked to rate the same comment.

I'm looking to create a data frame where the "comment" column only has unique values and the other columns are appended, so any one text comment has as many "sentiment" and "tone" columns as there are ratings (this will produce NAs for comments that have been rated fewer times, but that's okay):

df <- data.frame(
  comment = c("commentA","commentB","commentC"),
  sentiment.1=c(1,2,2), 
  sentiment.2=c(4,1,NA), 
  sentiment.3=c(1,NA,NA), 
  tone.1=c(1,5,1),
  tone.2=c(2,3,NA),
  tone.3=c(6,NA,NA)
)

I've been trying to go from long to wide with reshape():

reshape(df, 
  idvar = "comment",
  timevar = c("sentiment","tone"), 
  direction = "wide"
)

But that results in all possible combinations between sentiment and tone, rather than simply duplicating sentiment and tone independently.
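
For comparison, base reshape() can produce the desired shape if it is given a single counter column as timevar (reshape() only uses one time variable) and the rated columns are listed in v.names instead. A minimal sketch, assuming the sample data from the question; the helper column rating_no is a hypothetical name introduced here, not something from the original post:

```r
# Sample data from the question
df <- data.frame(
  comment   = c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC"),
  sentiment = c(1, 2, 1, 4, 1, 2),
  tone      = c(1, 5, 3, 2, 6, 1)
)

# Hypothetical helper column: running count of each comment (1, 2, 3, ...)
df$rating_no <- ave(seq_along(df$comment), df$comment, FUN = seq_along)

# timevar is the single counter column; v.names lists the columns to spread
wide <- reshape(df,
  idvar     = "comment",
  timevar   = "rating_no",
  v.names   = c("sentiment", "tone"),
  direction = "wide"
)
wide
```

Note that reshape() interleaves the new columns (sentiment.1, tone.1, sentiment.2, ...) rather than grouping them by variable.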

I also tried using gather like so df %>% gather(key, value, -comment), but that only gets me halfway there...
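
One way to finish the tidyverse route, sketched here under the assumption that dplyr and tidyr (version 1.0 or later, where pivot_wider() supersedes spread()) are installed: number each rating within its comment, then widen on that number. The column name rating_no is a hypothetical helper, not part of the original question:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  comment   = c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC"),
  sentiment = c(1, 2, 1, 4, 1, 2),
  tone      = c(1, 5, 3, 2, 6, 1)
)

wide <- df %>%
  group_by(comment) %>%
  mutate(rating_no = row_number()) %>%   # 1, 2, 3 within each comment
  ungroup() %>%
  pivot_wider(
    names_from  = rating_no,              # becomes the column suffix
    values_from = c(sentiment, tone)      # both rated columns are widened
  )
wide
```

Missing ratings are filled with NA, and the new columns are named sentiment_1, tone_1 and so on.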

Could anyone please point me in the right direction?

Recommended answer

You need to create a variable to use as the numbers in the columns. rowid(comment) does the trick.
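
To see what rowid() contributes, here is a small sketch (assuming data.table is installed) that applies it to the comment column from the question. It returns the running occurrence number of each value, which is exactly the per-comment rating index used as the column suffix; a base-R equivalent using ave() is shown for comparison:

```r
library(data.table)

comment <- c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC")

# rowid() numbers the occurrences of each value: 1st, 2nd, 3rd time it is seen
ids <- rowid(comment)
print(ids)   # 1 1 2 2 3 1

# Base-R equivalent: apply seq_along within each group of identical comments
ids_base <- ave(seq_along(comment), comment, FUN = seq_along)
```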

In dcast you put the row identifiers to the left of ~ and the column identifiers to the right. Then value.var is a character vector of all the columns you want to include in this long-to-wide transformation.

library(data.table)
setDT(df)

dcast(df, comment ~ rowid(comment), value.var = c('sentiment', 'tone'))

#     comment sentiment_1 sentiment_2 sentiment_3 tone_1 tone_2 tone_3
# 1: commentA           1           4           1      1      2      6
# 2: commentB           2           1          NA      5      3     NA
# 3: commentC           2          NA          NA      1     NA     NA
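
The answer's output uses underscores (sentiment_1), while the question asked for sentiment.1. data.table's dcast() has a sep argument (default "_") that controls how the value.var name and the rowid number are joined, so a variant like the following sketch should reproduce the requested names. It assumes data.table is installed and builds the sample data directly as a data.table:

```r
library(data.table)

df <- data.table(
  comment   = c("commentA", "commentB", "commentB", "commentA", "commentA", "commentC"),
  sentiment = c(1, 2, 1, 4, 1, 2),
  tone      = c(1, 5, 3, 2, 6, 1)
)

# sep = "." joins value.var names and rowid numbers with a dot instead of "_"
wide <- dcast(df, comment ~ rowid(comment),
              value.var = c("sentiment", "tone"), sep = ".")
names(wide)  # expected: comment, sentiment.1 ... sentiment.3, tone.1 ... tone.3
```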