在data.table中的2个不同列中折叠行? [英] collapse rows in 2 different columns in data.table?

查看：44 发布时间：2021/4/28 19:40:54 r data.table

本文介绍了在data.table中的2个不同列中折叠行?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

对于虚拟数据集

require(data.table)
require(reshape2)
teamid <- c(1,2,3)
member <- c("a,b","","c,g,h")
leader <- c("c", "d,e", "")
dt <- data.table(teamid, member, leader)

现在数据集看起来像这样:

Now the dataset looks like this:


   teamid member leader
1:      1    a,b      c
2:      2           d,e
3:      3  c,g,h

3列.对于每个团队，他们在不同的列中都有团队成员和团队负责人.团队可能只有成员而没有领导者，反之亦然.

3 Columns. For each team, they have team members, and team leaders in different column. Teams may have only members without leaders, and vice versa.

以下是我的 ALMOST 所需的输出:

The following is my ALMOST desired output:


   teamid value leader
1:      1     a  FALSE
2:      1     b  FALSE
3:      1     c   TRUE
4:      1     c   TRUE
5:      2     d   TRUE
6:      2     e   TRUE
7:      3     c  FALSE
8:      3     g  FALSE
9:      3     h  FALSE

我希望将两列合并为一个，如果一个是团队负责人，则添加一个标签.

I want to have the two columns merged into one, and add a tag if one is a team leader.

对此我有一个丑陋的解决方案，

I have an ugly solution for this,

<代码>dt1<-dt [，strsplit(member，，")，by = teamid]dt2<-dt [，strsplit(leader，，")，by = teamid]setkey(dt1，teamid)setkey(dt2，teamid)dt3<-merge(dt1，dt2，all = TRUE)dt4<-熔体(dt3，id = 1，尺寸= c("V1.x"，"V1.y"))dt5<-dt4 [value！="NA_real"]dt6<-dt5 [，领导:=(变量=="V1.y")] [，变量:= NULL]setkey(dt6，teamid)setnames(dt6，value，member)

问题:

我认为这种解决方案效率不高，请先合并再融化.那么关于实现此目的的其他方式的任何想法吗?

This solution is not efficency I think, first merge and then melt. So any ideas about other ways to do this?

第3行和第4行有重复的行.

There're duplicated rows, in row 3 and row 4.

当我尝试更改列名时出现错误

When I tried to change column name, an error came up

setnames(dt6，value，member)

setnames(dt6,value,member)

setnames(dt6，value，member)中的错误:找不到对象'value'

Error in setnames(dt6, value, member) : object 'value' not found

也许最重要的是，

当我尝试对具有100万行，3列的真实数据集进行测试时，发生了以下错误

When I tried to test on my real dataset, which have more 1million rows, 3 columns the following error occured

merge(df1，df2，all = TRUE)vecseq(f__，len__，if(allow.cartesian)NULL否则为as.integer(max(nrow(x)，:连接结果为238797行；大于142095 = max(nrow(x)，nrow(i)).检查i中是否有重复的键值，每个键值都一遍又一遍地连接到x中的同一组.如果可以，请尝试包含 j 并删除 by (逐个而不逐个)，以便j为每个组运行以避免大分配.如果确定要继续，请使用allow.cartesian = TRUE重新运行.否则，请在FAQ，Wiki，堆栈溢出和数据表帮助中搜索此错误消息，以获取建议.

merge(df1,df2, all = TRUE) Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), : Join results in 238797 rows; more than 142095 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including j and dropping by (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

有什么建议吗?非常感谢！

Any suggestion? Thanks a lot!

推荐答案

首先融合.

result <- melt(dt,id="teamid", variable.name="status", value.name="member")
result <- result[nchar(member)>0,strsplit(member,","),by=list(teamid,status)]
setnames(result,"V1","member")
setkey(result,teamid,status)
result
#    teamid status member
# 1:      1 member      a
# 2:      1 member      b
# 3:      1 leader      c
# 4:      2 leader      d
# 5:      2 leader      e
# 6:      3 member      c
# 7:      3 member      g
# 8:      3 member      h

如果您想摆脱 status 列并在成员列中添加标记"，则可以采用以下方式:

If you want to get rid of the status column and add a "tag" to the member column, you can do it this way:

result[status=="leader",member:=paste0(member,"*")]
result[,status:=NULL]
result
#    teamid member
# 1:      1      a
# 2:      1      b
# 3:      1     c*
# 4:      2     d*
# 5:      2     e*
# 6:      3      c
# 7:      3      g
# 8:      3      h

这篇关于在data.table中的2个不同列中折叠行?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在data.table中的2个不同列中折叠行? [英] collapse rows in 2 different columns in data.table?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在data.table中的2个不同列中折叠行? [英] collapse rows in 2 different columns in data.table?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭