Update dataframe from another dataframe

Problem description

I have two tables: "transactions", with over 500M rows, and "Customers", with over 3M rows.

data <- data.frame(Trans = c(1,2,3,4,5), Cust01 = c("A","B","C","D","F"),
                   Cust02 = c("S","E","","TE","F"), Cust03 = c("F","","D","","F"))

cust_type <- data.frame(Cust = c("A","B","C","D"), Type = c("1","2","3","4"))

dataresult <- data.frame(Trans = c(1,2,3,4,5),
                         Cust01 = c("A","B","C","D","F"), 
                         Cust01Type = c("1","2","3","4","5"),
                         Cust02 = c("S","E","","TE","F"), 
                         Cust02Type = c("","","","",""),
                         Cust03 = c("F","","D","","F"),
                         Cust03Type = c("","","4","",""))

I would like to add the customer type to the data in an efficient way. In SQL I would normally use multiple left joins; I tried that with dplyr, but it takes forever. I also tried %in%, which returns a logical vector, followed by a loop over only the TRUE values. Does anyone know a better way to do this?

Solution

When you need fast performance on tables of this size, the data.table package is hard to beat. Because your transaction data are in wide format, the first step is to convert them to long format, which makes the join much easier.

library(data.table) # v1.9.5
trans_data <- melt(setDT(data), id.vars = "Trans",
                   variable.name = "Cust",               # name of the variable column
                   variable.factor = TRUE,               # store it as a factor rather than a character vector
                   value.name = "Cvalue")[Cvalue != ""]  # name of the value column; drop empty cells

Once that is done, you can join the two data.tables:

# set the keys by which you are joining
setDT(trans_data, key = "Cvalue")
setDT(cust_type, key = "Cust")

# join the customer type into the transaction data
trans_data[cust_type, Ctype:=Type]

This gives:

> trans_data
    Trans   Cust Cvalue Ctype
 1:     1 Cust01      A     1
 2:     2 Cust01      B     2
 3:     3 Cust01      C     3
 4:     4 Cust01      D     4
 5:     3 Cust03      D     4
 6:     2 Cust02      E    NA
 7:     5 Cust01      F    NA
 8:     5 Cust02      F    NA
 9:     1 Cust03      F    NA
10:     5 Cust03      F    NA
11:     1 Cust02      S    NA
12:     4 Cust02     TE    NA
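
The same update join can also be written without setting keys first, by naming the join columns in the on argument. The following is only a minimal sketch, assuming a data.table release that supports on (1.9.6 or later, so newer than the v1.9.5 used above); the name trans_long is made up for illustration, and the example data are rebuilt so the block runs on its own.

```r
library(data.table)

# Same example data as in the question, built directly as data.tables
data <- data.table(Trans = c(1, 2, 3, 4, 5),
                   Cust01 = c("A", "B", "C", "D", "F"),
                   Cust02 = c("S", "E", "", "TE", "F"),
                   Cust03 = c("F", "", "D", "", "F"))
cust_type <- data.table(Cust = c("A", "B", "C", "D"),
                        Type = c("1", "2", "3", "4"))

# Wide to long, dropping the empty customer slots
# (trans_long is a hypothetical name used only in this sketch)
trans_long <- melt(data, id.vars = "Trans",
                   variable.name = "Cust",
                   value.name = "Cvalue")[Cvalue != ""]

# Update join by reference: match Cvalue in trans_long to Cust in cust_type.
# i.Type refers explicitly to the Type column of cust_type.
trans_long[cust_type, on = c(Cvalue = "Cust"), Ctype := i.Type]
```

Because no key is set, the rows keep the order produced by melt instead of being sorted by Cvalue; customers missing from cust_type still end up with NA in Ctype.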

If you want to change the row order of the resulting data.table, you can do so with, for example:

setorder(trans_data, Trans, Cust)

or all at once with:

trans_data <- trans_data[cust_type, Ctype:=Type][order(Trans,Cust)]

which gives:

> trans_data
    Trans   Cust Cvalue Ctype
 1:     1 Cust01      A     1
 2:     1 Cust02      S    NA
 3:     1 Cust03      F    NA
 4:     2 Cust01      B     2
 5:     2 Cust02      E    NA
 6:     3 Cust01      C     3
 7:     3 Cust03      D     4
 8:     4 Cust01      D     4
 9:     4 Cust02     TE    NA
10:     5 Cust01      F    NA
11:     5 Cust02      F    NA
12:     5 Cust03      F    NA
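
If the wide layout sketched in dataresult is still needed at the end, the long table can be cast back with dcast. This is a hedged sketch rather than part of the original answer: it assumes a data.table release whose dcast accepts several value.var columns (1.9.6 or later), the names trans_long and trans_wide are made up for illustration, and the resulting columns are named Cvalue_Cust01, Ctype_Cust01, and so on, rather than Cust01 and Cust01Type.

```r
library(data.table)

# Long result from the join, rebuilt by hand so the block runs on its own
trans_long <- data.table(
  Trans  = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 5),
  Cust   = c("Cust01", "Cust02", "Cust03", "Cust01", "Cust02", "Cust01",
             "Cust03", "Cust01", "Cust02", "Cust01", "Cust02", "Cust03"),
  Cvalue = c("A", "S", "F", "B", "E", "C", "D", "D", "TE", "F", "F", "F"),
  Ctype  = c("1", NA, NA, "2", NA, "3", "4", "4", NA, NA, NA, NA)
)

# One row per transaction; one Cvalue_* and one Ctype_* column per customer slot.
# Slots that were empty in the original data come back as NA.
trans_wide <- dcast(trans_long, Trans ~ Cust,
                    value.var = c("Cvalue", "Ctype"))
```

On a table with hundreds of millions of rows it may be preferable to stay in long format, since casting back to wide creates a second large object in memory.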


Note: I used the development version of data.table (v1.9.5), with which you no longer need to load the reshape2 package for the melt function.
