使用dplyr将一个data.frame的输出管道输出到另一个 [英] Pipe output of one data.frame to another using dplyr

查看：129 发布时间：2017/7/13 21:55:16 r nested dplyr

本文介绍了使用dplyr将一个data.frame的输出管道输出到另一个的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有两个data.frames - 一个查询表，告诉我一个组中包含的集合产品。每个组至少具有<1>类型1和类型2的一个产品。

第二个data.frame告诉我有关事务的详细信息。每个交易都可以具有以下产品之一：

a）只有来自其中一个组的类型1的 s

b）只有来自其中一个组的类型2的产品 s

c）类型的产品1和类型2 从同一组

对于我的分析，我有兴趣发现c）以上，即有多少交易有类型1 和类型2（来自同一组的）出售。如果产品类型1的产品和来自不同组的类型2的产品在同一交易中出售，我们将完全忽略该交易。

因此，类型1或类型2的每个产品必须属于同一组。

这是我的查找表：
> P_Lookup Group ProductID1 ProductID2 Group1 A 1 Group1 B 2 Group1 B 3 Group2 C 4 Group2 C 5 Group2 C 6 Group3 D 7 Group3 C 8 Group3 C 9 Group4 E 10 Group4 F 11 Group4 G 12 Group5 H 13 Group5 H 14 Group5 H 15
例如，我不会产品G和产品15在一个交易中，因为它们属于不同的组。

以下是交易：
TransactionID ProductID ProductType a1 A 1 a1 B 1 a1 1 2 a2 C 1 a2 4 2 a2 5 2 a3 D 1 a3 C 1 a3 7 2 a3 8 2 a4 H 1 a5 1 2 a5 2 2 a5 3 2 a5 3 2 a5 1 2 a6 H 1 a6 15 2
我的代码

现在，我可以使用 dplyr 用于从一个组中更新列表交易。但是，我不知道我如何向所有组的代码向量化。

这是我的代码：
P_Groups< -unique（P_Lookup $ Group） Chosen_Group< -P_Groups [5] P_Group_Ind < - P_Trans％ >％ group_by（TransactionID）％>％ dplyr :: filter（（ProductID％in％unique（P_Lookup [P_Lookup $ Group == Chosen_Group，] $ ProductID1））| （ProductID％in％unique（P_Lookup [P_Lookup $ Group == Chosen_Group，] $ ProductID2）））％>％ mutate（No_of_PIDs = n_distinct（ProductType））％>％ mutate（Group_Name = Chosen_Group） P_Group_Ind< -P_Group_Ind [P_Group_Ind $ No_of_PIDs> 1，]
只要手动选择每个组，即通过设置 Chosen_Group ，这样做很好。但是，我不知道如何自动化这个。一种方式，我在想的是使用循环，但是我知道R的美丽是矢量化的，所以我想远离使用for循环。

我真诚地感谢任何帮助。我花了差不多两天的时间。我查看了在r 中使用dplyr for循环，但似乎这个线程正在谈论一个不同的问题。

DATA：
这里是 dput for P_Trans ：
structure（list（TransactionID = c（a1，a1，a1，a2，a2， a2，a3，a3，a3 ，a3，a4，a5，a5，a5，a5，a5， a6，a6），ProductID = B，1，C，4，5，D， C，7，8，H，1 ，3，3，1，H，15），ProductType = c（1， 1，2，1，2，2，1，1，2，2 ，1，2，2，2，2，1，2）），.Names = c（TransactionID， ProductID，ProductType），row.names = c（NA，18L ），class =data.frame）
这是 dput for P_Lookup ：
structure（list（Group = c（Group1，Group1，Group1，Group2， Group2，Group2，Group3 ，Group3，Group3，Group4，Group4， Group4，Group5，Group5，Group5），ProductID1 = c（A， B，B，C，C，C，D，C，C，E，F，G，H ， H），ProductID2 = c（1，2，3，4，5，6，7，8，9，10，11，12，13， 14，15）），.Names = c（Group，ProductID1，ProductID2），row.names = c（NA， 15L），class =data.frame）

这里是 dput（）将查询表中不存在的产品添加到P_Trans中：
结构（list（TransactionID = c a1，a1，a1，a2，a2， a2，a3，a3，a3，a3，a4 a5，a5，a5，a5， a6，a6，a7），ProductID = c（A，B ，C，4，5， D，C，7，8，H，1，2，3 3，1，H，15， 22），ProductType = c（1,1,2,1,2,2,1,1,2,2,1， 2，2， 2，2，2，1，2，3）），.Names = c（TransactionID，ProductID， ProductType），row.names = c NA， 19L），class =data.frame）

解决方案
以下是我希望能够帮助的一个整洁（dplyr，tidyr和purrr）解决方案。

请注意，使用 map_df 将所有结果作为数据框返回。如果您希望将其作为每个组的列表对象，则只需使用映射。
$保存唯一的组以供以后使用 P_Groups< - 库（tidyr）库唯一（P_Lookup $ Group）＃将查找表转换为产品ID和组 P_Lookup< - P_Lookup％>％ gather（ProductIDn，ProductID，ProductID1，ProductID2）％>％ select（ProductID，Group）％>％ distinct（）％>％ nest（-ProductID，.key = Group）＃将组信息绑定到交易＃和下一个分析组 P_Trans< - P_Trans％>％ left_join（P_Lookup）％>％过滤器（！map_lgl （Group，is.null））％>％ unnest（Group）％>％ group_by（TransactionID）＃迭代通过组生成结果 map（P_Groups，〜filter（P_Trans，Group ==。））％>％ map（〜mutate（。，No_of_PIDs = n_distinct（ProductType）））％>％ map_df（〜，（No，no_of_PIDs> 1））＃>资料来源：本地资料框[12 x 5] ＃>组：TransactionID [4] ＃> ＃> TransactionID ProductID ProductType Group No_of_PIDs ＃> < CHR> < CHR> < DBL> < CHR> < INT> ＃> 1 a1 A 1 Group1 2 ＃> 2 a1 B 1 Group1 2 ＃> 3 a1 1 2 Group1 2 ＃> 4 a2 C 1 Group2 2 ＃> 5 a2 4 2 Group2 2 ＃> 6 a2 5 2 Group2 2 ＃> 7 a3 D 1 Group3 2 ＃> 8 a3 C 1 Group3 2 ＃> 9 a3 7 2 Group3 2 ＃> 10 a3 8 2 Group3 2 ＃> 11 a6 H 1 Group5 2 ＃> 12 a6 15 2 Group5 2

I have two data.frames--one look-up table that tells me a set products included in a group. Each group has at least one product of Type 1 and Type 2.

The second data.frame tells me details about the transaction. Each transaction can have one of the following products:

a) Only products of Type 1 from one of the groups

b) Only products of Type 2 from one of the groups

c) Product of Type 1 and Type 2 from the same group

For my analysis, I am interested in finding out c) above i.e. how many transactions have products of Type 1 and Type 2 (from the same group) sold. We will ignore the transaction altogether if Product of Type 1 and that of Type 2 from different groups that are sold in the same transaction.

Thus, each product of Type 1 or Type 2 MUST belong to the same group.

Here's my look up table:
> P_Lookup Group ProductID1 ProductID2 Group1 A 1 Group1 B 2 Group1 B 3 Group2 C 4 Group2 C 5 Group2 C 6 Group3 D 7 Group3 C 8 Group3 C 9 Group4 E 10 Group4 F 11 Group4 G 12 Group5 H 13 Group5 H 14 Group5 H 15
For instance, I won't have Product G and Product 15 in one transaction because they belong to different group.

Here are the transactions:
TransactionID ProductID ProductType a1 A 1 a1 B 1 a1 1 2 a2 C 1 a2 4 2 a2 5 2 a3 D 1 a3 C 1 a3 7 2 a3 8 2 a4 H 1 a5 1 2 a5 2 2 a5 3 2 a5 3 2 a5 1 2 a6 H 1 a6 15 2
My Code:

Now, I was able to write code using dplyr for shortlisting transactions from one group. However, I am not sure how I can vectorize my code for all groups.

Here's my code:
P_Groups<-unique(P_Lookup$Group) Chosen_Group<-P_Groups[5] P_Group_Ind <- P_Trans %>% group_by(TransactionID)%>% dplyr::filter((ProductID %in% unique(P_Lookup[P_Lookup$Group==Chosen_Group,]$ProductID1)) | (ProductID %in% unique(P_Lookup[P_Lookup$Group==Chosen_Group,]$ProductID2)) ) %>% mutate(No_of_PIDs = n_distinct(ProductType)) %>% mutate(Group_Name = Chosen_Group) P_Group_Ind<-P_Group_Ind[P_Group_Ind$No_of_PIDs>1,]
This works well as long as I manually select each group i.e. by setting Chosen_Group. However, I am not sure how I can automate this. One way, I am thinking is to use for loop, but I know that the beauty of R is vectorization, so I want to stay away from using for loop.

I'd sincerely appreciate any help. I have spent almost two days on this. I looked at using dplyr in for loop in r, but it seems this thread is talking about a different issue.

DATA: Here's dput for P_Trans:
structure(list(TransactionID = c("a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3", "a3", "a4", "a5", "a5", "a5", "a5", "a5", "a6", "a6"), ProductID = c("A", "B", "1", "C", "4", "5", "D", "C", "7", "8", "H", "1", "2", "3", "3", "1", "H", "15"), ProductType = c(1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2)), .Names = c("TransactionID", "ProductID", "ProductType"), row.names = c(NA, 18L), class = "data.frame")
Here's dput for P_Lookup:
structure(list(Group = c("Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group3", "Group4", "Group4", "Group4", "Group5", "Group5", "Group5"), ProductID1 = c("A", "B", "B", "C", "C", "C", "D", "C", "C", "E", "F", "G", "H", "H", "H"), ProductID2 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)), .Names = c("Group", "ProductID1", "ProductID2"), row.names = c(NA, 15L), class = "data.frame")

Here's the dput() after adding a product to P_Trans that doesn't exist in the look-up table:
structure(list(TransactionID = c("a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3", "a3", "a4", "a5", "a5", "a5", "a5", "a5", "a6", "a6", "a7"), ProductID = c("A", "B", "1", "C", "4", "5", "D", "C", "7", "8", "H", "1", "2", "3", "3", "1", "H", "15", "22"), ProductType = c(1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 3)), .Names = c("TransactionID", "ProductID", "ProductType"), row.names = c(NA, 19L), class = "data.frame")

解决方案
Below is a tidyverse (dplyr, tidyr, and purrr) solution that I hope will help.

Note that the use of map_df in the last line returns all results as a data frame. If you'd prefer it to be a list object for each group, then simply use map.
library(dplyr) library(tidyr) library(purrr) # Save unique groups for later use P_Groups <- unique(P_Lookup$Group) # Convert lookup table to product IDs and Groups P_Lookup <- P_Lookup %>% gather(ProductIDn, ProductID, ProductID1, ProductID2) %>% select(ProductID, Group) %>% distinct() %>% nest(-ProductID, .key = Group) # Bind Group information to transactions # and group for next analysis P_Trans <- P_Trans %>% left_join(P_Lookup) %>% filter(!map_lgl(Group, is.null)) %>% unnest(Group) %>% group_by(TransactionID) # Iterate through Groups to produce results map(P_Groups, ~ filter(P_Trans, Group == .)) %>% map(~ mutate(., No_of_PIDs = n_distinct(ProductType))) %>% map_df(~ filter(., No_of_PIDs > 1)) #> Source: local data frame [12 x 5] #> Groups: TransactionID [4] #> #> TransactionID ProductID ProductType Group No_of_PIDs #> <chr> <chr> <dbl> <chr> <int> #> 1 a1 A 1 Group1 2 #> 2 a1 B 1 Group1 2 #> 3 a1 1 2 Group1 2 #> 4 a2 C 1 Group2 2 #> 5 a2 4 2 Group2 2 #> 6 a2 5 2 Group2 2 #> 7 a3 D 1 Group3 2 #> 8 a3 C 1 Group3 2 #> 9 a3 7 2 Group3 2 #> 10 a3 8 2 Group3 2 #> 11 a6 H 1 Group5 2 #> 12 a6 15 2 Group5 2

这篇关于使用dplyr将一个data.frame的输出管道输出到另一个的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

使用dplyr将一个data.frame的输出管道输出到另一个 [英] Pipe output of one data.frame to another using dplyr

问题描述

相关文章

其他开发语言最新文章

热门教程

热门工具

登录关闭

使用dplyr将一个data.frame的输出管道输出到另一个 [英] Pipe output of one data.frame to another using dplyr

问题描述

相关文章

其他开发语言最新文章

热门教程

热门工具

登录 关闭

登录关闭