How to get a new column in a data frame which has only elements which appear in the set more than once in R

Problem description

Data:

DB1 <- data.frame(orderItemID  = c(1,2,3,4,5,6,7,8,9,10), 
orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12", "1.1.12","1.1.12","1.1.12","2.1.12","2.1.12"),  
itemID = c(2,3,2,5,12,4,2,3,1,5),  
size = factor(c("l", "s", "xl", "xs","m", "s", "l", "m", "xxs", "xxl")), 
color = factor(c("blue", "black", "blue", "orange", "red", "navy", "red", "purple", "white", "black")),  
customerID = c(33, 15, 1, 33, 14, 55, 33, 78, 94, 23))

Expected output:

selection_order = c("yes","no","no","no","no","no","yes","no","no","no")

In the data set, several items can share the same size, the same color, or the same itemID. Every registered user has a unique customerID.

I want to identify when a user orders more than one product with the same itemID (in different sizes or colors; for example, the user with customerID = 33 orders the same item (itemID = 2) in two different colors) and mark it in a new column, named something like "selection order", with "Yes" or "No". It should NOT show a "Yes" when he or she orders an item with a different ID. I only want a "yes" when the same customer has ordered the same itemID more than once (on the same day or in the past), regardless of other IDs (other products).

I've tried a lot already, but nothing works. There are a few thousand different customer IDs and item IDs, so I can't subset for every ID. I tried the duplicated function, but it does not lead to a satisfactory solution.

The problem is that if the same person orders more than one object (so customerID is duplicated) and another person (another customerID) orders an item with the same ID (so itemID is duplicated), it gives me a "yes", but it must be a "no" in this case. (In the example, the duplicated function gives a "yes" at orderItemID 4 instead of a "no".)
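
The false positive described above can be reproduced with a small sketch. The asker's actual code is not shown, so this is only an assumption about the kind of column-wise check that behaves this way; `any_dup` and `naive` are hypothetical names introduced for illustration.

```r
# Sample data from the question (only the columns needed here)
DB1 <- data.frame(
  orderItemID = 1:10,
  itemID      = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
  customerID  = c(33, 15, 1, 33, 14, 55, 33, 78, 94, 23)
)

# Hypothetical helper: TRUE for every value that occurs more than once,
# including its first occurrence
any_dup <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

# Assumed column-wise check: each column is tested on its own,
# so the customer and the item are never matched as a pair
naive <- any_dup(DB1$customerID) & any_dup(DB1$itemID)

DB1$orderItemID[naive]
# [1] 1 4 7
```

Row 4 is flagged because customer 33 appears several times and item 5 is also ordered by customer 23, even though customer 33 ordered item 5 only once.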

Solution

I think I understand your desired output now. Try:

library(data.table)
setDT(DB1)[, selection_order := .N > 1, by = list(customerID, itemID)]
DB1
#     orderItemID orderDate itemID size  color customerID selection_order
#  1:           1    1.1.12      2    l   blue         33            TRUE
#  2:           2    1.1.12      3    s  black         15           FALSE
#  3:           3    1.1.12      2   xl   blue          1           FALSE
#  4:           4    1.1.12      5   xs orange         33           FALSE
#  5:           5    1.1.12     12    m    red         14           FALSE
#  6:           6    1.1.12      4    s   navy         55           FALSE
#  7:           7    1.1.12      2    l    red         33            TRUE
#  8:           8    1.1.12      3    m purple         78           FALSE
#  9:           9    2.1.12      1  xxs  white         94           FALSE
# 10:          10    2.1.12      5  xxl  black         23           FALSE

To convert back to a data.frame, use DB1 <- as.data.frame(DB1) (for older versions) or setDF(DB1) with the latest data.table version.


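You can do it (less efficiently) with base R too. Note that ave() returns a vector of the same type as its first argument (numeric here), so the result is wrapped in as.logical() to get TRUE/FALSE rather than 1/0:

transform(DB1, selection_order = as.logical(ave(itemID, list(customerID, itemID), FUN = function(x) length(x) > 1)))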
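
Another base R option, not part of the original answer, is to apply `duplicated()` to the (customerID, itemID) pair rather than to each column separately. This is a minimal sketch; `keys` is a name introduced here for illustration.

```r
# Sample data from the question (only the columns needed here)
DB1 <- data.frame(
  orderItemID = 1:10,
  itemID      = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
  customerID  = c(33, 15, 1, 33, 14, 55, 33, 78, 94, 23)
)

# Treat each (customerID, itemID) combination as one key
keys <- DB1[c("customerID", "itemID")]

# duplicated() skips the first occurrence, so scan from both ends
# to flag every row whose key appears more than once
DB1$selection_order <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

DB1$orderItemID[DB1$selection_order]
# [1] 1 7
```

Scanning from both ends is needed because `duplicated()` alone would flag only row 7 and leave the first order of the pair unmarked.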


Or using the dplyr package

library(dplyr)
DB1 %>%
  group_by(customerID, itemID) %>%
  mutate(selection_order = n() > 1)
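
All three approaches produce a logical column, whereas the expected output in the question uses "yes" and "no". A small sketch of one way to convert it with `ifelse()`, using the flags from the output shown earlier; `selection_label` is a hypothetical name:

```r
# Logical flags as produced by the solutions above (rows 1 and 7 are TRUE)
selection_order <- c(TRUE, FALSE, FALSE, FALSE, FALSE,
                     FALSE, TRUE, FALSE, FALSE, FALSE)

# Map TRUE/FALSE to the "yes"/"no" labels from the expected output
selection_label <- ifelse(selection_order, "yes", "no")

selection_label
# [1] "yes" "no"  "no"  "no"  "no"  "no"  "yes" "no"  "no"  "no"
```

Keeping the logical column is usually more convenient for further filtering, so the labels are probably best added only for display.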
