R (arules): Convert a data frame into transactions and remove NA


Question

I have a data frame that I want to convert into transaction data so that I can run a market basket analysis with the arules package in R. I researched online how to convert a data frame into transactions, e.g. (How to prep transaction data into basket for arules) and (Transform csv into transactions for arules), but the result I got was different.

dput(df)

structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"), 
Fruits = c(NA, "Apple", "Orange", NA, "Pear", "Grape"), 
Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"), 
Personal = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA), 
Drink = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"), 
Other = c(NA, NA, NA, NA, "Promo", NA)), 
.Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

Below is the structure of my data frame:

Transaction_ID  Fruits  Vegetables  Personal  Drink  Other
      A001        NA        NA       ToothP   Coff    NA
      A002       Apple      NA       ToothP    NA     NA
      A003      Orange      NA         NA     Coff    NA
      A004        NA      Potato     ToothB   Milk    NA
      A005       Pear       NA       ToothB   Milk   Promo
      A006      Grape      Yam         NA     Coff    NA

Class of each column:

sapply(df, class)
Transaction_ID         Fruits     Vegetables       Personal          Drink          Other 
"character"    "character"    "character"    "character"    "character"    "character"

Convert the data frame to transaction data:

data <- as(split(df[,"Fruits"], df[,"Vegetables"],df[,"Personal"], df[,"Drink"], df[,"Other"]), "transactions")
inspect(data)

The result I got:

[1] {NA,NA,ToothP,Coff,NA}
[2] {Apple,NA,ToothP,NA,NA}
[3] {Orange,NA,NA,Coff,NA}
[4] {NA,Potato,ToothB,Milk,NA}
[5] {Pear,NA,ToothB,Milk,Promo}
[6] {Grape,Yam,NA,Coff,NA}

The data was converted into transactions, but is there any way to remove the NA items? If they remain in the transaction list, each NA is counted as an item.

Recommended answer

Ogustari is right. Here is the complete code, which also handles the transaction IDs.

library("arules")
library("dplyr")  ### for tbl_df
df <- structure(list(Transaction_ID = c("A001", "A002", "A003", "A004", "A005", "A006"), 
  Fruits = c(NA, "Apple", "Orange", NA, "Pear", "Grape"), 
  Vegetables = c(NA, NA, NA, "Potato", NA, "Yam"), 
  Personal = c("ToothP", "ToothP", NA, "ToothB", "ToothB", NA), 
  Drink = c("Coff", NA, "Coff", "Milk", "Milk", "Coff"), 
  Other = c(NA, NA, NA, NA, "Promo", NA)), 
  .Names = c("Transaction_ID", "Fruits", "Vegetables", "Personal", "Drink", "Other"), 
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

### remove transaction IDs
tid <- as.character(df[["Transaction_ID"]])
df <- df[,-1]

### make all columns factors
for(i in 1:ncol(df)) df[[i]] <- as.factor(df[[i]])

trans <- as(df, "transactions")

### set transactionIDs
transactionInfo(trans)[["transactionID"]] <- tid

inspect(trans)

   items                                          transactionID
[1] {Personal=ToothP,Drink=Coff}                   A001         
[2] {Personal=ToothP}                              A002         
[3] {Drink=Coff}                                   A003         
[4] {Vegetables=Potato,Personal=ToothB,Drink=Milk} A004         
[5] {Personal=ToothB,Drink=Milk,Other=Promo}       A005         
[6] {Vegetables=Yam,Drink=Coff}                    A006         
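
If the `Column=` prefixes are not wanted and bare item names such as `ToothP` are preferred, one alternative (not part of the answer above) is to build a list holding one item vector per row with the NA values dropped, and coerce that list instead. The following is only a minimal sketch under assumptions: it uses a cut-down three-row version of the data, and `item_cols` and `item_list` are names made up for the illustration.

```r
library("arules")

# Cut-down, illustrative version of the data frame from the question
df <- data.frame(
  Transaction_ID = c("A001", "A002", "A003"),
  Fruits = c(NA, "Apple", "Orange"),
  Drink = c("Coff", NA, "Coff"),
  stringsAsFactors = FALSE
)

# Every column except the ID column holds items
item_cols <- setdiff(names(df), "Transaction_ID")

# One character vector per row, with NA dropped; unique() guards against
# the same item appearing in two columns of one row
item_list <- lapply(seq_len(nrow(df)), function(i) {
  row <- unlist(df[i, item_cols], use.names = FALSE)
  unique(row[!is.na(row)])
})

# When coercing a list, arules uses the list names as transaction IDs
names(item_list) <- df$Transaction_ID

trans <- as(item_list, "transactions")
inspect(trans)

# NA should not show up among the item labels
itemLabels(trans)
```

The trade-off is that the column of origin is lost: with the factor-based conversion, `Drink=Coff` and a hypothetical `Other=Coff` stay distinct items, whereas here both would collapse into `Coff`.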
