Convert basket to single
Problem description
I currently have a table set up in a basket format so that an irregular amount of data is associated with each row of the table. Such as:
01,item1,item2,item3
02,item1,item2,
03,item1,item2,item3,item4
04,item1
However, I need to change it to a normalized transactional format with only one item on each row. Such as:
01,item1
01,item2
01,item3
02,item1
02,item2
03,item1
...and so on. Is there an easy automated or programmatic way to do this? The data is currently in a MySQL database that I can export in a variety of file types, and I also have access to RStudio, and Microsoft Excel to try to do this. All the transactional resources I could find for RStudio assume that the data was already in the second format, which is what I'm trying to get to.
Solution
I am assuming I understand how your data set will look once you read it into R, i.e., it will be a rectangular data frame where NAs are filled in to make the rows the same length. So this should solve the problem:
#Create your dataset (this step is not for you)
row1 = c("01","item1","item2","item3",NA)
row2 = c("02","item1","item2",NA,NA)
row3 = c("03","item1","item2","item3","item4")
row4 = c("04","item1",NA,NA,NA)
Data = rbind(row1,row2,row3,row4)

#Now do the reconstruction (this step is for you)
col1 = NULL
col2 = NULL
for(i in 1:nrow(Data)){
  #Data[i] is the i-th entry of the first column, i.e. the ID of row i
  col1 = c(col1,rep(Data[i],ncol(Data)-1))
  col2 = c(col2,Data[i,-1])
}
#Keep only the rows that actually hold an item
NewData = cbind(col1,col2)[!is.na(col2),]
So, what you get is the following
> Data
     [,1] [,2]    [,3]    [,4]    [,5]
row1 "01" "item1" "item2" "item3" NA
row2 "02" "item1" "item2" NA      NA
row3 "03" "item1" "item2" "item3" "item4"
row4 "04" "item1" NA      NA      NA
>
> NewData
      col1 col2
 [1,] "01" "item1"
 [2,] "01" "item2"
 [3,] "01" "item3"
 [4,] "02" "item1"
 [5,] "02" "item2"
 [6,] "03" "item1"
 [7,] "03" "item2"
 [8,] "03" "item3"
 [9,] "03" "item4"
[10,] "04" "item1"
So hopefully that helps.
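If the data has not been read into a rectangular matrix yet, the same reshaping can also be sketched directly on the raw comma-separated lines, without a loop. This is only an illustration under some assumptions: the sample lines are copied from the question, and the names basket_lines, fields, ids, items and transactions are made up for this example. IDs are kept as strings so that leading zeros such as "01" survive.

```r
# Hypothetical sample input mirroring the basket rows in the question.
# With a real export, these lines could come from readLines() on the file.
basket_lines <- c(
  "01,item1,item2,item3",
  "02,item1,item2,",
  "03,item1,item2,item3,item4",
  "04,item1"
)

# Split each line on commas, trim stray whitespace, and drop empty fields
# (for example, one left behind by a trailing comma).
fields <- lapply(strsplit(basket_lines, ",", fixed = TRUE), trimws)
fields <- lapply(fields, function(x) x[nzchar(x)])

# The first field is the transaction ID; the remaining fields are the items.
ids <- vapply(fields, function(x) x[1], character(1))
items <- lapply(fields, function(x) x[-1])

# Repeat each ID once per item to get one (id, item) pair per row.
transactions <- data.frame(
  id = rep(ids, times = lengths(items)),
  item = unlist(items),
  stringsAsFactors = FALSE
)

print(transactions)
```

Splitting the lines by hand sidesteps the NA padding step entirely, since each row keeps exactly as many items as it had in the file. The result could then be written back out with write.csv(transactions, row.names = FALSE) if a single-format file is needed.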