Splitting data in a cell
Question
I have a dataset like this:
Code Product
1 A|B
2 A|B|C
3 A|B|C|D|E
When I split the column Product using the colsplit function, duplication occurs. The output of colsplit looks like this:
Code Product.1 Product.2 Product.3 Product.4 Product.5
1 A B A B A
2 A B C A B
3 A B C D E
This happens because one of the cells has five elements. Is there any way to avoid this duplication?
Thanks and regards,
Jayaram
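The duplication shown in the question matches how base R recycles shorter vectors when rows of unequal length are bound together. The sketch below does not call colsplit; it only reproduces the same pattern with strsplit and rbind, as an illustration of the likely cause.

```r
product <- c("A|B", "A|B|C", "A|B|C|D|E")
parts <- strsplit(product, "|", fixed = TRUE)

# rbind() recycles the shorter vectors up to the longest length (5 here)
# and warns because 5 is not a multiple of 2 or 3
recycled <- suppressWarnings(do.call(rbind, parts))
recycled[1, ]  # the first row repeats A and B to fill five columns
```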
Recommended answer
Update (21 Oct 2013)
The concepts below have been rolled into a family of functions called concat.split.* in my "splitstackshape" package. Here is a very straightforward solution using concat.split.multiple:
library(splitstackshape)
concat.split.multiple(temp, "Product", "|", "long")
# Code time Product
# 1 1 1 A
# 2 2 1 A
# 3 3 1 A
# 4 1 2 B
# 5 2 2 B
# 6 3 2 B
# 7 1 3 <NA>
# 8 2 3 C
# 9 3 3 C
# 10 1 4 <NA>
# 11 2 4 <NA>
# 12 3 4 D
# 13 1 5 <NA>
# 14 2 5 <NA>
# 15 3 5 E
Remove the "long" argument if you want the wide format, but your comments indicated that ultimately you wanted a long format for your output.
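If installing a package is not an option, a long layout can also be sketched in base R. This is not what concat.split.multiple does internally, and unlike its output above it produces no NA padding rows; the names parts and long are made up for this illustration.

```r
temp <- data.frame(Code = 1:3,
                   Product = c("A|B", "A|B|C", "A|B|C|D|E"),
                   stringsAsFactors = FALSE)

# fixed = TRUE treats "|" as a literal character, not a regex alternation
parts <- strsplit(temp$Product, "|", fixed = TRUE)

# Repeat each Code once per product found in its cell
long <- data.frame(Code = rep(temp$Code, lengths(parts)),
                   Product = unlist(parts),
                   stringsAsFactors = FALSE)
long
```

Each Code then appears once per product, so the result has ten rows for the sample data.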
You can use strsplit and sapply as follows:
# Your data
temp <- structure(list(Code = 1:3, Product = c("A|B", "A|B|C", "A|B|C|D|E"
)), .Names = c("Code", "Product"), class = "data.frame", row.names = c(NA, -3L))
temp1 <- strsplit(temp$Product, "\\|") # Split the product cell
temp1 <- data.frame(Code = temp$Code,
                    t(sapply(temp1,
                             function(x) {
                               # Pad each row with NA up to the longest row
                               padded <- matrix(NA,
                                                nrow = max(sapply(temp1, length)))
                               padded[1:length(x)] <- x
                               padded
                             })))
temp1
# Code X1 X2 X3 X4 X5
# 1 1 A B <NA> <NA> <NA>
# 2 2 A B C <NA> <NA>
# 3 3 A B C D E
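The padding step above can be written more compactly with the replacement function `length<-`, which extends a vector with NA. This is a sketch of an alternative, not the code from the original answer; the names parts, width, padded and wide are illustrative.

```r
temp <- data.frame(Code = 1:3,
                   Product = c("A|B", "A|B|C", "A|B|C|D|E"),
                   stringsAsFactors = FALSE)

parts <- strsplit(temp$Product, "|", fixed = TRUE)
width <- max(lengths(parts))  # length of the longest row

# `length<-`(x, width) pads x with NA up to `width` elements;
# sapply() returns one column per input row, so transpose the result
padded <- t(sapply(parts, `length<-`, width))
wide <- data.frame(Code = temp$Code, padded, stringsAsFactors = FALSE)
wide
```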
Or... use rbind.fill from the "plyr" package, after making each of your rows into a single-row data.frame:
temp1 <- strsplit(temp$Product, "\\|")
library(plyr)
data.frame(Code = temp$Code,
rbind.fill(lapply(temp1, function(x) data.frame(t(x)))))
# Code X1 X2 X3 X4 X5
# 1 1 A B <NA> <NA> <NA>
# 2 2 A B C <NA> <NA>
# 3 3 A B C D E
Or... inspired by @DWin's great answer, re-read the second column as a data.frame in itself.
newcols <- max(sapply(strsplit(temp$Product, "\\|"), length))
temp2 <- data.frame(Code = temp$Code,
read.table(text = as.character(temp$Product),
sep="|", fill=TRUE,
col.names=paste("Product", seq(newcols))))
temp2
# Code Product.1 Product.2 Product.3 Product.4 Product.5
# 1 1 A B
# 2 2 A B C
# 3 3 A B C D E
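Note that the read.table approach fills the short rows with empty strings rather than NA. If NA is preferred, so that the result matches the other approaches, the blanks can be converted afterwards. The following is a sketch under that assumption, using paste0 for the column names and stringsAsFactors = FALSE so the columns stay character in older R versions.

```r
temp <- data.frame(Code = 1:3,
                   Product = c("A|B", "A|B|C", "A|B|C|D|E"),
                   stringsAsFactors = FALSE)

newcols <- max(lengths(strsplit(temp$Product, "|", fixed = TRUE)))
temp2 <- data.frame(Code = temp$Code,
                    read.table(text = temp$Product,
                               sep = "|", fill = TRUE,
                               stringsAsFactors = FALSE,
                               col.names = paste0("Product.", seq_len(newcols))),
                    stringsAsFactors = FALSE)

# Replace the empty strings added by fill = TRUE with NA
temp2[temp2 == ""] <- NA
temp2
```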