How to reorganize a dataframe in R
Problem description
I import a CSV file into a data.frame using read.table(). The data.frame looks something like:

    X1      X2   X3
    Sample  A
    Lot     new
    Name    Vol  %
    Data    0.1  10
    Data    0.2  20
    Data    0.3  30
    Sample  B
    Lot     old
    Name    Vol  %
    Data    0.1  50
    Data    0.2  60
    Data    0.3  70
I would like to reorganize this data.frame so that the first three data points are associated with Sample 'A' and Lot 'new', while the last three are associated with Sample 'B' and Lot 'old'. I'm trying to think of an elegant way to do this without resorting to a for loop, or having to manually carve out the data.frame row by row using subsetting (i.e. dataA = mydataframe[4:6, ]).

The data.frame that I want in the end might look something like:

    A_new_Vol  A_new_%  B_old_Vol  B_old_%
    0.1        10       0.1        50
    0.2        20       0.2        60
    0.3        30       0.3        70
where Sample, Lot, Vol, and % information are incorporated into the column names themselves.
Another possibility is to have the data.frame be something like:

    Sample  Lot  Vol  %
    A       new  0.1  10
    A       new  0.2  20
    A       new  0.3  30
    B       old  0.1  50
    B       old  0.2  60
    B       old  0.3  70
Any pointers will be greatly appreciated. Thanks!
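Solution
Assuming your data is in df:

    # Drop the first row (X1 X2 X3) and give the columns working names
    df <- setNames(df[-1, ], c("type", "Vol", "%"))
    # Start a new group at every "Sample" row
    df.lst <- split(df, cumsum(df[, 1] == "Sample"))
    # Per group: take Sample and Lot from rows 1 and 2, keep only the data rows
    do.call(
      rbind,
      lapply(df.lst, function(x) cbind(Sample=x[1, 2], Lot=x[2, 2], x[-(1:3), -1]))
    )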
Produces (this is available as dput at the end):

         Sample Lot Vol  %
    1.5       A new 0.1 10
    1.6       A new 0.2 20
    1.7       A new 0.3 30
    2.11      B old 0.1 50
    2.12      B old 0.2 60
    2.13      B old 0.3 70
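To try the steps above end to end, here is a minimal, self-contained sketch. The input data frame is a hypothetical reconstruction of what read.table() might return for the question's file (the default column names V1 to V3 and the empty strings in short rows are assumptions, not taken from the original post). The numeric conversion at the end is an addition of this sketch, since every column arrives as character.

```r
# Hypothetical reconstruction of the raw import: "X1 X2 X3" is assumed to be
# the first data row, and missing third fields are assumed to be empty strings.
df <- data.frame(
  V1 = c("X1", "Sample", "Lot", "Name", "Data", "Data", "Data",
         "Sample", "Lot", "Name", "Data", "Data", "Data"),
  V2 = c("X2", "A", "new", "Vol", "0.1", "0.2", "0.3",
         "B", "old", "Vol", "0.1", "0.2", "0.3"),
  V3 = c("X3", "", "", "%", "10", "20", "30",
         "", "", "%", "50", "60", "70"),
  stringsAsFactors = FALSE
)

# Same steps as in the answer: drop the first row, rename the columns
df <- setNames(df[-1, ], c("type", "Vol", "%"))

# The running count of "Sample" rows labels each block of rows with a group id
grp <- cumsum(df[, 1] == "Sample")
df.lst <- split(df, grp)

# For each block, attach Sample and Lot to the data rows and stack the blocks
df.new <- do.call(
  rbind,
  lapply(df.lst, function(x) cbind(Sample = x[1, 2], Lot = x[2, 2], x[-(1:3), -1]))
)

# Addition in this sketch: the values are still character, so convert them
df.new$Vol <- as.numeric(df.new$Vol)
df.new[["%"]] <- as.numeric(df.new[["%"]])

print(df.new)
```

Printing grp before the split is a quick way to confirm that each "Sample" row really starts a new block in your own file.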
If you want your alternate format, here is an option with reshape2:

    library(reshape2)
    df.new$id2 <- ave(1:nrow(df.new), df.new$Sample, df.new$Lot, FUN=seq_along)
    dcast(
      melt(df.new, id.vars=c("Sample", "Lot", "id2")),
      id2 ~ Sample + Lot + variable
    )
Produces:

      id2 A_new_Vol A_new_% B_old_Vol B_old_%
    1   1       0.1      10       0.1      50
    2   2       0.2      20       0.2      60
    3   3       0.3      30       0.3      70
Basically, you need to add an id column, melt down one more time so you're truly in "long" format, and then dcast to wide format.

Or, if you want base R, you can do the same with the following (contributed by Ananda):

    df.new <- within(df.new, {
      ID <- ave(rep(1, nrow(df.new)), Sample, FUN = seq_along)
      Time <- paste(Sample, Lot, sep = "_")
    })
    reshape(df.new, direction = "wide", idvar="ID", timevar="Time", drop=c("Sample", "Lot"))
Leads to:

        ID Vol.A_new %.A_new Vol.B_old %.B_old
    1.4  1       0.1      10       0.1      50
    1.5  2       0.2      20       0.2      60
    1.6  3       0.3      30       0.3      70
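The role of the id column in the long-then-wide step can be seen in isolation with a small base R sketch. It is only an illustration of the idea, not what reshape2 or reshape() do internally: the names long, key, value, measures and wide are hypothetical, and the data frame is typed in by hand to match the output shown earlier.

```r
# Hand-built copy of the tidy result (values taken from the tables above)
df.new <- data.frame(
  Sample = rep(c("A", "B"), each = 3),
  Lot    = rep(c("new", "old"), each = 3),
  Vol    = rep(c(0.1, 0.2, 0.3), times = 2),
  pct    = c(10, 20, 30, 50, 60, 70)
)
names(df.new)[4] <- "%"

# Row counter within each Sample/Lot combination; this is what lets the
# n-th row of A/new line up with the n-th row of B/old in the wide table
df.new$id2 <- ave(seq_len(nrow(df.new)), df.new$Sample, df.new$Lot, FUN = seq_along)

# Long format: one row per (id2, Sample_Lot_measure) pair
measures <- c("Vol", "%")
long <- do.call(rbind, lapply(measures, function(m) {
  data.frame(
    id2   = df.new$id2,
    key   = paste(df.new$Sample, df.new$Lot, m, sep = "_"),
    value = df.new[[m]]
  )
}))

# Wide format: each (id2, key) cell holds exactly one value, so identity
# is enough to fill a matrix with id2 as rows and key as columns
wide <- tapply(long$value, list(id2 = long$id2, key = long$key), identity)
print(wide)
```

Without id2, every key would map to three values and there would be nothing to tell the rows apart, which is why both the reshape2 and the base R versions add a counter first.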

df.new starts off as:

    structure(list(
      Sample = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
      Lot = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("new", "old"), class = "factor"),
      Vol = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3),
      "%" = c(10L, 20L, 30L, 50L, 60L, 70L),
      id2 = c(1L, 2L, 3L, 1L, 2L, 3L)),
      .Names = c("Sample", "Lot", "Vol", "%", "id2"),
      row.names = c("1.5", "1.6", "1.7", "2.11", "2.12", "2.13"),
      class = "data.frame")