How to reorganize a dataframe in R


Problem description


I import a CSV file into a data.frame using read.table(). The data.frame looks something like:

X1        X2   X3
Sample    A  
Lot      new
Name     Vol   %
Data     0.1   10
Data     0.2   20
Data     0.3   30
Sample    B  
Lot      old
Name     Vol   %
Data     0.1   50
Data     0.2   60
Data     0.3   70

I would like to reorganize this data.frame so that the first three data points are associated with Sample 'A' and Lot 'new', and the last three with Sample 'B' and Lot 'old'. I'm trying to think of an elegant way to do this without resorting to a for loop, or having to manually carve out the data.frame row by row by subsetting (e.g. dataA = mydataframe[4:6, ]).

The data.frame that I want in the end might look something like:

A_new_Vol  A_new_%   B_old_Vol   B_old_%
  0.1        10         0.1        50
  0.2        20         0.2        60
  0.3        30         0.3        70

where Sample, Lot, Vol, and % information are incorporated into the column names themselves.

Another possibility is to have the data.frame be something like:

Sample   Lot   Vol   %
  A      new   0.1   10
  A      new   0.2   20
  A      new   0.3   30
  B      old   0.1   50
  B      old   0.2   60
  B      old   0.3   70

Any pointers will be greatly appreciated. Thanks!

Solution

Assuming your data is in df:

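# Drop the first row (this assumes the "X1 X2 X3" line was read in as a data row)
# and give the columns working names.
df <- setNames(df[-1, ], c("type", "Vol", "%"))
# Start a new group at every "Sample" row.
df.lst <- split(df, cumsum(df[, 1] == "Sample"))
# Per group: take Sample and Lot from the first two rows, keep only the data rows.
df.new <- do.call(
  rbind,
  lapply(df.lst, function(x) cbind(Sample=x[1, 2], Lot=x[2, 2], x[-(1:3), -1]))
)

This produces df.new (also available as dput at the end):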

     Sample Lot Vol  %
1.5       A new 0.1 10
1.6       A new 0.2 20
1.7       A new 0.3 30
2.11      B old 0.1 50
2.12      B old 0.2 60
2.13      B old 0.3 70
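
The grouping idea above can be tried end to end on an inline copy of the question's data. The following is only a sketch under stated assumptions: the file is taken to be comma-separated and read without a header (so the "X1,X2,X3" line becomes the first data row), and it uses a vectorised lookup instead of split/lapply. The names raw, block, tidy and the column name Pct (a syntactic stand-in for "%") are made up for illustration.

```r
# Inline stand-in for the CSV file (assumed layout, not the asker's real file).
raw <- "X1,X2,X3
Sample,A,
Lot,new,
Name,Vol,%
Data,0.1,10
Data,0.2,20
Data,0.3,30
Sample,B,
Lot,old,
Name,Vol,%
Data,0.1,50
Data,0.2,60
Data,0.3,70"
df <- read.table(text = raw, sep = ",", header = FALSE,
                 stringsAsFactors = FALSE)

df <- df[-1, ]                          # drop the "X1,X2,X3" row
names(df) <- c("type", "Vol", "%")

# Every "Sample" row opens a new block, so a running count of those rows
# labels each row with its block number (1, 1, ..., 2, 2, ...).
block <- cumsum(df$type == "Sample")

sample_of <- df$Vol[df$type == "Sample"]  # one sample name per block
lot_of    <- df$Vol[df$type == "Lot"]     # one lot name per block
is_data   <- df$type == "Data"

# Keep only the data rows and attach each row's sample and lot by block number.
tidy <- data.frame(
  Sample = sample_of[block[is_data]],
  Lot    = lot_of[block[is_data]],
  Vol    = as.numeric(df$Vol[is_data]),
  Pct    = as.integer(df[["%"]][is_data]),  # hypothetical name for the "%" column
  stringsAsFactors = FALSE
)
print(tidy)
```

Because the values are read as text (the columns mix labels and numbers), the sketch converts Vol and Pct explicitly; the same conversion may be needed after the split/lapply version, depending on how the file was read.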

If you want your alternate format, here is an option with reshape2:

library(reshape2)
df.new$id2 <- ave(1:nrow(df.new), df.new$Sample, df.new$Lot, FUN=seq_along)
dcast(
  melt(df.new, id.vars=c("Sample", "Lot", "id2")), 
  id2 ~ Sample + Lot + variable
)

Produces:

  id2 A_new_Vol A_new_% B_old_Vol B_old_%
1   1       0.1      10       0.1      50
2   2       0.2      20       0.2      60
3   3       0.3      30       0.3      70

Basically, you need to add an id column, melt down one more time so you're truly in "long" format, and then dcast to wide format.
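
To see why the id column matters, the same long-then-wide idea can be sketched in base R only. This is an illustration of the principle, not how reshape2 works internally; the data frame is retyped by hand, and Pct, long, key and wide are hypothetical names (Pct stands in for the "%" column).

```r
df.new <- data.frame(
  Sample = c("A", "A", "A", "B", "B", "B"),
  Lot    = c("new", "new", "new", "old", "old", "old"),
  Vol    = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3),
  Pct    = c(10L, 20L, 30L, 50L, 60L, 70L),
  stringsAsFactors = FALSE
)

# Row counter within each Sample/Lot group: this becomes the row key of the
# wide table, so the first A row lines up with the first B row, and so on.
df.new$id2 <- ave(seq_len(nrow(df.new)), df.new$Sample, df.new$Lot,
                  FUN = seq_along)

# "Long" format: one value per row, with the future column name as a key.
long <- rbind(
  data.frame(id2 = df.new$id2,
             key = paste(df.new$Sample, df.new$Lot, "Vol", sep = "_"),
             value = df.new$Vol),
  data.frame(id2 = df.new$id2,
             key = paste(df.new$Sample, df.new$Lot, "Pct", sep = "_"),
             value = df.new$Pct)
)

# "Wide" format: one row per id2, one column per key. Each cell holds exactly
# one value, so identity() just passes it through.
wide <- tapply(long$value, list(long$id2, long$key), FUN = identity)
print(wide)
```

Without id2 there would be nothing to say which A row belongs next to which B row; with it, each (id2, key) pair identifies exactly one value.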

Or if you want base R you can do the same with (contributed by Ananda):

df.new <- within(df.new, {
  ID <- ave(rep(1, nrow(df.new)), Sample, FUN = seq_along)
  Time <- paste(Sample, Lot, sep = "_")
})
reshape(df.new, direction = "wide", idvar="ID", timevar="Time", drop=c("Sample", "Lot"))

Leads to:

    ID Vol.A_new %.A_new Vol.B_old %.B_old
1.4  1       0.1      10       0.1      50
1.5  2       0.2      20       0.2      60
1.6  3       0.3      30       0.3      70


For reference, df.new (with the id2 column already added) as dput output:

structure(list(Sample = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Lot = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("new", "old"), class = "factor"), Vol = c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3), "%" = c(10L, 20L, 30L, 50L, 60L, 70L), id2 = c(1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Sample", "Lot", "Vol", "%", "id2"), row.names = c("1.5", "1.6", "1.7", "2.11", "2.12", "2.13"), class = "data.frame")
