Group values with identical ID into columns without summarizing them in R

Problem description

I have a data frame that looks like this, but with many more proteins:

Protein      z
  Irak4  -2.46
  Irak4  -0.13
    Itk  -0.49
    Itk   4.22
    Itk  -0.51
    Ras   1.53

For further operations, I need the data grouped by protein name into columns, like this:

Irak4    Itk    Ras
-2.46  -0.49   1.53
-0.13   4.22     NA
   NA  -0.51     NA

I tried different packages like dplyr or reshape, but did not manage to transform the data into the desired format.

Is there any way to achieve this? I think the missing data points for some proteins are the main problem here.

I am quite new to R, so my apologies if I am missing an obvious solution.

Recommended answer

Here is a tidyverse solution:

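library(tidyverse)
DF %>% 
  group_by(Protein) %>% 
  mutate(idx = row_number()) %>%   # number the values within each protein: 1, 2, 3, ...
  spread(Protein, z) %>%           # one column per protein; rows are matched by idx
  select(-idx)                     # drop the helper index column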
# A tibble: 3 x 3
#   Irak4   Itk   Ras
#   <dbl> <dbl> <dbl>
#1  -2.46 -0.49  1.53
#2  -0.13  4.22 NA   
#3  NA    -0.51 NA 

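Before we spread the data, we need to create unique identifiers. Without idx, several z values of the same protein would compete for the same cell and spread() would stop with a duplicate-identifiers error; idx numbers the values within each protein, so each value gets its own row and proteins with fewer values are filled with NA.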
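
Since tidyr 1.0.0, spread() is superseded by pivot_wider(). A minimal sketch of the same idea with pivot_wider(), assuming dplyr and tidyr are installed, could look like this; the helper column idx does the same job as above. The name wide is just for illustration.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(
  Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
  z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)
)

wide <- DF %>%
  group_by(Protein) %>%
  mutate(idx = row_number()) %>%  # row position of each value within its protein
  ungroup() %>%
  pivot_wider(names_from = Protein, values_from = z) %>%  # one column per protein; empty cells become NA
  select(-idx)                    # the helper column is no longer needed

wide
```

By default pivot_wider() orders the new columns by first appearance of each protein, which here matches the alphabetical order that the base R approach produces.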

In base R, you could first use unstack(), which gives you a named list of vectors, one per protein, containing the values of the z column.

Use lapply() to iterate over that list and pad each vector with NAs using the `length<-` function, so that all vectors have the same length. Then we can call data.frame().

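lst <- unstack(DF, z ~ Protein)                          # named list: one vector of z values per protein
data.frame(lapply(lst, `length<-`, max(lengths(lst))))   # pad each vector with NA to the longest, then combine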
#  Irak4   Itk  Ras
#1 -2.46 -0.49 1.53
#2 -0.13  4.22   NA
#3    NA -0.51   NA
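
To see what each piece of the base R one-liner does, the same computation can be split into named steps. This is a sketch using the example data; the intermediate names n, padded and result are only for illustration.

```r
DF <- data.frame(
  Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"),
  z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)
)

# Step 1: one vector of z values per protein (a named list, because the group sizes differ)
lst <- unstack(DF, z ~ Protein)
str(lst)

# Step 2: the largest group decides how many rows the result needs
n <- max(lengths(lst))

# Step 3: `length<-` extends each shorter vector with NA up to length n
padded <- lapply(lst, `length<-`, n)

# Step 4: vectors of equal length can be combined column-wise
result <- data.frame(padded)
result
```

If every protein happened to have the same number of values, unstack() would already return a data frame; the padding step would then leave the lengths unchanged, so the same code still works.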

Data

DF <- structure(list(Protein = c("Irak4", "Irak4", "Itk", "Itk", "Itk", 
"Ras"), z = c(-2.46, -0.13, -0.49, 4.22, -0.51, 1.53)), .Names = c("Protein", 
"z"), class = "data.frame", row.names = c(NA, -6L))