Converting long format to wide format
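Problem description
I have copy-number data in long format, as shown below, where each sample has its own copy-number values (segVal) over its own genomic ranges: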
> head(long)
chromosome start end segVal sample
1: chr1 3218923 116319008 2 TCGA-05-4417-01A-22D-1854-01
2: chr1 116324707 120523902 1 TCGA-05-4417-01A-22D-1854-01
3: chr1 149879545 247812431 4 TCGA-05-4417-01A-22D-1854-01
4: chr1 3218923 104393357 2 TCGA-06-0644-01A-02D-0310-01
5: chr1 104418619 149879545 1 TCGA-06-0644-01A-02D-0310-01
6: chr1 149885583 247812431 2 TCGA-06-0644-01A-02D-0310-01
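How can I convert this to wide format, so that each sample becomes a column holding its values (although, if I am not mistaken, the genomic ranges would then have to be common to all samples), like this: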
> head(wide)
chr start end TCGA-05-4417-01A-22D-1854-01 TCGA-06-0644-01A-02D-0310-01 TCGA-06-0644-01A-02D-0310-01
chr1 24254002 24291000 2 2 2
chr3 47421002 49068000 1 0 0
chr4 69204002 70320000 0 0 1
chr5 58263002 59785000 0 1 1
chr6 29010002 33287000 2 2 2
chr7 110240002 111354000 0 0 0
Recommended answer
Does this work for you?
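library(tidyr)  # provides pivot_wider() and re-exports the %>% pipe
options(scipen = 999)  # print large coordinates in full rather than in scientific notation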
df <- structure(list(chromosome = c("chr1", "chr1", "chr1", "chr1",
"chr1", "chr1"), start = c(3218923L, 116324707L, 149879545L,
3218923L, 104418619L, 149885583L), end = c(116319008L, 120523902L,
247812431L, 104393357L, 149879545L, 247812431L), segVal = c(2L,
1L, 4L, 2L, 1L, 2L), sample = c("TCGA-05-4417-01A-22D-1854-01",
"TCGA-05-4417-01A-22D-1854-01", "TCGA-05-4417-01A-22D-1854-01",
"TCGA-06-0644-01A-02D-0310-01", "TCGA-06-0644-01A-02D-0310-01",
"TCGA-06-0644-01A-02D-0310-01")), class = "data.frame", row.names = c("1:",
"2:", "3:", "4:", "5:", "6:"))
# One column per sample; range/sample combinations with no value are filled with 0
wide <- df %>%
  pivot_wider(names_from = sample, values_from = segVal, values_fill = 0)
wide
#> # A tibble: 6 x 5
#> chromosome start end `TCGA-05-4417-01A-22D-1… `TCGA-06-0644-01A-02D-0…
#> <chr> <int> <int> <int> <int>
#> 1 chr1 3218923 116319008 2 0
#> 2 chr1 116324707 120523902 1 0
#> 3 chr1 149879545 247812431 4 0
#> 4 chr1 3218923 104393357 0 2
#> 5 chr1 104418619 149879545 0 1
#> 6 chr1 149885583 247812431 0 2
Created on 2020-08-24 by the reprex package (v0.3.0)
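Note that the zeros in the output above are fill values, not measured copy numbers: pivot_wider() treats each distinct (chromosome, start, end) combination as its own row, and the two samples share no identical ranges. If common ranges are needed, as the question suggests, one possible approach is to cut each chromosome at every segment boundary and look up the covering segment for each sample. The following is only a minimal base R sketch under that assumption; the helper names common_bins() and lookup() are hypothetical, and bins that a sample does not cover are left as NA rather than 0.

```r
# Example data from the question, rebuilt as a plain data frame
long <- data.frame(
  chromosome = "chr1",
  start  = c(3218923L, 116324707L, 149879545L, 3218923L, 104418619L, 149885583L),
  end    = c(116319008L, 120523902L, 247812431L, 104393357L, 149879545L, 247812431L),
  segVal = c(2L, 1L, 4L, 2L, 1L, 2L),
  sample = rep(c("TCGA-05-4417-01A-22D-1854-01",
                 "TCGA-06-0644-01A-02D-0310-01"), each = 3)
)

# Hypothetical helper: cut one chromosome at every segment boundary
common_bins <- function(d) {
  breaks <- sort(unique(c(d$start, d$end)))
  data.frame(chromosome = d$chromosome[1],
             start = head(breaks, -1),
             end   = tail(breaks, -1))
}

# Hypothetical helper: segVal of the segment of sample `smp` that fully
# covers the bin [s, e] on chromosome `chr`, or NA if no segment does
lookup <- function(chr, s, e, smp) {
  hit <- long$chromosome == chr & long$sample == smp &
    long$start <= s & long$end >= e
  if (any(hit)) long$segVal[which(hit)[1]] else NA_integer_
}

# Build the common bins per chromosome, then add one column per sample
bins <- do.call(rbind, lapply(split(long, long$chromosome), common_bins))
wide <- bins
for (smp in unique(long$sample)) {
  wide[[smp]] <- mapply(lookup, bins$chromosome, bins$start, bins$end,
                        MoreArgs = list(smp = smp), USE.NAMES = FALSE)
}
wide
```

Whether uncovered bins should stay NA, be set to 0, or be set to a neutral copy number depends on the downstream analysis, so that choice is left open here.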