如何“分组"?订购R中的数据以进行绘制? [英] How to "group" order data in R for a barplot?

查看:68
本文介绍了如何“分组"?订购R中的数据以进行绘制?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在做生物信息学,我需要输出一个带有祖先结果的图形(条形图).通常,这些图是通过将总体分组在一起绘制的.这样做的方式是,您只需为不同的推定总体(此处为4)绘制Q得分的条形图(以下数据).

I'm doing bioinformatics and I need to output a graph (barplot) with the ancestry results in it. Usually, these graphs are draw by grouping the populations together. The way it's done is that you simply plot the barplot of the Q score (data below) for the different putative population (here 4).

问题是我使用ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),]对值进行排序.这样一来,我看到一些条形图无法在正确的总体中聚类(请参阅图中应与第一组聚类的橙色条形图).因此,我想知道如何通过颜色(代表总体)对条进行聚类.

The problem is that I use ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),] to sort my values. That way I see that some bars do not cluster in the right population (see the orange bar in the graph that should cluster with the first group). Thus, I would like to know how it would be possible to cluster the bars by colour (which represents the population).

有没有办法解决这个问题?

Is there a way to solve this?

barplotgeno <- function(tbl, # To plot the Q scores (ancestry). 
                        col = c("#FF3030", # nice colors
                                "#9ACD31",
                                "#1D90FF",
                                "#FF8001"), 
                        pdf = TRUE,
                        pdf.path.name = "~/Desktop/Stacked_barplot.pdf") {

  ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),]

  if(pdf) {
    pdf(pdf.path.name, width = 11, height = 8.5)
  bp = barplot(t(as.matrix(ord[,1:dim(ord)[2]-1])), 
               space = c(0.2),#space = c(0),# Space between the bars
               col=col, #rainbow(4),
               xlab="Individual #", 
               ylab="Ancestry", xaxt="n",
               border=NA, main = "Stacked barplot from ADMIXTURE analysis")
  labs <- row.names(ord)
  text(cex=0.5, x=bp+1, y=-0.03, labs, xpd=TRUE, srt=90, pos=2)
    dev.off()
  } else {
    bp = barplot(t(as.matrix(ord[,1:dim(ord)[2]])), 
                 space = c(0.2),
                 col=col, 
                 xlab="Individual #", 
                 ylab="Ancestry", xaxt="n",
                 border=NA, 
                 main = "Stacked barplot from ADMIXTURE analysis")
    labs <- row.names(ord)
    text(cex=0.5, x=bp+1, y=-0.03, labs, xpd=TRUE, srt=90, pos=2)
  }
}

# Example 
barplotgeno(tbl = tbl, pdf = FALSE)

以下是数据:

tbl = structure(list(V1 = c(1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.430202, 0.99997, 
0.801974, 1e-05, 0.99997, 0.99997, 1e-05, 0.999968, 1e-05, 1e-05, 
1.3e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 1e-05, 1.8e-05, 
1e-05, 1.2e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1.1e-05, 1e-05, 0.642925, 
1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 0.99997, 1e-05, 0.99997, 
0.99997, 0.287976, 1e-05, 0.99997, 0.99997, 0.99997, 1e-05, 0.533994, 
0.99997, 0.99997, 1e-05, 0.99997, 1e-05, 0.99997, 1e-05, 1e-05, 
0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 0.307669, 0.99997, 
0.604114, 0.604792, 0.29646, 0.514252, 0.99997, 0.798616, 0.516577, 
1e-05, 1e-05, 1e-05, 1e-05, 0.449886, 1e-05, 1e-05, 1e-05, 1e-05, 
0.790272, 1e-05, 0.576786, 0.776731), V2 = c(0.99997, 1e-05, 
1e-05, 1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1.2e-05, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 1e-05, 
0.99997, 0.99997, 1e-05, 1e-05, 0.528138, 0.99997, 1e-05, 1e-05, 
1e-05, 1e-05, 0.99997, 1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.712004, 0.99997, 
1e-05, 1e-05, 1e-05, 0.99997, 0.465986, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 0.99997, 
0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.05777, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.591857, 0.99997, 1e-05, 
0.99997, 0.99997, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05
), V3 = c(1e-05, 1e-05, 1e-05, 0.541112, 1e-05, 1e-05, 0.329922, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.198006, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.999967, 0.451508, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.749225, 
1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 0.442211, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.188, 
1e-05, 0.248756, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 0.395866, 0.395188, 0.293429, 0.427968, 
1e-05, 0.201364, 0.483403, 1e-05, 1e-05, 0.408123, 1e-05, 0.550094, 
1e-05, 1e-05, 1e-05, 1e-05, 0.209708, 0.533729, 0.423194, 0.223249
), V4 = c(1e-05, 1e-05, 0.99997, 0.458868, 1e-05, 1e-05, 0.670058, 
0.99997, 0.99997, 1e-05, 1e-05, 0.99997, 0.99997, 0.569778, 1e-05, 
1e-05, 0.99997, 1e-05, 1e-05, 0.99997, 1e-05, 1e-05, 0.99997, 
1e-05, 0.548472, 1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 
0.471833, 1e-05, 0.250753, 0.99997, 1e-05, 1e-05, 1e-05, 0.999969, 
1e-05, 0.357055, 0.557769, 1e-05, 1e-05, 0.99997, 0.99997, 1e-05, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 0.81198, 1e-05, 0.751224, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.692311, 
1e-05, 1e-05, 1e-05, 0.410101, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
0.466251, 1e-05, 1e-05)), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA, 
-93L))

如果需要分组变量,这里为一个(按数据顺序):

If you need a grouping variable here is one (in order of the data):

species = c("fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", 
"fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", "fortis", 
"fuliginosa", "fuliginosa", "fortis", "fortis", "fortis", "fuliginosa", 
"fuliginosa", "fortis", "scandens", "fortis", "fortis", "scandens", 
"scandens", "fuliginosa", "magnirostris", "scandens", "magnirostris", 
"fortis", "fortis", "fortis", "fortis", "fortis", "scandens", 
"fortis", "scandens", "scandens", "magnirostris", "fortis", "fortis", 
"fortis", "scandens", "magnirostris", "magnirostris", "fortis", 
"fortis", "fortis", "fortis", "magnirostris", "fortis", "magnirostris", 
"magnirostris", "magnirostris", "fortis", "fortis", "fortis", 
"magnirostris", "magnirostris", "fortis", "fortis", "fortis", 
"fortis", "fortis", "fortis", "fortis", "fortis", "fortis", "fortis", 
"magnirostris", "fortis", "fortis", "fortis", "fortis", "fortis", 
"fortis", "fortis", "fortis", "fortis", "fortis", "fuliginosa", 
"fortis", "fortis", "fortis", "fortis", "fuliginosa", "fuliginosa", 
"fuliginosa", "fuliginosa", "fortis", "fortis", "fortis", "fortis", 
"fortis", "fortis")

推荐答案

稍微好一点的方法:

library(tidyverse)

plot_data <- tbl %>% 
  mutate(id = row_number()) %>% 
  gather('pop', 'prob', V1:V4) %>% 
  group_by(id) %>% 
  mutate(likely_assignment = pop[which.max(prob)],
         assingment_prob = max(prob)) %>% 
  arrange(likely_assignment, desc(assingment_prob)) %>% 
  ungroup() %>% 
  mutate(id = forcats::fct_inorder(factor(id)))

具有ggplot的图:

A plot with ggplot:

ggplot(plot_data, aes(id, prob, fill = pop)) +
  geom_col() +
  theme_classic()

或具有多个方面:

ggplot(plot_data, aes(id, prob, fill = pop)) +
  geom_col() +
  facet_grid(~likely_assignment, scales = 'free', space = 'free')

这篇关于如何“分组"?订购R中的数据以进行绘制?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆