Sort by chromosome name

Question

I have a vector of chromosome names

q<-c("1","10","11","12","13","14","15","16","17",
     "18","19","20","21","22","2","3","4","5","6",
     "7","8","9","X","Y","M")

I want to sort them as:

q<-c("1","2","3","4","5","6","7","8","9","10","11",
     "12","13","14","15","16","17","18","19","20",
     "21","22","X","Y","M")

I thought I could define my own order:

chrOrder <-c((1:22),"X","Y","M")

and use it like this:

factor(cbind(q),levels=chrOrder)

But I still don't get it.

Edit: I have a similar but slightly more advanced scenario. I have a data frame with three columns: name, chromosome, and start.

df <-data.frame(name =c("a","a","a","b","b","b"), chrom = c(1,2,10,1,3,"X"), start=c(100,200,300,500,300,200))

I need to sort it first by name, then by chromosome, and then by start. The result should look like this:

name chrom start
a     1   100
a     10  300
a     2   200
b     1   500
b     3   300
b     X   200

I don't know how to use chrOrder in the following:

indata <- df[do.call(order, df[, c("name", "chrom", "start")]), ]

Answer

Your approach is good; you just need to sort the resulting factor. You should also set ordered=TRUE:

sort(factor(q,levels=chrOrder, ordered=TRUE))
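A minimal sketch of why the original attempt seemed not to work: `factor()` on its own only records the level order and leaves the elements where they are, while `sort()` is what actually rearranges them. The names `f`, `unsorted`, and `sorted` are illustrative only.

```r
q <- c("1","10","11","12","13","14","15","16","17",
       "18","19","20","21","22","2","3","4","5","6",
       "7","8","9","X","Y","M")
# c() coerces 1:22 to character: "1", ..., "22", then "X", "Y", "M"
chrOrder <- c(1:22, "X", "Y", "M")

f <- factor(q, levels = chrOrder, ordered = TRUE)

# factor() keeps the elements in their original positions
unsorted <- as.character(f)

# sort() rearranges the elements according to the level order
sorted <- as.character(sort(f))
print(sorted)
```

Wrapping the result in `as.character()` is optional; it just turns the sorted factor back into a plain character vector.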

No, as has been pointed out, you don't have to use an ordered factor, but it's certainly not wrong, and it's arguably better. Factors are meant for exactly this kind of situation, where you have a well-defined set of levels. See this previous question on factor vs. character.
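To illustrate that the ordered flag is optional, here is a short sketch (the shorter vector `q` is made up for illustration) showing that an unordered factor also sorts by its level order. It also shows a factor-free alternative, not from the original answer, that orders each name by its position in `chrOrder` via `match()`; with that version, any name missing from `chrOrder` gets `NA` and is placed last by `order()`.

```r
q <- c("1", "10", "2", "X", "M", "Y", "3")
chrOrder <- c(1:22, "X", "Y", "M")

# An unordered factor still sorts by the order of its levels
unordered <- as.character(sort(factor(q, levels = chrOrder)))

# Alternative without a factor: sort by each name's index in chrOrder
by_match <- q[order(match(q, chrOrder))]

print(unordered)   # "1" "2" "3" "10" "X" "Y" "M"
```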

Now that you've edited your question, the case for a factor is even stronger because sorting is simple:

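df <- data.frame(name  = c("a","a","a","b","b","b"),
                 chrom = c(1,2,10,1,3,"X"),
                 start = c(100,200,300,500,300,200))

# Desired chromosome order: 1..22, then X, Y, M
chrOrder <- c(1:22, "X", "Y", "M")
# Convert chrom to an ordered factor whose levels follow chrOrder
df$chrom <- factor(df$chrom, levels = chrOrder, ordered = TRUE)

# Sort by name, then chrom (by level position), then start
df[do.call(order, df[, c("name", "chrom", "start")]), ]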

Given the levels of the factor, R knows exactly how to sort the elements.
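A sketch, using the question's data, of what the factor changes: ordering the raw character column compares text, so "10" sorts before "2", whereas the factor version follows `chrOrder`. The `stringsAsFactors = FALSE` argument is an addition not in the original; it just guarantees `chrom` starts out as a plain character column on any R version. The names `as_text` and `as_factor` are illustrative.

```r
df <- data.frame(name  = c("a","a","a","b","b","b"),
                 chrom = c(1,2,10,1,3,"X"),
                 start = c(100,200,300,500,300,200),
                 stringsAsFactors = FALSE)
chrOrder <- c(1:22, "X", "Y", "M")

# Text comparison: "10" lands before "2"
as_text <- df[order(df$name, df$chrom, df$start), ]
print(as_text$chrom)                  # "1" "10" "2" "1" "3" "X"

# With a factor, order() uses each value's position in chrOrder
df$chrom <- factor(df$chrom, levels = chrOrder, ordered = TRUE)
as_factor <- df[order(df$name, df$chrom, df$start), ]
print(as.character(as_factor$chrom))  # "1" "2" "10" "1" "3" "X"
```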

I've followed your lead with the sorting method, but you might like to know that there are prettier ways, e.g.:

library(plyr)
df <- arrange(df, name, chrom, start)
