Phylogenetics in R: collapsing descendant tips of an internal node

Question

I have several thousand gene trees that I am preparing for analysis with codeml. The tree below is a typical example. I want to automate the collapsing of tips that appear to be duplicates. For instance, the descendants of node 56 are tips 26, 27, 28, and so on up to 36. All of these other than tip 28 appear to be duplicates. How can I collapse them into a single tip, leaving just tip 28 and one representative of the other tips as the descendants of node 56?

I know how to do this manually, one tip at a time, but I want to automate the process so that a function identifies which tips need to be collapsed and reduces them to a single representative tip. So far I have been looking at the cophenetic function, which calculates the pairwise distances between tips, but I am not sure how to use that information to collapse tips.

Here is the Newick string for the tree:

((((11:0.00201426,12:5e-08,(9:1e-08,10:1e-08,8:1e-08)40:0.00403036)41:0.00099978,7:5e-08)42:0.01717066,(3:0.00191517,(4:0.00196859,(5:1e-08,6:1e-08)71:0.00205168)70:0.00112995)69:0.01796015)43:0.042592645,((1:0.00136179,2:0.00267375)44:0.05586907,(((13:0.00093161,14:0.00532243)47:0.01252989,((15:1e-08,16:1e-08)49:0.00123243,(17:0.00272478,(18:0.00085725,19:0.00113572)51:0.01307761)50:0.00847373)48:0.01103656)46:0.00843782,((20:0.0020268,(21:0.00099593,22:1e-08)54:0.00099081)53:0.00297097,(23:0.00200672,(25:1e-08,(36:1e-08,37:1e-08,35:1e-08,34:1e-08,33:1e-08,32:1e-08,31:1e-08,30:1e-08,29:1e-08,28:0.00099682,27:1e-08,26:1e-08)58:0.00200056,24:1e-08)56:0.00100953)55:0.00210137)52:0.01233888)45:0.01906982)73:0.003562205)38;
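For readers who want to follow along, the sketch below loads this string with ape's read.tree and lists the tips that sit on near-zero terminal branches, which are the candidates for collapsing. The names nwk, tip_len and short_tips, and the 1e-5 cutoff, are assumptions made for illustration and are not part of the question.

```r
library(ape)

# Newick string copied from the question
nwk <- "((((11:0.00201426,12:5e-08,(9:1e-08,10:1e-08,8:1e-08)40:0.00403036)41:0.00099978,7:5e-08)42:0.01717066,(3:0.00191517,(4:0.00196859,(5:1e-08,6:1e-08)71:0.00205168)70:0.00112995)69:0.01796015)43:0.042592645,((1:0.00136179,2:0.00267375)44:0.05586907,(((13:0.00093161,14:0.00532243)47:0.01252989,((15:1e-08,16:1e-08)49:0.00123243,(17:0.00272478,(18:0.00085725,19:0.00113572)51:0.01307761)50:0.00847373)48:0.01103656)46:0.00843782,((20:0.0020268,(21:0.00099593,22:1e-08)54:0.00099081)53:0.00297097,(23:0.00200672,(25:1e-08,(36:1e-08,37:1e-08,35:1e-08,34:1e-08,33:1e-08,32:1e-08,31:1e-08,30:1e-08,29:1e-08,28:0.00099682,27:1e-08,26:1e-08)58:0.00200056,24:1e-08)56:0.00100953)55:0.00210137)52:0.01233888)45:0.01906982)73:0.003562205)38;"
tree <- read.tree(text = nwk)

# Terminal branch length of every tip, named by tip label
is_tip_edge <- tree$edge[, 2] <= Ntip(tree)
tip_len <- setNames(tree$edge.length[is_tip_edge],
                    tree$tip.label[tree$edge[is_tip_edge, 2]])

# Tips on near-zero branches are the candidates for collapsing
# (1e-5 is an assumed cutoff, not a value from the question)
short_tips <- names(tip_len)[tip_len < 1e-5]
print(short_tips)
```

Note that the numbers in the string are tip and node labels; ape assigns its own internal node numbers when it reads the tree, so the two need not match.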

Recommended answer

One option is to drop tips whose terminal branch length is below a threshold.

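library(ape)

# Drop every tip whose terminal branch is shorter than `thres`.
drop_dupes <- function(tree, thres = 1e-5) {
  # Rows of the edge matrix that end in a tip (tips are numbered 1..Ntip)
  tip_edges <- which(tree$edge[, 2] <= Ntip(tree))
  short <- tree$edge.length[tip_edges] < thres
  # Look tips up by node number so the result does not depend on edge order
  drop.tip(tree, tree$edge[tip_edges, 2][short])
}

# `tree` is the gene tree, e.g. read with read.tree(text = "<Newick string>")
plot(drop_dupes(tree))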
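Because drop_dupes removes every tip below the threshold, a group of duplicates loses all of its members rather than keeping one. The following is a minimal sketch of one way to keep a representative, assuming that "duplicates" means near-zero-length tips attached to the same parent node. The helper name collapse_dupes and the demo tree are hypothetical and not part of the original answer.

```r
library(ape)

# Hypothetical helper (not from the answer): among the tip children of each
# internal node, keep the first near-zero-length tip and drop the others.
# Assumes the tree has branch lengths.
collapse_dupes <- function(tree, thres = 1e-5) {
  is_short_tip <- tree$edge[, 2] <= Ntip(tree) & tree$edge.length < thres
  parent <- tree$edge[is_short_tip, 1]
  tip    <- tree$edge[is_short_tip, 2]
  # duplicated() is FALSE for the first short tip under a parent, TRUE after
  extra <- tip[duplicated(parent)]
  if (length(extra) == 0) return(tree)
  drop.tip(tree, extra)
}

# Small made-up tree: a and b are duplicates under one node, e has no duplicate
demo <- read.tree(text = "((a:1e-08,b:1e-08,c:0.001):0.002,(d:0.003,e:1e-08):0.001);")
collapsed <- collapse_dupes(demo)
print(collapsed$tip.label)
```

In the demo, b is dropped while a, c, d and e remain. Tips that are separated by a near-zero internal branch are not merged by this sketch; grouping tips on their cophenetic(tree) distances would be one possible way to catch those cases.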
