Add a column (annotate) to one data frame based on a column from another data frame in R


Question

head(covreage)

chr     Pos             Val
X       129271111       10
X       129271112       10
X       129271113       10
X       129271114       10
X       129271115       10
X       129271116       11
X       129271117       11
X       129271118       11
X       129271119       11
X       129271120       11
X       129271121       11
X       129271122       11
X       129271123       11
X       129271124       11
X       129271125       11
X       129271126       11
X       129271127       11
X       129271128       11
X       129271129       11
X       129271130       11
X       129271131       11
X       129271132       11
X       129271133       11

head(annotation)

chr Region  start       end         Gene    status
X   Exon    129271053   129271110   AIFM1   NO
X   Exon    129270618   129270706   AIFM1   NO
X   Exon    129270020   129270160   AIFM1   NO
X   Exon    129267288   129267430   AIFM1   NO
X   Exon    129265650   129265774   AIFM1   NO
X   Exon    129263945   129264141   AIFM1   NO
X   Exon    129263532   129263603   AIFM1   NO
3   Exon    15643358    15643401    BTD     NO
3   Exon    15676931    15677195    BTD     NO
3   Exon    15683415    15683564    BTD     NO

I am trying to add a new column to the first data frame that holds the gene name from the second data frame for every position that falls between that gene's start and end.

covreage$Gene <- ifelse(covreage$chr == annotation$chr & covreage$pos >= annotation$start & covreage$pos <= annotation$end,annotation$Gene,"NA")

The second data frame gives ranges (start to end), and a row of the first data frame matches a range when its chr is equal and its pos lies inside that range. chr can take 23 different values, and similar pos values occur on every chromosome, so only chr and pos together identify a row uniquely.

The code above gives these warnings:

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In `==.default`(covreage$chr, annotation$chr) :
  longer object length is not a multiple of shorter object length
3: In covreage$pos >= annotation$start :
  longer object length is not a multiple of shorter object length
4: In covreage$pos <= annotation$end :
  longer object length is not a multiple of shorter object length

Recommended answer

By evaluating something like covreage$pos >= annotation$start, you are comparing the two data frames element by element (row 1 with row 1, row 2 with row 2, and so on), which is not what you want. You want to compare several rows from the first data frame against one row from the second, using a grouping rule that R does not know about.

You still get some output because R generally recycles the elements of the shorter vector as needed:

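> 1:6 < c(2,6,6)
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

> 1:5 < c(2,6,6)
[1]  TRUE  TRUE  TRUE FALSE  TRUE
Warning message:
In 1:5 < c(2, 6, 6) :
  longer object length is not a multiple of shorter object length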

In the first case no warning is printed because the shorter vector is reused a whole number of times (6 = 2 × 3). In the second case that is not possible (as R says, the longer object length is not a multiple of the shorter object length), so a warning is shown.

Even though recycling counts as an error in the context you presented, R allows it because it is useful in other situations.
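The answer above explains the warnings but does not show a fix. One possible way to get the intended annotation, sketched here in base R, is to loop over the rows of annotation and compare every row of covreage against that single interval, so each comparison is a vector against a length-one value and recycling behaves as expected. The toy values below are made up for illustration and are not the asker's data; the column name pos follows the asker's code (the printed table shows it as Pos), and the sketch assumes that intervals do not overlap (if they do, the later interval wins).

```r
# Toy data shaped like the tables in the question (values are illustrative).
covreage <- data.frame(
  chr = c("X", "X", "X", "3"),
  pos = c(129271053, 129271110, 129271111, 15643360),
  Val = c(10, 10, 10, 12),
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  chr   = c("X", "3"),
  start = c(129271053, 15643358),
  end   = c(129271110, 15643401),
  Gene  = c("AIFM1", "BTD"),
  stringsAsFactors = FALSE
)

# Start with a real missing value (NA_character_), not the string "NA".
covreage$Gene <- NA_character_

# For each annotation interval, flag the coverage rows on the same
# chromosome whose position lies inside [start, end], then label them.
for (i in seq_len(nrow(annotation))) {
  hit <- covreage$chr == annotation$chr[i] &
    covreage$pos >= annotation$start[i] &
    covreage$pos <= annotation$end[i]
  covreage$Gene[hit] <- annotation$Gene[i]
}

print(covreage)
```

In this toy input the third row (position 129271111) lies just past the end of the first interval, so its Gene stays NA. The loop does one pass over covreage per annotation row, which may be slow for large inputs; interval-overlap tools such as data.table::foverlaps or the Bioconductor GenomicRanges package are alternatives worth considering in that case.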
