Subset a column in a data frame based on another data frame/list
Question
I have the following `table1`, which is a data frame with 6 columns and 8083 rows. Below is the head of `table1`:
|gene ID | prom_65| prom_66| amast_69| amast_70| p_value|
|:--------------|---------:|---------:|---------:|---------:|---------:|
|LdBPK_321470.1 | 24.7361| 25.2550| 31.2974| 45.4209| 0.2997430|
|LdBPK_251900.1 | 107.3580| 112.9870| 77.4182| 86.3211| 0.0367792|
|LdBPK_331430.1 | 72.0639| 86.1486| 68.5747| 77.8383| 0.2469355|
|LdBPK_100640.1 | 43.8766| 53.4004| 34.0255| 38.4038| 0.1299948|
|LdBPK_330360.1 | 2382.8700| 1871.9300| 2013.4200| 2482.0600| 0.8466225|
|LdBPK_090870.1 | 49.6488| 53.7134| 59.1175| 66.0931| 0.0843242|
I have another data frame, called `accessions40`, which is a list of 510 gene IDs. It is a subset of the first column of `table1`, i.e. all 510 of its values are contained in the first column of `table1` (8083 values). The head of `accessions40` is displayed below:
|V1 |
|:--------------|
|LdBPK_330360.1 |
|LdBPK_283000.1 |
|LdBPK_360210.1 |
|LdBPK_261550.1 |
|LdBPK_367320.1 |
|LdBPK_361420.1 |
What I want to do is the following: produce a new `table2` that contains, under the first column (gene ID), only the values present in `accessions40`, together with the corresponding values of the other five columns from `table1`. In other words, I want to keep only the rows of `table1` whose first column matches a value in `accessions40`.
Answer
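We can use `%in%` to get a logical vector (one `TRUE`/`FALSE` per row of `table1`) and `subset` the rows of `table1` based on that. The code assumes the ID column is named `gene_ID`; if it really is `gene ID` as in the header above, a name containing a space must be wrapped in backticks in R code.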
```r
subset(table1, gene_ID %in% accessions40$V1)
```
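To make the logical-vector step visible, here is a minimal, self-contained sketch. It is not the answer's own code: it rebuilds toy versions of `table1` and `accessions40` from the six-row heads shown in the question (so only one ID, `LdBPK_330360.1`, overlaps) and assumes the ID column is named `gene_ID`.

```r
# Toy data rebuilt from the heads shown in the question (not the full 8083/510 rows).
# Assumption: the ID column is named gene_ID, as in the answer's code.
table1 <- data.frame(
  gene_ID  = c("LdBPK_321470.1", "LdBPK_251900.1", "LdBPK_331430.1",
               "LdBPK_100640.1", "LdBPK_330360.1", "LdBPK_090870.1"),
  prom_65  = c(24.7361, 107.3580, 72.0639, 43.8766, 2382.8700, 49.6488),
  prom_66  = c(25.2550, 112.9870, 86.1486, 53.4004, 1871.9300, 53.7134),
  amast_69 = c(31.2974, 77.4182, 68.5747, 34.0255, 2013.4200, 59.1175),
  amast_70 = c(45.4209, 86.3211, 77.8383, 38.4038, 2482.0600, 66.0931),
  p_value  = c(0.2997430, 0.0367792, 0.2469355, 0.1299948, 0.8466225, 0.0843242),
  stringsAsFactors = FALSE
)
accessions40 <- data.frame(
  V1 = c("LdBPK_330360.1", "LdBPK_283000.1", "LdBPK_360210.1",
         "LdBPK_261550.1", "LdBPK_367320.1", "LdBPK_361420.1"),
  stringsAsFactors = FALSE
)

# %in% gives one TRUE/FALSE per row of table1: is this gene_ID listed in accessions40?
keep <- table1$gene_ID %in% accessions40$V1
print(keep)  # -> FALSE FALSE FALSE FALSE TRUE FALSE

# Keep the TRUE rows with all six columns, in table1's original row order
table2 <- table1[keep, ]
print(table2)
```

`subset(table1, gene_ID %in% accessions40$V1)` selects the same row; `subset()` additionally treats any `NA` in the condition as `FALSE`.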
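A typically faster option on large tables is `data.table`, whose `%chin%` is a faster `%in%` for character vectors:

```r
library(data.table)
# setDT() converts table1 to a data.table by reference; the [ ] then keeps matching rows
setDT(table1)[gene_ID %chin% accessions40$V1]
```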
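Filtering with `%in%`/`%chin%` keeps `table1`'s row order. If the rows should instead follow the order of `accessions40`, a data.table join is one way to get that. This is a sketch outside the original answer, on hypothetical toy data, assuming the same `gene_ID` and `V1` column names; `nomatch = 0L` makes it an inner join, so accessions with no match are dropped.

```r
library(data.table)

# Hypothetical toy data: a few IDs taken from the question's heads.
table1 <- data.table(
  gene_ID = c("LdBPK_321470.1", "LdBPK_251900.1", "LdBPK_330360.1"),
  p_value = c(0.2997430, 0.0367792, 0.8466225)
)
accessions40 <- data.table(V1 = c("LdBPK_330360.1", "LdBPK_321470.1", "LdBPK_283000.1"))

# For each V1 in accessions40 (in that order), look up the table1 rows whose gene_ID equals it.
# nomatch = 0L drops accessions that have no matching gene_ID.
table2 <- table1[accessions40, on = .(gene_ID = V1), nomatch = 0L]
print(table2)
```

Here `LdBPK_283000.1` is absent from the toy `table1`, so the result has two rows, ordered as in `accessions40`.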
Or, using `dplyr`:

```r
library(dplyr)
table1 %>%
  filter(gene_ID %in% accessions40$V1)
```
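Whichever variant is used, the question states that all 510 IDs in `accessions40` occur in `table1`, so `table2` should have 510 rows provided each `gene_ID` appears only once in `table1`. A base R sanity check for that, sketched on hypothetical toy data with the assumed `gene_ID` column name:

```r
# Hypothetical toy data; on the real tables the final row count should be 510.
table1 <- data.frame(
  gene_ID = c("LdBPK_321470.1", "LdBPK_251900.1", "LdBPK_330360.1"),
  p_value = c(0.2997430, 0.0367792, 0.8466225),
  stringsAsFactors = FALSE
)
accessions40 <- data.frame(V1 = c("LdBPK_330360.1", "LdBPK_321470.1"),
                           stringsAsFactors = FALSE)

# Precondition from the question: every accession occurs in table1$gene_ID.
missing_ids <- setdiff(accessions40$V1, table1$gene_ID)
stopifnot(length(missing_ids) == 0)

# Duplicated IDs in table1 would make table2 longer than accessions40.
stopifnot(!anyDuplicated(table1$gene_ID))

table2 <- subset(table1, gene_ID %in% accessions40$V1)
stopifnot(nrow(table2) == length(unique(accessions40$V1)))
```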