Extracting items from an R data frame using criteria given as a (column_name = value) list

Views: 157 | Published: 2017/7/13 22:24:40 | Tags: r, dplyr, code-readability

Problem description

I would like to extract items from a column in a data frame based on criteria pertaining to values in other columns. These criteria are given in the form of a list associating column names with values. The ultimate goal is to use those items to select columns by name in another data structure.

Here is an example data frame:

> experimental_plan
  lib genotype treatment replicate
1   A       WT    normal         1
2   B       WT       hot         1
3   C      mut    normal         1
4   D      mut       hot         1
5   E       WT    normal         2
6   F       WT       hot         2
7   G      mut    normal         2
8   H      mut       hot         2

And my selection criteria are encoded as the following list:

> ref_condition = list(genotype="WT", treatment="normal")

I want to extract the items in the "lib" column where the line matches ref_condition, that is "A" and "E".

1) I can get the columns to use for selection using names on my list of selection criteria:

> experimental_plan[, names(ref_condition)]
  genotype treatment
1       WT    normal
2       WT       hot
3      mut    normal
4      mut       hot
5       WT    normal
6       WT       hot
7      mut    normal
8      mut       hot

2) I can test whether the resulting lines match my selection criteria:

> experimental_plan[, names(ref_condition)] == ref_condition
     genotype treatment
[1,]     TRUE      TRUE
[2,]     TRUE     FALSE
[3,]    FALSE      TRUE
[4,]    FALSE     FALSE
[5,]     TRUE      TRUE
[6,]     TRUE     FALSE
[7,]    FALSE      TRUE
[8,]    FALSE     FALSE
> selection_vector <- apply(experimental_plan[, names(ref_condition)] == ref_condition, 1, all)
> selection_vector
[1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

(I think this step, with the apply, is not particularly elegant. There must be a better way.)
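One possible way to avoid `apply` here, sketched in base R: compare each named column with its reference value using `Map`, then combine the resulting logical vectors with `Reduce` and `&`. The data frame is rebuilt from the printed example; `stringsAsFactors = TRUE` is an assumption based on the "Levels:" lines in the output shown in the question.

```r
# Rebuild the example data frame (assumed construction, not the asker's own code).
experimental_plan <- data.frame(
  lib = LETTERS[1:8],
  genotype = rep(c("WT", "WT", "mut", "mut"), 2),
  treatment = rep(c("normal", "hot"), 4),
  replicate = rep(1:2, each = 4),
  stringsAsFactors = TRUE
)
ref_condition <- list(genotype = "WT", treatment = "normal")

# One logical vector per criterion: column == reference value.
per_column <- Map(function(column, value) column == value,
                  experimental_plan[names(ref_condition)],
                  ref_condition)

# A row is selected only if every criterion holds.
selection_vector <- Reduce(`&`, per_column)

selected_libs <- as.character(experimental_plan$lib[selection_vector])
print(selected_libs)
```

This keeps everything vectorized by column instead of iterating over rows. Note that it assumes `ref_condition` has at least one entry; with an empty list, `Reduce` returns `NULL`.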

3) This boolean vector can be used to select the relevant lines:

> selected_lines <- experimental_plan[selection_vector,]
> selected_lines
  lib genotype treatment replicate
1   A       WT    normal         1
5   E       WT    normal         2

4) From this point on, I know how to use dplyr to select items I'm interested in:

> lib1 <- filter(selected_lines, replicate=="1") %>% select(lib) %>% unlist()
> lib2 <- filter(selected_lines, replicate=="2") %>% select(lib) %>% unlist()
> lib1
lib 
  A 
Levels: A B C D E F G H
> lib2
lib 
  E 
Levels: A B C D E F G H

Can dplyr (or other clever techniques) be used in earlier steps?

5) These items happen to correspond to column names in another data structure (named counts_data here). I use them to extract the corresponding columns and put them in a list, associated with replicate numbers as names:

> counts_1 <- counts_data[, lib1]
> counts_2 <- counts_data[, lib2]
> list_of_counts <- list("1" = counts_1, "2" = counts_2)

(Ideally, I would like to generalize the code so that I do not need to hard-code the different values that exist in the "replicate" column: there could be any number of replicates for a given combination of "genotype" and "treatment" characteristics, and I want my final list to contain the data from counts_data pertaining to the corresponding "lib" items.)
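A rough base R sketch of that generalization, using `split` so the replicate values never have to be spelled out. The contents of `counts_data` are not shown in the question, so the table below is a made-up stand-in with one column per library.

```r
experimental_plan <- data.frame(
  lib = LETTERS[1:8],
  genotype = rep(c("WT", "WT", "mut", "mut"), 2),
  treatment = rep(c("normal", "hot"), 4),
  replicate = rep(1:2, each = 4),
  stringsAsFactors = FALSE
)
ref_condition <- list(genotype = "WT", treatment = "normal")

# Hypothetical counts table: three made-up rows, one column per library.
counts_data <- as.data.frame(
  matrix(seq_len(24), nrow = 3, dimnames = list(NULL, LETTERS[1:8]))
)

# Rows matching every (column_name = value) criterion.
keep <- Reduce(`&`, Map(function(column, value) column == value,
                        experimental_plan[names(ref_condition)],
                        ref_condition))
selected_lines <- experimental_plan[keep, ]

# Group the selected library names by whatever replicate values occur.
libs_by_replicate <- split(as.character(selected_lines$lib),
                           selected_lines$replicate)

# For each replicate, pull the matching columns; drop = FALSE keeps a data frame
# even when a replicate has a single library.
list_of_counts <- lapply(libs_by_replicate,
                         function(libs) counts_data[, libs, drop = FALSE])
print(names(list_of_counts))
```

The names of the resulting list come from the replicate values themselves, so adding a third replicate to the plan would need no change to the code.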

Is there a way to do the whole process more elegantly / efficiently?

Solution

I think you can use data.table for this, with a key:

library(data.table)
test <- data.table(lib = LETTERS[1:8],
           genotype = rep(c("WT","WT","mut","mut"),2),
           treatment = rep(c("normal","hot"),4),
           replicate = c(rep(1,4),rep(2,4)))
setkeyv(test,c("genotype","treatment"))
ref_condition = list(genotype="WT", treatment="normal")
test[ref_condition,lib]

This gives

[1] "A" "E"

You could of course use lapply to loop over a list of test conditions.
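As an illustration of the `lapply` idea, here is a small sketch. The `conditions` list and its labels are hypothetical. It also swaps the keyed join for an ad hoc join with `on =`, so the result does not depend on the order in which the key columns were set.

```r
library(data.table)

test <- data.table(lib = LETTERS[1:8],
                   genotype = rep(c("WT", "WT", "mut", "mut"), 2),
                   treatment = rep(c("normal", "hot"), 4),
                   replicate = rep(1:2, each = 4))

# Hypothetical set of conditions; the list names are only illustrative labels.
conditions <- list(
  wt_normal = list(genotype = "WT", treatment = "normal"),
  mut_hot   = list(genotype = "mut", treatment = "hot")
)

# Each condition list is converted to a one-row table and joined on the
# columns it names; the `lib` values of the matching rows are returned.
libs_by_condition <- lapply(conditions, function(cond) {
  test[cond, lib, on = names(cond)]
})
print(libs_by_condition)
```

A condition that matches no row would yield `NA` with the default settings, so it may be worth checking for that before using the result to index another table.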
