Subset Data Based On Elements In List


Problem description

In R, I am trying to subset the data.frame named Data by using the elements stored in a list.

Data

Data <- read.table(text = "  Data_x  Data_y  Column_X 
                                -34      12       A
                                -36      20       D
                                -36      12       E
                                -34      18       F
                                -34      10       B
                                -35      24       A
                                -35      16       B
                                -33      22       B
                                -33      14       C
                                -35      22       D", header = TRUE)

Code

variableData <- list("A", "B")
subsetData_1 <- subset(Data, Column_X == variableData[1])
subsetData_2 <- subset(Data, Column_X == variableData[2])
subsetData <- rbind(subsetData_1, subsetData_2)

Problem

  • First, the list can hold more than two elements and its length is not fixed. It can even have more than 100 elements.
  • Second, I want to keep only one data.frame that stores all the subset data extracted using all the elements in the list. If there are more elements, say 100, I don't want to repeat subset() for each of them.

Is there a better way to approach this than the code above? My approach does not scale to many elements and will take a performance hit.

Any suggestions would be helpful, thanks.

Recommended answer

Classic lapply:

x <- lapply(variableData, function(x){subset(Data, Column_X == x)})
x
# [[1]]
# Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
# 
# [[2]]
# Data_x Data_y Column_X
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B

This returns a list of all the subsets. To rbind all these list elements into one data.frame, just use:

do.call(rbind, x)
#   Data_x Data_y Column_X
# 1    -34     12        A
# 6    -35     24        A
# 5    -34     10        B
# 7    -35     16        B
# 8    -33     22        B

However, as @Frank pointed out, you could use basic subsetting with [ and %in% in your code instead:

Data[Data$Column_X %in% variableData,]
#   Data_x Data_y Column_X
# 1    -34     12        A
# 5    -34     10        B
# 6    -35     24        A
# 7    -35     16        B
# 8    -33     22        B
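
As a minimal sketch of the same %in% idea with a list of arbitrary length: the example below assumes every list element is a single character value, so unlist() can flatten the list into a plain character vector first. The names keep and subsetData are only illustrative.

```r
# Same data as in the question
Data <- read.table(text = "Data_x Data_y Column_X
-34 12 A
-36 20 D
-36 12 E
-34 18 F
-34 10 B
-35 24 A
-35 16 B
-33 22 B
-33 14 C
-35 22 D", header = TRUE)

variableData <- list("A", "B", "C")   # could hold any number of elements
keep <- unlist(variableData)          # flatten to a character vector

# One logical index and one resulting data.frame,
# with no subset() call repeated per element
subsetData <- Data[Data$Column_X %in% keep, ]
```

Because the filter is a single vectorised comparison, the code does not change when the list grows from two elements to a hundred.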




"Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." (?subset)

Furthermore, with [ and %in% the original order of your rows is kept, whereas the lapply and rbind result is grouped by list element.
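
Following the ?subset warning about programmatic use, the filter can be wrapped in a small function built on [ rather than subset(). This is only a sketch: subset_by_values and its argument names are hypothetical and do not come from the original answer.

```r
# Hypothetical helper: keep the rows of `df` whose value in `column`
# matches any element of `values` (a list or vector of single values).
subset_by_values <- function(df, column, values) {
  # [[ ]] looks the column up by name, so no non-standard evaluation is involved
  df[df[[column]] %in% unlist(values), , drop = FALSE]
}

Data <- read.table(text = "Data_x Data_y Column_X
-34 12 A
-36 20 D
-36 12 E
-34 18 F
-34 10 B
-35 24 A
-35 16 B
-33 22 B
-33 14 C
-35 22 D", header = TRUE)

result <- subset_by_values(Data, "Column_X", list("A", "B"))
```

Passing the column name as a string keeps the function predictable when it is called from other code, and the rows come back in their original order.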
