Extract a subset of a dataframe based on a condition involving a field
Question
I have a large CSV with the results of a medical survey from different locations (the location is a factor present in the data). Because some analyses are specific to a location, and for convenience, I'd like to extract subframes containing only the rows from those locations. The location happens to be the very first field, so yes, I could do it by sorting the CSV rows, but I'd like to learn how to do it in R, as I'm sure I'll need this for other columns.
So, in a nutshell, the question is: given a data frame foo, how can I create another data frame bar that contains only the rows of foo where foo$location == 'there'?
Recommended answer
Here are the two main approaches. I prefer this one for its readability:
bar <- subset(foo, location == "there")
Note that you can string together many conditionals with & and | to create complex subsets.
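As a rough illustration of combining conditions, here is a minimal sketch. The data frame contents and the age column are made up for the example; only the location column comes from the question.

```r
# Hypothetical example data; only `location` appears in the original question.
foo <- data.frame(
  location = c("there", "here", "there", "elsewhere"),
  age      = c(34, 51, 67, 45)
)

# Rows where location is "there" AND age is above 40
bar <- subset(foo, location == "there" & age > 40)

# Rows where location is "there" OR "here"
baz <- subset(foo, location == "there" | location == "here")
```

Inside subset(), column names can be written bare (location rather than foo$location), which is what keeps longer conditions readable.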
The second is the indexing approach. You can index rows in R with either numeric or logical vectors. foo$location == "there" returns a vector of TRUE and FALSE values with one element per row of foo. Use that vector as the row index to return only the rows where the condition is true:
foo[foo$location == "there", ]
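To make the intermediate logical vector visible, the sketch below splits the indexing step in two. The sample data, the score column, and the names keep, bar_clean, and bar_subset are assumptions for illustration. It also shows one difference between the two approaches that I am fairly sure of: when the column contains NA, logical indexing returns an all-NA row for each NA, while subset() drops those rows.

```r
# Hypothetical example data with a missing location.
foo <- data.frame(
  location = c("there", "here", NA, "there"),
  score    = c(1, 2, 3, 4)
)

# Logical vector with one element per row of foo: TRUE, FALSE, NA, TRUE
keep <- foo$location == "there"

# Plain logical indexing: the NA in `keep` produces a row filled with NA
bar <- foo[keep, ]

# which() converts to row numbers and discards NA, so only real matches remain
bar_clean <- foo[which(keep), ]

# subset() treats NA in the condition as FALSE
bar_subset <- subset(foo, location == "there")
```

If the column has no missing values, all three forms return the same rows.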