How to extract a subset of a data frame based on a condition involving a field?
Problem description
I have a large CSV with the results of a medical survey from different locations (the location is a factor present in the data). As some analyses are specific to a location and for convenience, I'd like to extract subframes with the rows only from those locations. It happens that the location is the very first field so yes, I could do it by sorting the CSV rows, but I'd like to learn how to do it in R as I'm sure I'll need this for other columns.
So, in a nutshell, the question is: given a data frame foo, how can I create another data frame bar which contains only the rows from foo where foo$location == "there"?
Thanks a lot.
Solution
Here are the two main approaches. I prefer this one for its readability:
bar <- subset(foo, location == "there")
Note that you can string together many conditionals with & and | to create complex subsets.
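As a small sketch of combining conditions, assume a made-up data frame foo with a location factor and a hypothetical age column (the values and the age column are invented for illustration and are not from the question):

```r
# Hypothetical example data; only the `location` column comes from the question.
foo <- data.frame(
  location = factor(c("here", "there", "there", "elsewhere")),
  age      = c(34, 51, 27, 45)
)

# Rows from "there" AND with age over 30
bar_and <- subset(foo, location == "there" & age > 30)

# Rows from "there" OR from "elsewhere"
bar_or <- subset(foo, location == "there" | location == "elsewhere")

# %in% is a compact alternative to chaining several | comparisons
bar_in <- subset(foo, location %in% c("there", "elsewhere"))
```

Use the single & and | here, since they work element by element across the whole column; && and || are meant for single TRUE/FALSE values in control flow.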
The second is the indexing approach. You can index rows in R with either numeric or logical (boolean) vectors. foo$location == "there" returns a vector of TRUE and FALSE values with one element per row of foo. You can use that vector as the row index to return only the rows where the condition is true; the trailing comma keeps all columns.
foo[foo$location == "there", ]
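One difference between the two approaches shows up when the column has missing values. The sketch below uses invented data with a hypothetical score column and one NA location; it is meant only to illustrate the behaviour, not to reflect the survey data in the question:

```r
# Hypothetical example data, including one missing location.
foo <- data.frame(
  location = c("here", "there", NA, "there"),
  score    = c(1, 2, 3, 4)
)

# The comparison yields one logical value per row (NA where location is missing)
keep <- foo$location == "there"

# Logical indexing: an NA in `keep` produces an all-NA row in the result
bar_logical <- foo[keep, ]

# which() converts to the row numbers that are TRUE, so the NA is dropped
bar_which <- foo[which(keep), ]

# subset() also drops rows where the condition evaluates to NA
bar_subset <- subset(foo, location == "there")
```

If the data may contain missing locations, wrapping the condition in which() (or using subset) avoids the extra NA rows that plain logical indexing returns.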