Subset a dataframe using a logical vector with $

Views: 111 | Published: 2020/10/17 0:16:33 | Tags: r, dataframe, subset


Question

I'm having trouble understanding both the reason for using the $ symbol and its behavior when subsetting a data.frame in R. The following example was presented in a beginner's class I'm taking (there is no live professor, so I can't ask there):

temp_mat <- matrix(1:9, nrow=3)
colnames(temp_mat) <- c('a', 'b', 'c')
temp_df <- data.frame(temp_mat)

Calling temp_df obviously outputs:

  a b c
1 1 4 7
2 2 5 8
3 3 6 9

The example given in the course is then:

temp_df[temp_df$c < 10]

Which outputs:

  a b c
1 1 4 7
2 2 5 8
3 3 6 9

Reason for use question: The course indicates that $ is used for partial matching, and that x$y is an exact substitute for x[["y", exact=FALSE]]. Why would we want to use a partial matching operator here? Do we use it because we know for sure that temp_df has no other column similar to "c" that could be picked up by mistake? Additionally, how is a partial match measured? By a minimum percentage of matching characters, or something else? There appears to be a getElement function that would be much more appropriate when working with datasets that have unknown or similar column names (e.g. Home Phone versus Cell Phone: would these be seen as a valid partial match?).
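As a side note on the partial-matching part of the question, here is a minimal sketch of how the matching appears to behave: it works on prefixes of names, not on a percentage of shared characters. The data frame phones and its column names are hypothetical, invented only for this illustration.

```r
# Hypothetical data frame; the column names are invented for illustration.
phones <- data.frame(home_phone = c("555-0100", "555-0101"),
                     cell_phone = c("555-0200", "555-0201"),
                     stringsAsFactors = FALSE)

# `$` accepts a unique prefix of a column name.
by_dollar <- phones$home

# `[[` is exact by default; exact = FALSE opts in to the same prefix matching.
by_bracket <- phones[["home", exact = FALSE]]
strict <- phones[["home"]]     # NULL: no column is named exactly "home"

# "phone" is a suffix of both names, not a prefix, so nothing matches.
no_match <- phones$phone       # NULL

# getElement() looks names up exactly, so the prefix alone finds nothing.
exact_only <- getElement(phones, "home")   # NULL

print(by_dollar)
print(is.null(no_match))
```

Under this reading, Home Phone and Cell Phone style names would not be confused with each other, because neither is a prefix of the other. When exact lookups are wanted, [["name"]] or getElement avoids partial matching altogether.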

Behavior question: It appears that the above example, temp_df[temp_df$c < 10], is saying "return the subset of elements from temp_df where column c is less than 10", and because all column c elements meet the criterion, the entire dataframe is returned. My interpretation is obviously wrong, because temp_df[temp_df$c < 9] returns:

  a b
1 1 4
2 2 5
3 3 6

Although the row 1 and row 2 elements in column c do meet the criterion of being less than 9, the entire column is omitted. My question then becomes twofold: what is that logical vector actually saying and doing? And how would I write my interpretation of "return the subset of elements from temp_df where column c is less than 9" so that it returns:

  a b c
1 1 4 7
2 2 5 8

In my mind, elements 1 and 2 (rows 1 and 2) meet the criterion, since their column c values are less than 9, and thus should be returned.

Recommended answer

Try breaking the operation down step by step.

temp_df$c < 9

gives a logical vector as follows:

[1]  TRUE  TRUE FALSE

When you pass this vector in the manner you have shown, as in temp_df[c(TRUE, TRUE, FALSE)], with no comma inside the brackets, it operates on columns.

Think of a data.frame as a list, with column names as the keys and the column contents as the vector values. The operation keeps the keys (i.e. columns) marked TRUE and drops those marked FALSE.
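The list analogy can be checked directly. The sketch below rebuilds the question's temp_df and applies the same logical vector to the data frame and to a plain list made from it; the names keep, cols_kept and list_kept are introduced here only for illustration.

```r
# Rebuild the data frame from the question.
temp_df <- data.frame(matrix(1:9, nrow = 3,
                             dimnames = list(NULL, c("a", "b", "c"))))

# The comparison is evaluated first and yields one value per row of column c.
keep <- temp_df$c < 9
print(keep)

# With no comma, the logical vector is matched against the columns.
cols_kept <- temp_df[keep]

# The same selection on a plain list keeps the same keys.
list_kept <- as.list(temp_df)[keep]

print(names(cols_kept))
print(names(list_kept))
```

Both selections keep the keys a and b, which is why column c disappears even though the test was about the values in c.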

Adding a comma after the vector marks it as a row index instead. The first two rows are retained and the last one is dropped. Thus temp_df[c(TRUE, TRUE, FALSE), ], which is what temp_df[temp_df$c < 9, ] evaluates to, gives:

  a b c
1 1 4 7
2 2 5 8
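Putting the steps together, a minimal sketch of the row-wise selection the question asks for might look like this. The subset() call and the [["c"]] form are alternatives added here for comparison and are not part of the original answer.

```r
# Rebuild the data frame from the question.
temp_df <- data.frame(matrix(1:9, nrow = 3,
                             dimnames = list(NULL, c("a", "b", "c"))))

# Logical vector before the comma: select rows, keep every column.
rows_kept <- temp_df[temp_df$c < 9, ]

# Same selection with an exact column lookup instead of `$`.
rows_exact <- temp_df[temp_df[["c"]] < 9, ]

# subset() evaluates the condition inside the data frame.
rows_subset <- subset(temp_df, c < 9)

print(rows_kept)
```

All three forms keep rows 1 and 2 with columns a, b and c. The bracket forms are usually the safer choice inside functions, since subset() relies on non-standard evaluation.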
