Subset dataframe such that all values in each row are less than a certain value

Problem description

I have a data frame with one dimension column and 4 value columns. How can I subset the rows so that all 4 value columns of each record are less than a given x? I know I could do this manually with subset() and a condition for each column, but is there a way to do it with something like an apply function?

Below is a sample data frame. For example, say x is 0.7. In that case I want to eliminate any row in which any value column is greater than 0.7.

   zips ABC DEF GHI JKL
1     1 0.8 0.6 0.1 0.6
2     2 0.1 0.3 0.8 1.0
3     3 0.5 0.1 0.4 0.8
4     4 0.6 0.4 0.2 0.3
5     5 1.0 0.8 0.6 0.5
6     6 0.2 0.7 0.3 0.4
7     7 0.3 1.0 1.0 0.2
8     8 0.7 0.9 0.5 0.1
9     9 0.9 0.5 0.9 0.7
10   10 0.4 0.2 0.7 0.9

The following expression seems to work, but could someone explain the logic here?

Variance_Percentile[!rowSums(Variance_Percentile[-1] > 0.7), ]
  zips ABC DEF GHI JKL
4    4 0.6 0.4 0.2 0.3
6    6 0.2 0.7 0.3 0.4

Solution

You can subset the rows with the negated result of rowSums():

df[!rowSums(df[-1] > 0.7), ]
#   zips ABC DEF GHI JKL
# 4    4 0.6 0.4 0.2 0.3
# 6    6 0.2 0.7 0.3 0.4

  • df[-1] > 0.7 drops the first column (zips) and compares every remaining value with 0.7, giving a logical matrix that is TRUE wherever a value is greater than 0.7
  • rowSums() sums each row of that matrix (each TRUE counts as 1, each FALSE as 0), so the result is the number of values above 0.7 in each row
  • ! coerces those counts to logical (0 becomes FALSE, anything else TRUE) and negates them, so rows with a count of zero become TRUE. In other words, if the rowSums() result is zero, we want that row.
  • we use that logical vector as the row index
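
To see each step on its own, the one-liner can be unpacked as in the sketch below. The data frame is rebuilt from the sample in the question; the intermediate names over, n_over, keep and result are only illustrative and do not appear in the original answer.

```r
# Sample data copied from the question; `df` is the name used in the answer.
df <- data.frame(
  zips = 1:10,
  ABC  = c(0.8, 0.1, 0.5, 0.6, 1.0, 0.2, 0.3, 0.7, 0.9, 0.4),
  DEF  = c(0.6, 0.3, 0.1, 0.4, 0.8, 0.7, 1.0, 0.9, 0.5, 0.2),
  GHI  = c(0.1, 0.8, 0.4, 0.2, 0.6, 0.3, 1.0, 0.5, 0.9, 0.7),
  JKL  = c(0.6, 1.0, 0.8, 0.3, 0.5, 0.4, 0.2, 0.1, 0.7, 0.9)
)

# Step 1: logical matrix, TRUE where a value column is greater than 0.7.
# df[-1] drops the first column (zips).
over <- df[-1] > 0.7

# Step 2: count the TRUE values in each row (TRUE = 1, FALSE = 0).
n_over <- rowSums(over)

# Step 3: `!` coerces the counts to logical (0 -> FALSE, non-zero -> TRUE)
# and negates them, so rows with a count of zero become TRUE.
keep <- !n_over

# Step 4: use the logical vector as the row index.
result <- df[keep, ]
print(result)
```

Printing over or n_over at the console is a quick way to check which rows are rejected and why.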

Another way to get the same logical vector is to compare the row sums with zero explicitly:

rowSums(df[-1] > 0.7) == 0
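
Since the question asks about an apply-style approach and a general threshold x, here is a minimal sketch comparing the explicit rowSums() form with apply() and all(). The variable x and the names keep_sums and keep_apply are assumptions made for illustration; they are not part of the original answer.

```r
# Sample data copied from the question.
df <- data.frame(
  zips = 1:10,
  ABC  = c(0.8, 0.1, 0.5, 0.6, 1.0, 0.2, 0.3, 0.7, 0.9, 0.4),
  DEF  = c(0.6, 0.3, 0.1, 0.4, 0.8, 0.7, 1.0, 0.9, 0.5, 0.2),
  GHI  = c(0.1, 0.8, 0.4, 0.2, 0.6, 0.3, 1.0, 0.5, 0.9, 0.7),
  JKL  = c(0.6, 1.0, 0.8, 0.3, 0.5, 0.4, 0.2, 0.1, 0.7, 0.9)
)

x <- 0.7  # threshold; rows with any value greater than x are dropped

# Explicit comparison: keep rows where no value exceeds x.
keep_sums <- rowSums(df[-1] > x) == 0

# apply() version: keep rows where all values are at most x.
keep_apply <- apply(df[-1] <= x, 1, all)

# Both approaches select the same rows for this data.
stopifnot(identical(unname(keep_sums), unname(keep_apply)))

print(df[keep_apply, ])
```

Note that both forms keep values equal to x (row 6 has DEF equal to 0.7). For a strict "less than x" filter, the comparisons would become >= x and < x respectively.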
