Subset dataframe based on number of observations in each column

Published: 2020/10/16 21:36:17. Tags: r, dataframe, subset

Problem description

I have a problem and would appreciate a hand. I tried to come up with a solution, but I have no idea how to work it out.

Please use this to recreate my dataframe.

df1 <- structure(list(A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 
14L, 46L, 19L, 93L, 5L, 94L), A2 = c(50L, NA, 73L, 58L, 47L, 
74L, 39L, NA, NA, NA, NA, NA, NA, NA), A3 = c(NA, 38L, 10L, 41L, 
NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)), .Names = c("A1", 
"A2", "A3"), class = "data.frame", row.names = c(NA, -14L))

I have this data frame.

How can I subset the data frame so that it keeps only the columns with 7 or more observations (non-missing values)? The desired output should contain only the columns that have at least 7 observations each.

I welcome any solution that can generalize to more columns.

Recommended answer

Try

df1[, colSums(!is.na(df1)) >= 7]
#   A1 A3
#1  87 NA
#2  67 38
#3  80 10
#4  36 41
#5  71 NA
#6   6 66
#7  26 NA
#8  15  7
#9  14 29
#10 46 NA
#11 19 70
#12 93 23
#13  5 46
#14 94 55
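A base R alternative, not part of the original answer, is to test each column separately with Filter(). Since a data frame is a list of columns, Filter() keeps the columns for which the predicate returns TRUE, and the result is still a data frame. A minimal sketch, recreating the question's data as df1:

```r
# Recreate the example data from the question
df1 <- data.frame(
  A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 14L, 46L, 19L, 93L, 5L, 94L),
  A2 = c(50L, NA, 73L, 58L, 47L, 74L, 39L, NA, NA, NA, NA, NA, NA, NA),
  A3 = c(NA, 38L, 10L, 41L, NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)
)

# Keep only the columns with at least 7 non-missing values.
# Filter() applies the predicate to each column (list element) and
# subsets with single brackets, so the result stays a data frame.
kept <- Filter(function(col) sum(!is.na(col)) >= 7, df1)

names(kept)
# [1] "A1" "A3"
```

Because Filter() subsets list-style, the result remains a data frame even if only one column qualifies.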

Step by step

What you need to do first is to find out which values of your data are not missing.

!is.na(df1)

This returns a logical matrix:

#        A1    A2    A3
# [1,] TRUE  TRUE FALSE
# [2,] TRUE FALSE  TRUE
# [3,] TRUE  TRUE  TRUE
# [4,] TRUE  TRUE  TRUE
# [5,] TRUE  TRUE FALSE
# [6,] TRUE  TRUE  TRUE
# [7,] TRUE  TRUE FALSE
# [8,] TRUE FALSE  TRUE
# [9,] TRUE FALSE  TRUE
#[10,] TRUE FALSE FALSE
#[11,] TRUE FALSE  TRUE
#[12,] TRUE FALSE  TRUE
#[13,] TRUE FALSE  TRUE
#[14,] TRUE FALSE  TRUE

Use colSums to count how many observations per column are not missing (each TRUE counts as 1):

colSums(!is.na(df1))
#A1 A2 A3 
#14  6 10
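An equivalent way to get these counts, shown here as a sketch rather than the answer's own code, is to sum each column's logical vector with vapply(). It returns a named integer vector, whereas colSums() returns doubles:

```r
# Recreate the example data from the question
df1 <- data.frame(
  A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 14L, 46L, 19L, 93L, 5L, 94L),
  A2 = c(50L, NA, 73L, 58L, 47L, 74L, 39L, NA, NA, NA, NA, NA, NA, NA),
  A3 = c(NA, 38L, 10L, 41L, NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)
)

# sum() over a logical vector counts the TRUE values,
# i.e. the non-missing entries of each column
non_missing <- vapply(df1, function(col) sum(!is.na(col)), integer(1))

non_missing
# A1 A2 A3 
# 14  6 10 
```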

Apply your condition, "greater than or equal to 7 observations (count) per column":

colSums(!is.na(df1)) >= 7
#   A1    A2    A3 
# TRUE FALSE  TRUE
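Since the logical vector is named after the columns, it can be used to check which columns pass the condition before subsetting. A small sketch with which() and names():

```r
# Recreate the example data from the question
df1 <- data.frame(
  A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 14L, 46L, 19L, 93L, 5L, 94L),
  A2 = c(50L, NA, 73L, 58L, 47L, 74L, 39L, NA, NA, NA, NA, NA, NA, NA),
  A3 = c(NA, 38L, 10L, 41L, NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)
)

keep <- colSums(!is.na(df1)) >= 7

# Names of the columns that meet the threshold
names(which(keep))
# [1] "A1" "A3"

# Number of columns that would be dropped
sum(!keep)
# [1] 1
```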

Finally, you need to use this vector to subset your data

df1[, colSums(!is.na(df1)) >= 7]
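One caveat with df1[, ...]: when exactly one column meets the condition, [ returns a plain vector by default instead of a data frame. The sketch below uses a threshold of 11, chosen only for illustration, so that A1 is the only qualifying column, and compares the default with drop = FALSE:

```r
# Recreate the example data from the question
df1 <- data.frame(
  A1 = c(87L, 67L, 80L, 36L, 71L, 6L, 26L, 15L, 14L, 46L, 19L, 93L, 5L, 94L),
  A2 = c(50L, NA, 73L, 58L, 47L, 74L, 39L, NA, NA, NA, NA, NA, NA, NA),
  A3 = c(NA, 38L, 10L, 41L, NA, 66L, NA, 7L, 29L, NA, 70L, 23L, 46L, 55L)
)

keep <- colSums(!is.na(df1)) >= 11   # only A1 has 11 or more values

one_default <- df1[, keep]                # default drop = TRUE
one_kept    <- df1[, keep, drop = FALSE]  # stays a data frame

is.data.frame(one_default)
# [1] FALSE
is.data.frame(one_kept)
# [1] TRUE
```

This is why the function version passes drop = FALSE.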

Turn this into a function if you need it regularly:

almost_complete_cols <- function(data, min_obs) {
  data[, colSums(!is.na(data)) >= min_obs, drop = FALSE]
}

almost_complete_cols(df1, 7)
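The function does not depend on how many columns the data frame has, since colSums() runs over all of them at once. As a quick check, here it is applied to a small made-up data frame, wide (hypothetical data, not from the question), with four columns and a threshold of 3:

```r
almost_complete_cols <- function(data, min_obs) {
  data[, colSums(!is.na(data)) >= min_obs, drop = FALSE]
}

# Hypothetical example data: non-missing counts are 4, 2, 5 and 2
wide <- data.frame(
  B1 = c(1, 2, NA, 4, 5),
  B2 = c(NA, NA, NA, 4, 5),
  B3 = c(1, 2, 3, 4, 5),
  B4 = c(NA, 2, NA, 4, NA)
)

names(almost_complete_cols(wide, 3))
# [1] "B1" "B3"
```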