Get only the numeric columns from a data.table in R

Views: 211, published 2017/3/12 12:36:00, tags: r, data.table


Question

I have the following data and code. I want to get the mean of every numeric column, but the current code gives warnings. How can I select only the numeric columns and then compute their means?

> mydt
          vnum1 vint1 vfac1 vch1
 1: -0.30159484     8     3    E
 2: -0.09833430     8     1    D
 3: -2.15963282     1     3    D
 4:  0.03904374     5     2    B
 5:  1.54928970     4     1    C
 6: -0.73873654     5     1    A
 7: -0.68594479     9     2    B
 8:  1.35765612     1     2    E
 9:  1.46958351     2     1    B
10: -0.89623979     2     4    E
> 
> mydt[,lapply(.SD, mean),]
       vnum1 vint1 vfac1 vch1
1: -0.046491   4.5    NA   NA
Warning messages:
1: In mean.default(X[[3L]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[4L]], ...) :
  argument is not numeric or logical: returning NA
> 
> 
> dput(mydt)
structure(list(vnum1 = c(-0.301594844692861, -0.0983343040483769, 
-2.15963282153076, 0.03904374068617, 1.54928969700272, -0.738736535236348, 
-0.685944791146016, 1.35765612481877, 1.46958350568506, -0.896239790653183
), vint1 = c(8L, 8L, 1L, 5L, 4L, 5L, 9L, 1L, 2L, 2L), vfac1 = structure(c(3L, 
1L, 3L, 2L, 1L, 1L, 2L, 2L, 1L, 4L), .Label = c("1", "2", "3", 
"4"), class = "factor"), vch1 = structure(c(5L, 4L, 4L, 2L, 3L, 
1L, 2L, 5L, 2L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor")), .Names = c("vnum1", 
"vint1", "vfac1", "vch1"), class = c("data.table", "data.frame"
), row.names = c(NA, -10L))
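The warnings come from mean() being called on the two factor columns. A minimal sketch that inspects each column's class, using a small made-up table (dt, with hypothetical values) that has the same column types as mydt:

```r
library(data.table)

# Toy table with the same column types as mydt (values are made up)
dt <- data.table(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# Class of every column: vfac1 and vch1 are factors, not numbers
col_classes <- sapply(dt, class)
print(col_classes)

# is.numeric() is FALSE for factors even when their labels look like digits,
# which is why mean() warns and returns NA for those two columns
is_num <- sapply(dt, is.numeric)
print(is_num)
```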

I tried the following, but it doesn't work:

> mydt[,lapply(.SD, is.numeric),]
   vnum1 vint1 vfac1  vch1
1:  TRUE  TRUE FALSE FALSE
> 
> mydt[,mydt[,lapply(.SD, is.numeric),]]
   vnum1 vint1 vfac1  vch1
1:  TRUE  TRUE FALSE FALSE
> 
> mydt[,mydt[,lapply(.SD, is.numeric),], with=F]
Error in Math.data.frame(j) : 
  non-numeric variable in data frame: vnum1vint1vfac1vch1
> mydt[,c(mydt[,lapply(.SD, is.numeric)),], with=F]
Error: unexpected ')' in "mydt[,c(mydt[,lapply(.SD, is.numeric))"
>
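These attempts fail because mydt[, lapply(.SD, is.numeric)] returns a one-row data.table rather than a logical vector. A sketch on a made-up toy table (hypothetical values) that flattens it with unlist() before selecting columns:

```r
library(data.table)

# Toy table with the same column types as mydt (values are made up)
dt <- data.table(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# lapply(.SD, is.numeric) inside [ ] gives a one-row data.table, not a vector
flags_dt <- dt[, lapply(.SD, is.numeric)]
print(class(flags_dt))

# unlist() flattens that row into a named logical vector,
# which with = FALSE accepts as a column selector
flags <- unlist(flags_dt)
num_only <- dt[, flags, with = FALSE]
print(names(num_only))
```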

As suggested by @Arun, I tried the following, but I cannot get a subset:

> xx = mydt[,lapply(.SD, is.numeric),]
> xx
   vnum1 vint1 vfac1  vch1
1:  TRUE  TRUE FALSE FALSE
> mydt[,lapply(.SD,mean),.SDcols=xx]
Error in `[.data.table`(mydt, , lapply(.SD, mean), .SDcols = xx) : 
  .SDcols should be column numbers or names
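.SDcols rejects xx because xx is itself a data.table. A sketch, on made-up toy data, of turning that TRUE/FALSE row into column names before passing it in:

```r
library(data.table)

# Toy table with the same column types as mydt (values are made up)
dt <- data.table(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# Same one-row table of flags as xx in the question
xx <- dt[, lapply(.SD, is.numeric)]

# Index the column names with the flattened flags to get a character vector
cols <- names(xx)[unlist(xx)]

# .SDcols now receives column names, which it accepts
means <- dt[, lapply(.SD, mean), .SDcols = cols]
print(means)
```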

As suggested by @David, I tried the following, but I get NULL values for the non-numeric columns. I want a subset of mydt in which the other columns are not listed at all:

> mydt[, lapply(.SD, function(x) if(is.numeric(x)) mean(x))]
       vnum1 vint1 vfac1 vch1
1: -0.046491   4.5  NULL NULL
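The NULL entries appear because the function still runs once per column and returns NULL for the factors. One alternative, sketched here in base R and not taken from the original thread, is to filter a plain list of the columns first (toy data, hypothetical values). The as.list() call matters: Filter() indexes its input with single brackets, which on a data.table would select rows instead of columns.

```r
library(data.table)

# Toy table with the same column types as mydt (values are made up)
dt <- data.table(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# as.list() gives a plain named list of columns,
# so Filter() drops list elements (columns) rather than rows
num_cols <- Filter(is.numeric, as.list(dt))

# Apply mean() to the remaining columns and wrap the result as a one-row table
means <- as.data.table(lapply(num_cols, mean))
print(means)
```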

I miss data.frame, where this is straightforward:

> sapply(mydf, is.numeric)
vnum1 vint1 vfac1  vch1 
 TRUE  TRUE FALSE FALSE 
> mydf[sapply(mydf, is.numeric)]
         vnum1 vint1
1  -0.30159484     8
2  -0.09833430     8
3  -2.15963282     1
4   0.03904374     5
5   1.54928970     4
6  -0.73873654     5
7  -0.68594479     9
8   1.35765612     1
9   1.46958351     2
10 -0.89623979     2
> 

> sapply(mydf[sapply(mydf, is.numeric)], mean)
    vnum1     vint1 
-0.046491  4.500000
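On a data.frame, base R's Filter() is a shorter way to write the same column selection. A sketch with a made-up data.frame (df is a hypothetical stand-in for mydf):

```r
# Toy data.frame with the same column types as mydf (values are made up)
df <- data.frame(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# On a data.frame, Filter() keeps the columns for which is.numeric() is TRUE,
# equivalent to df[sapply(df, is.numeric)]
means <- sapply(Filter(is.numeric, df), mean)
print(means)
```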

OK. Thanks to David's comment, the following works:

mydt[, sapply(mydt, is.numeric), with = FALSE][,sapply(.SD, mean),]
    vnum1     vint1 
-0.046491  4.500000 

> mydt[, sapply(mydt, is.numeric), with = FALSE]
          vnum1 vint1
 1: -0.30159484     8
 2: -0.09833430     8
 3: -2.15963282     1
 4:  0.03904374     5
...
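In the working version, the first [ ] builds a new data.table of the numeric columns and the second [ ] applies the function. A sketch on made-up toy data showing that sapply() inside j returns a named vector, while lapply() returns a one-row data.table:

```r
library(data.table)

# Toy table with the same column types as mydt (values are made up)
dt <- data.table(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# Step 1: a new data.table holding only the numeric columns
flags <- sapply(dt, is.numeric)
num_dt <- dt[, flags, with = FALSE]

# Step 2a: sapply() inside j returns a plain named numeric vector
v <- num_dt[, sapply(.SD, mean)]
print(v)

# Step 2b: lapply() inside j returns a one-row data.table instead
row_dt <- num_dt[, lapply(.SD, mean)]
print(row_dt)
```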

Recommended answer

Searching SO for .SDcols, I landed on this answer, which I think explains quite nicely how to use it.

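cols = sapply(mydt, is.numeric)            # named logical vector: TRUE for numeric columns
cols = names(cols)[cols]                   # keep only the names of the numeric columns
mydt[, lapply(.SD, mean), .SDcols = cols]  # apply mean() to just those columns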
#        vnum1 vint1
# 1: -0.046491   4.5
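The two lines that build cols can be collapsed with which(). Newer data.table releases also accept a predicate function directly in .SDcols; that second form is an assumption about your installed version, so check ?data.table if it errors. A sketch on made-up toy data:

```r
library(data.table)

# Toy table with the same column types as mydt (values are made up)
dt <- data.table(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# Same idea as the answer in one line: names of the numeric columns
cols <- names(which(sapply(dt, is.numeric)))
res1 <- dt[, lapply(.SD, mean), .SDcols = cols]

# Assumption: a data.table version that accepts a function in .SDcols
res2 <- dt[, lapply(.SD, mean), .SDcols = is.numeric]

print(res1)
print(res2)
```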

Doing mydt[, sapply(mydt, is.numeric), with = FALSE] is not very efficient, because subsetting your data.table to those columns makes a (deep) copy, which uses more memory than necessary.

Using colMeans is not very memory-efficient either, because it coerces the data.table into a matrix.
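For comparison, the colMeans() route looks like this on made-up toy data. It returns a named vector, but only after the selected columns have been converted to a matrix, which is the extra allocation described above:

```r
library(data.table)

# Toy table with the same column types as mydt (values are made up)
dt <- data.table(vnum1 = c(1.5, -0.5, 2.0),
                 vint1 = c(8L, 2L, 5L),
                 vfac1 = factor(c("3", "1", "3")),
                 vch1  = factor(c("E", "D", "B")))

# Select the numeric columns, then let colMeans() coerce them to a matrix
cols <- names(which(sapply(dt, is.numeric)))
cm <- colMeans(dt[, .SD, .SDcols = cols])
print(cm)
```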

