子集数据表,仅将大于特定值的元素应用于所有列 [英] subset data.table keeping only elements greater than certain value applied to all columns

查看:134
本文介绍了子集数据表,仅将大于特定值的元素应用于所有列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想对 news (如下)进行子集创建 news2 (以下更进一步),其中将仅包含每个元素中abs(值)的行/列新闻> 0.01。

I would like to subset news (below) to create news2 (further below) which will only include the rows/columns where the abs(value) in each element of news > 0.01.

下面是我尝试过的代码:

Below is the code that I have tried:

gr <- data.frame(which(abs(news[, 1:ncol(news), with = FALSE]) > 0.01,
arr.ind = TRUE))
news2a <- news[gr$row, c(1, gr$col + 1L), with = FALSE]
news2a[, which(duplicated(names(news2a))) := NULL]

上面的代码并不总是有效。注意:在实际数据集中,行和列都更多。

The code above does not always work. Note: In the real data set, there are both more rows and columns.

# news
ID         diff.jan         diff.feb         diff.mar         diff.apr
1:   7 -2.998852570e-13  2.764079712e-13 -3.291735832e-13  0.000000000e+00
2:   8  1.010000000e-01 -3.717073578e-13 -6.575639966e-13 -2.100269646e-13
3:  10  0.000000000e+00 -3.973537519e-13  0.000000000e+00  0.000000000e+00
4:  47  0.000000000e+00  0.000000000e+00  0.000000000e+00 -2.371100404e-13
5:  50  0.000000000e+00 -2.281689276e-13  2.192820401e-13 -1.857449127e-13
6:  79  0.000000000e+00  4.031985405e-13 -3.981825179e-13  0.000000000e+00
7: 202  6.409906781e-13  0.000000000e+00               NA  1.000000000e+01
8: 203  6.359592723e-13  0.000000000e+00  0.000000000e+00  1.100000000e+01
9: 468  2.545310002e-13 -2.426929277e-13 -2.612280890e-13  0.000000000e+00
       diff.may         diff.jun         diff.jul         diff.aug
1:  0.000000000e+00  0.000000000e+00  1.583933835e-13  1.182802403e-13
2:  0.000000000e+00  1.298306616e-13 -8.222315538e-13  9.721908246e-13
3:  0.000000000e+00  0.000000000e+00  0.000000000e+00  4.697083567e-13
4: -1.315189580e-13  6.926635309e-13  1.243841313e-13  0.000000000e+00
5:  0.000000000e+00  0.000000000e+00  0.000000000e+00  2.210000000e-01
6:  0.000000000e+00  0.000000000e+00  5.015727533e-13  0.000000000e+00
7:  0.000000000e+00 -1.073174486e-13  0.000000000e+00  0.000000000e+00
8:  0.000000000e+00  5.697594583e-13  0.000000000e+00  8.891748412e-13
9: -6.365151884e-13  1.595531286e-13  0.000000000e+00 -1.574081330e-13

news <- structure(list(ID = c(7L, 8L, 10L, 47L, 50L, 79L, 202L, 203L, 
468L), diff.jan = c(-2.99885257e-13, 0.101, 0, 0, 0, 0, 6.409906781e-13, 
6.359592723e-13, 2.545310002e-13), diff.feb = c(2.764079712e-13, 
-3.717073578e-13, -3.973537519e-13, 0, -2.281689276e-13, 4.031985405e-13, 
0, 0, -2.426929277e-13), diff.mar = c(-3.291735832e-13, -6.575639966e-13, 
0, 0, 2.192820401e-13, -3.981825179e-13, NA, 0, -2.61228089e-13
), diff.apr = c(0, -2.100269646e-13, 0, -2.371100404e-13, -1.857449127e-13, 
0, 10, 11, 0), diff.may = c(0, 0, 0, -1.31518958e-13, 0, 0, 0, 
0, -6.365151884e-13), diff.jun = c(0, 1.298306616e-13, 0, 6.926635309e-13, 
0, 0, -1.073174486e-13, 5.697594583e-13, 1.595531286e-13),
diff.jul = c(1.583933835e-13, 
-8.222315538e-13, 0, 1.243841313e-13, 0, 5.015727533e-13, 0, 
0, 0), diff.aug = c(1.182802403e-13, 9.721908246e-13, 4.697083567e-13, 
0, 0.221, 0, 0, 8.891748412e-13, -1.57408133e-13)), .Names = c("ID", 
"diff.jan", "diff.feb", "diff.mar", "diff.apr", "diff.may", "diff.jun", 
"diff.jul", "diff.aug"), class = c("data.table", "data.frame"
), row.names = c(NA, -9L))

<根据上面的新闻,我想要实现strong> news2 。

news2 is what I would like to achieve based on news above.

#news2
ID diff.jan diff.apr diff.aug
1:   8    0.101       NA       NA
2:  50       NA       NA    0.221
3: 202       NA       10       NA
4: 203       NA       11       NA

dput(news2)
news2 <- structure(list(ID = c(8L, 50L, 202L, 203L), diff.jan = c(0.101, 
NA, NA, NA), diff.apr = c(NA, NA, 10L, 11L), diff.aug = c(NA, 
0.221, NA, NA)), .Names = c("ID", "diff.jan", "diff.apr", "diff.aug"
), class = c("data.table", "data.frame"), row.names = c(NA, -4L
))

您能否提供有关实现以下目标的代码的建议?

Can you offer suggestions for code that will achieve the desired result?

推荐答案

如果将data.table转换为长格式,这很容易:

If you melt your data.table to long format, this is easy:

library(reshape2)
news1 <- melt(news, id.vars = "ID")

news2 <- news1[abs(value) > 0.01,]
#    ID variable  value
#1:   8 diff.jan  0.101
#2: 202 diff.apr 10.000
#3: 203 diff.apr 11.000
#4:  50 diff.aug  0.221

dcast.data.table(news2, ID ~ variable)
#    ID diff.jan diff.apr diff.aug
#1:   8    0.101       NA       NA
#2:  50       NA       NA    0.221
#3: 202       NA       10       NA
#4: 203       NA       11       NA

我个人不会做最后一步。

Personally, I wouldn't do the last step.

这篇关于子集数据表,仅将大于特定值的元素应用于所有列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆