R: `split` preserving natural order of factors

Views: 21 · Published: 2021/12/28 12:09:54 · Tags: r, split


Question

`split` always orders the splits lexicographically. There are situations where one would rather preserve the natural order, that is, the order in which the groups first appear. One can always write a hand-rolled function, but is there a base R solution that does this?

Input:

  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
1        2013-04-01          INDUSINDBK             SIEMENS  4 2013
2        2013-04-01                NMDC               WIPRO  4 2013
3        2012-09-28               LUPIN                SAIL  9 2012
4        2012-09-28          ULTRACEMCO                STER  9 2012
5        2012-04-27          ASIANPAINT                RCOM  4 2012
6        2012-04-27          BANKBARODA              RPOWER  4 2012

`split` output:

R> split(nifty.dat, nifty.dat$yearmon)
$`4 2012`
  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
5        2012-04-27          ASIANPAINT                RCOM  4 2012
6        2012-04-27          BANKBARODA              RPOWER  4 2012

$`4 2013`
  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
1        2013-04-01          INDUSINDBK             SIEMENS  4 2013
2        2013-04-01                NMDC               WIPRO  4 2013

$`9 2012`
  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
3        2012-09-28               LUPIN                SAIL  9 2012
4        2012-09-28          ULTRACEMCO                STER  9 2012

Note that yearmon is already sorted in the particular order I would like. This can be taken as given, because the question is slightly mis-specified if it does not hold.

Desired output:

$`4 2013`
  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
1        2013-04-01          INDUSINDBK             SIEMENS  4 2013
2        2013-04-01                NMDC               WIPRO  4 2013

$`9 2012`
  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
3        2012-09-28               LUPIN                SAIL  9 2012
4        2012-09-28          ULTRACEMCO                STER  9 2012

$`4 2012`
  Date.of.Inclusion Securities.Included Securities.Excluded yearmon
5        2012-04-27          ASIANPAINT                RCOM  4 2012
6        2012-04-27          BANKBARODA              RPOWER  4 2012

Thanks.

PS: I know there are better ways to create yearmon to preserve that order, but I am looking for a generic solution.

Recommended answer

Do not use `split` here. It tends to become terribly slow as the number of levels increases, so I'd suggest using `data.table` to subset into a list instead; I'd expect that to be much faster:

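library(data.table)
dt <- data.table(df)                      # df is the input data frame
dt[, grp := .GRP, by = yearmon]           # number the groups in order of first appearance
setkey(dt, grp)                           # sort by that group number
o2 <- dt[, list(list(.SD)), by = grp]$V1  # list with one data.table per group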
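For reference, the base R route that the benchmark below times as f2 is to turn the grouping column into a factor whose levels follow the order of first appearance, since `split` orders its output by factor levels. A minimal sketch on toy data (the data frame `toy` and its `id` column are made up for illustration and are not the question's data):

```r
# Toy stand-in for the question's data; `toy` and `id` are illustrative names.
toy <- data.frame(
  id      = 1:6,
  yearmon = c("4 2013", "4 2013", "9 2012", "9 2012", "4 2012", "4 2012"),
  stringsAsFactors = FALSE
)

# Default: the character column becomes a factor with sorted levels.
default_split <- split(toy, toy$yearmon)
names(default_split)

# Levels in order of first appearance, so split() keeps that order.
f <- factor(toy$yearmon, levels = unique(toy$yearmon))
by_appearance <- split(toy, f)
names(by_appearance)
```

This gives the desired ordering, but it still goes through `split`, which is the slow part when there are many levels.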

Benchmark on a larger data set (10,000 rows, several thousand distinct groups):

set.seed(45)
dates <- seq(as.Date("1900-01-01"), as.Date("2013-12-31"), by = "days")
ym <- do.call(paste, c(expand.grid(1:500, 1900:2013), sep="_"))

df <- data.frame(x1 = sample(dates, 1e4, TRUE),
                 x2 = sample(letters, 1e4, TRUE),
                 x3 = sample(10, 1e4, TRUE),
                 yearmon = sample(ym, 1e4, TRUE),
                 stringsAsFactors = FALSE)

require(data.table)
dt <- data.table(df)

f1 <- function(dt) {
    dt[, grp := .GRP, by = yearmon]
    setkey(dt, grp)

    o1 <- dt[, list(list(.SD)), by=grp]$V1
}

f2 <- function(df) {
    df$yearmon <- factor(df$yearmon, levels=unique(df$yearmon))
    o2 <- split(df, df$yearmon)
}

require(microbenchmark)
microbenchmark(o1 <- f1(dt), o2 <- f2(df), times = 10)

# Unit: milliseconds
#          expr        min         lq     median        uq      max neval
#  o1 <- f1(dt)   43.72995   43.85035   45.20087  715.1292 1071.976    10
#  o2 <- f2(df) 4485.34205 4916.13633 5210.88376 5763.1667 6912.741    10

Note that o1 will be an unnamed list, but you can set the names simply with names(o1) <- unique(dt$yearmon).
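To see the naming step end to end, here is a small sketch on toy data. The table `toy_dt` and its `id` column are invented for illustration, and wrapping `.SD` in `copy()` is a defensive assumption of mine rather than part of the answer's code; it only makes sure each list element owns its own data.

```r
library(data.table)

# Toy stand-in for the question's data; `toy_dt` and `id` are illustrative names.
toy_dt <- data.table(
  id      = 1:6,
  yearmon = c("4 2013", "4 2013", "9 2012", "9 2012", "4 2012", "4 2012")
)

toy_dt[, grp := .GRP, by = yearmon]  # group number in order of first appearance
setkey(toy_dt, grp)                  # sort rows by group number

# One data.table per group; copy(.SD) is a defensive addition (assumption).
o1 <- toy_dt[, list(list(copy(.SD))), by = grp]$V1

# After setkey(), unique(yearmon) is in the same order as the groups.
names(o1) <- unique(toy_dt$yearmon)
names(o1)
```

Because the rows are sorted by `grp` before the names are taken, `unique(toy_dt$yearmon)` lines up with the list elements one to one.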
