Difference between large number of rows

Views: 129  Published: 2017/3/12 11:43:39  Tags: r, data.table


Problem description

I have a matrix with a very large number of rows and only two paired columns. I want to calculate the differences between rows in column 1 and, if the difference is less than a predefined value (0.001), calculate the average of those rows in both columns. For example, I have a matrix called weights:
       A  B
185.0765 10
185.3171 20
186.0777 30
186.0780 40
188.0078 50

weights <- as.data.table(weights)
bins <- weights[A %between% c(A[3], A[3] + 0.001)]
meanA <- mean(bins$A)
meanB <- mean(bins$B)

The resulting matrix should be:
       A  B
185.0765 10
185.3171 20
186.0779 35
188.0078 50
I would be thankful if someone could please advise me how to do this for a large number of rows. I think using a for loop would not be very efficient.

Solution
This should achieve what you want to do, using data.table:
DT <- data.table(weights)
# A 1 starts a new group, a 0 keeps the row in the previous group
DT[, Group := cumsum(c(1, ifelse(diff(A) < 0.001, 0, 1)))]
DT[, lapply(.SD, mean), by = Group, .SDcols = c("A", "B")]
#   Group        A  B
#1:     1 185.0765 10
#2:     2 185.3171 20
#3:     3 186.0779 35
#4:     4 188.0078 50
The idea is to use a cumulative sum to label runs of neighbouring A values whose differences are below 0.001. A difference under this threshold contributes 0 to the sum, so the row keeps the same group number as the row before it; any larger difference contributes 1 and starts a new group.
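
To make the grouping step concrete, here is a minimal base R sketch on the five A values from the question. It assumes the values are already sorted in ascending order; the variable names gaps and new_group are illustrative and do not appear in the answer.

```r
# Column A from the question, assumed to be sorted ascending
A <- c(185.0765, 185.3171, 186.0777, 186.0780, 188.0078)

# Differences between each row and the row before it (length 4)
gaps <- diff(A)

# 1 marks the start of a new group, 0 means "same group as the previous row".
# The leading 1 opens the first group, since the first row has no predecessor.
new_group <- c(1, ifelse(gaps < 0.001, 0, 1))

# The running total of the markers is the group id for each row
Group <- cumsum(new_group)

print(new_group)
print(Group)  # rows 3 and 4 share a group id: 1 2 3 3 4
```

Note that only neighbouring rows are compared, so a long run of closely spaced values ends up in one group even if its first and last values differ by more than the threshold.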

As suggested by @eddi, a more succinct and efficient way of doing this is to do the grouping and the calculation at the same time, in one call:
DT <- data.table(weights)
DT[, lapply(.SD, mean), by = list(Group = cumsum(c(1, diff(A)) >= 0.001)), .SDcols = c("A", "B")]
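
For readers who want to try the one-call form end to end, the following is a self-contained sketch under a few assumptions: the data.table package is installed, the input rows may arrive unsorted (so they are sorted by A first), and threshold is a hypothetical name for the 0.001 cutoff from the question.

```r
library(data.table)

# Sample data from the question, deliberately shuffled to show the sort step
weights <- data.frame(
  A = c(186.0780, 185.0765, 188.0078, 185.3171, 186.0777),
  B = c(40, 10, 50, 20, 30)
)

DT <- as.data.table(weights)
setorder(DT, A)  # diff() only compares neighbouring rows, so sort by A first

threshold <- 0.001  # hypothetical name for the cutoff used in the question

# The first row always opens a group (TRUE); after that, a gap of at least
# `threshold` opens a new group. cumsum() turns the flags into group ids.
result <- DT[, lapply(.SD, mean),
             by = .(Group = cumsum(c(TRUE, diff(A) >= threshold))),
             .SDcols = c("A", "B")]
print(result)
```

On the sample data this yields four groups, with the two rows near 186.078 averaged into one. Sorting is vectorised, so this should scale far better than an explicit for loop, though actual timings depend on the row count and hardware.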
As an aside, it is always helpful to give an absolute number of rows. "A very large number of rows" means different things to different people and use cases. Are we talking millions? Hundreds of millions?
