Reading and counting of consecutive points


Question

I am having trouble reading the coordinates of a 2D space from a data.table like the following and deriving different qualities from them:

library(data.table)

DT <- data.table(
  A = c(rep("aa", 2), rep("bb", 2)),
  B = c(rep("H", 2), rep("Na", 2)),
  Low = c(0, 3, 1, 1),
  High = c(8, 10, 9, 8),
  Time = c("0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10"),
  Intensity = c("0,0,0,0,561464,0,0,0,0,0,0", "0,0,0,6548,5464,5616,0,0,0,68716,0", "5658,12,6548,6541,8,5646854,54565,56465,546,65,0", "0,561464,0,0,0,0,0,0,0,0,0")
)

The "Time" and "Intensity" columns refer to the x and y values of a 2D space. The "Low" and "High" columns refer to boundaries on the x-axis ("Time"). Now I would like to check different qualities of the y ("Intensity") dimension within (< >) those borders:


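  1. Highest number of consecutive points > 0: (row 1: 1, row 2: 2 ..)

  2. Total number of points > 0: (row 1: 1, row 2: 3 ..)

  3. Highest number of consecutive points > baseline (the baseline value should be the lower of the intensity values at the lower and upper border, so for row 3 it would be 12 and for the others 0): (row 3: 4, for all other rows it is the same as in 1).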
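
As a worked check of these three definitions, the following base R sketch computes them by hand for row 3 only (Low = 1, High = 9). The variable names are illustrative and not part of the original question, and the window is assumed to exclude the border points themselves, as the "(< >)" above suggests.

```r
# Row 3 of the example data, written out as numeric vectors.
time      <- 0:10
intensity <- c(5658, 12, 6548, 6541, 8, 5646854, 54565, 56465, 546, 65, 0)
low  <- 1
high <- 9

# Points strictly inside the borders (Time 2..8).
inside <- intensity[time > low & time < high]

# Baseline: the lower of the two intensities sitting on the borders.
baseline <- min(intensity[time == low], intensity[time == high])

# Longest run above zero; the extra 0 covers the case of no such run.
first <- with(rle(inside > 0), max(lengths[values], 0))

# Total number of points above zero.
second <- sum(inside > 0)

# Longest run above the baseline.
third <- with(rle(inside > baseline), max(lengths[values], 0))

c(First = first, Second = second, Third = third)
```

For row 3 this gives a baseline of 12 and the values 7, 7 and 4, which agree with the numbers stated for that row.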

So the output should be a table like this:

DT <- data.table(
  A = c(rep("aa", 2), rep("bb", 2)),
  B = c(rep("H", 2), rep("Na", 2)),
  Low = c(0, 3, 1, 1),
  High = c(8, 10, 9, 8),
  Time = c("0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10", "0,1,2,3,4,5,6,7,8,9,10"),
  Intensity = c("0,0,0,0,561464,0,0,0,0,0,0", "0,0,0,6548,5464,5616,0,0,0,68716,0", "5658,12,6548,6541,8,5646854,54565,56465,546,65,0", "0,561464,0,0,0,0,0,0,0,0,0"),
  First = c(1, 2, 7, 0),
  Second = c(1, 3, 7, 0),
  Third = c(1, 2, 4, 0)
)

Does anyone have an idea how this task could be handled? I have been trying with data.table so far, but if someone knows a better package for such tasks I would also be happy.

Thanks a lot in advance!

Yasel

Recommended answer

Here is one method with base R. We split the 'Time' and 'Intensity' columns by "," into lists, then loop through the corresponding elements of those lists along with the elements of the 'Low' and 'High' columns. For each row we extract the 'Intensity' values that lie between the 'Low' and 'High' borders and check whether they are greater than 0 (and, for the third measure, greater than the baseline, i.e. the lower of the intensities at the two borders). rle then gives the lengths of the runs of consecutive elements that pass the check. Each row's results go into a data.frame; we rbind these and cbind the result with the original dataset.

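newCols <- do.call(rbind, Map(function(u, v, x, y) {
     u1 <- as.numeric(u)                 # Time values of this row
     v1 <- as.numeric(v)                 # Intensity values of this row
     v2 <- v1[u1 > x & u1 < y]           # intensities strictly between Low and High
     # longest run of values > 0 (the extra 0 covers the case of no such run)
     i1 <- with(rle(v2 > 0), max(lengths[values], 0))
     i2 <- sum(v2 > 0)                   # total number of values > 0
     lb <- match(x, u1)                  # position of the lower border
     ub <- match(y, u1)                  # position of the upper border
     # baseline: the lower of the two intensities on the borders
     baseline <- min(v1[c(lb, ub)])
     # longest run of values above the baseline
     i3 <- with(rle(v2 > baseline), max(lengths[values], 0))
     data.frame(First = i1, Second = i2, Third = i3)
     },
     strsplit(DT$Time, ","), strsplit(DT$Intensity, ","), DT$Low, DT$High))

cbind(DT, newCols)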
#  A  B Low High                   Time                                        Intensity First Second Third
#1: aa  H   0    8 0,1,2,3,4,5,6,7,8,9,10                       0,0,0,0,561464,0,0,0,0,0,0     1      1     1
#2: aa  H   3   10 0,1,2,3,4,5,6,7,8,9,10               0,0,0,6548,5464,5616,0,0,0,68716,0     2      3     2
#3: bb Na   1    9 0,1,2,3,4,5,6,7,8,9,10 5658,12,6548,6541,8,5646854,54565,56465,546,65,0     7      7     4
#4: bb Na   1    8 0,1,2,3,4,5,6,7,8,9,10                       0,561464,0,0,0,0,0,0,0,0,0     0      0     0
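
To make the rle step more concrete, here is a small sketch of what rle returns for the in-window values of row 2, and of why the all-zero window of row 4 needs care. The helper name count_longest is hypothetical and introduced only for this illustration.

```r
# In-window intensities of row 2 (Time 4..9) turned into a logical vector.
above <- c(5464, 5616, 0, 0, 0, 68716) > 0

r <- rle(above)
r$lengths  # lengths of the successive runs
r$values   # whether each run is a run of TRUE or of FALSE

# Keeping only the TRUE runs and taking the maximum gives the longest streak.
max(r$lengths[r$values])

# In row 4 every in-window value is 0, so there are no TRUE runs at all;
# max() on an empty vector would return -Inf with a warning.
# Passing 0 as an extra argument to max() sidesteps that.
count_longest <- function(flag) {
  r <- rle(flag)
  max(r$lengths[r$values], 0)
}

count_longest(above)          # row 2
count_longest(rep(0, 6) > 0)  # row 4: six zeros inside the window
```

Here the runs for row 2 have lengths 2, 3 and 1 (TRUE, FALSE, TRUE), so the longest streak above zero is 2, and the row 4 call returns 0.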
