Reading and counting of consecutive points
Question
I have problems reading coordinates of a 2D space from a data.table like the following, and reading out different properties from it:
DT <- data.table(
A = c(rep("aa",2),rep("bb",2)),
B = c(rep("H",2),rep("Na",2)),
Low = c(0,3,1,1),
High = c(8,10,9,8),
Time =c("0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10"),
Intensity = c("0,0,0,0,561464,0,0,0,0,0,0","0,0,0,6548,5464,5616,0,0,0,68716,0","5658,12,6548,6541,8,5646854,54565,56465,546,65,0","0,561464,0,0,0,0,0,0,0,0,0")
)
The "Time" and "Intensity" columns are the x and y values of a 2D space. The "Low" and "High" columns are boundaries on the x-axis ("Time"). Now I would like to check different properties of the y ("Intensity") dimension strictly within (< >) those borders:
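- the maximum number of consecutive points > 0 (row 1: 1, row 2: 2, ...)
- the total number of points > 0 (row 1: 1, row 2: 3, ...)
- the maximum number of consecutive points > baseline, where the baseline is the lower of the two intensity values at the lower and upper boundary (so 12 for row 3, and 0 for all other rows) (row 3: 4; for all other rows it is the same as in the first point)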
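To make the three definitions concrete, here is a minimal base R sketch for a single row (row 3 of the example, Low = 1, High = 9). The helper `longest_run` and the variable names are hypothetical and only serve this illustration; it assumes that the Low and High values occur exactly among the Time values, as they do in the example.

```r
# Hypothetical helper: length of the longest run of TRUE in a logical vector (0 if none)
longest_run <- function(flags) {
  r <- rle(flags)
  max(c(0, r$lengths[r$values]))
}

# Row 3 of the example table; Time and Intensity are comma-separated strings
x <- as.numeric(strsplit("0,1,2,3,4,5,6,7,8,9,10", ",")[[1]])
y <- as.numeric(strsplit("5658,12,6548,6541,8,5646854,54565,56465,546,65,0", ",")[[1]])
low  <- 1
high <- 9

y_inside <- y[x > low & x < high]            # points strictly between the borders
baseline <- min(y[x == low], y[x == high])   # lower of the two boundary intensities (12 here)

first  <- longest_run(y_inside > 0)          # 1. longest run of consecutive points > 0
second <- sum(y_inside > 0)                  # 2. total number of points > 0
third  <- longest_run(y_inside > baseline)   # 3. longest run of consecutive points > baseline
print(c(First = first, Second = second, Third = third))  # 7, 7 and 4, as in the expected table
```

The point with intensity 8 at Time 4 lies above 0 but below the baseline 12, which is what splits the run of 7 into runs of 2 and 4 for the third quantity.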
So the output should be a table like this:
DT <- data.table(
A =c(rep("aa",2),rep("bb",2)),
B =c(rep("H",2),rep("Na",2)),
Low = c(0,3,1,1),
High = c(8,10,9,8),
Time = c("0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10","0,1,2,3,4,5,6,7,8,9,10"),
Intensity = c("0,0,0,0,561464,0,0,0,0,0,0","0,0,0,6548,5464,5616,0,0,0,68716,0","5658,12,6548,6541,8,5646854,54565,56465,546,65,0","0,561464,0,0,0,0,0,0,0,0,0"),
First = c(1,2,7,0),
Second= c(1,3,7,0),
Third = c(1,2,4,0)
)
Does anyone have an idea how this task could be handled? I have been trying with data.table so far, but if someone knows a better package for this kind of task, I would also be happy.
Thanks a lot in advance!
Yasel
Answer
Here is one method with base R. We split the 'Intensity' and 'Time' columns by "," into lists, then loop through the corresponding elements of these lists along with the elements of the 'Low' and 'High' columns. For each row, we extract the 'Intensity' values whose 'Time' lies strictly between 'Low' and 'High' and check whether they are greater than 0 (for the third column, whether they are greater than the lower of the two intensities at the 'Low' and 'High' boundaries). We use rle to find the lengths of the runs of consecutive elements that pass the check, create a data.frame for each row, rbind the contents of the list, and cbind the result with the original dataset.
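To see what the rle step does, here is a small standalone illustration using the values of row 3 (the points strictly inside Low = 1 and High = 9, compared against the baseline 12) and an all-FALSE vector like the one that occurs for row 4. The variable names are only for this example.

```r
# Row 3: intensities strictly inside (Low, High) = (1, 9), compared with the baseline 12
flags <- c(6548, 6541, 8, 5646854, 54565, 56465, 546) > 12
r <- rle(flags)
print(r$lengths)                  # 2 1 4
print(r$values)                   # TRUE FALSE TRUE
print(max(r$lengths[r$values]))   # 4, the longest run of TRUE

# Row 4: no point is above 0, so lengths[values] is empty
empty <- rle(c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE))
# max(integer(0)) returns -Inf with a warning; clamping with pmax(..., 0)
# or prepending a 0 both turn this into 0
print(suppressWarnings(pmax(max(empty$lengths[empty$values]), 0)))  # 0
print(max(c(0, empty$lengths[empty$values])))                       # 0, without a warning
```

Indexing `lengths` by the logical `values` keeps only the runs of TRUE, so the maximum of what is left is the longest run that passes the check.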
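library(data.table)

newCols <- do.call(rbind, Map(function(u, v, x, y) {
  u1 <- as.numeric(u)                  # Time values of this row
  v1 <- as.numeric(v)                  # Intensity values of this row
  v2 <- v1[u1 > x & u1 < y]            # intensities strictly between Low and High
  # First: longest run of consecutive points > 0
  # (prepending 0 avoids max(integer(0)) = -Inf and its warning when there is no run)
  i1 <- with(rle(v2 > 0), max(c(0, lengths[values])))
  # Second: total number of points > 0
  i2 <- sum(v2 > 0)
  # Third: longest run above the baseline, i.e. the lower of the
  # intensities at the Low and High boundaries
  lb <- match(x, u1)
  ub <- match(y, u1)
  v3 <- v1[(lb + 1):(ub - 1)]
  i3 <- with(rle(v3 > min(v1[c(lb, ub)])), max(c(0, lengths[values])))
  data.frame(First = i1, Second = i2, Third = i3)
},
strsplit(DT$Time, ","), strsplit(DT$Intensity, ","), DT$Low, DT$High))
cbind(DT, newCols)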
# A B Low High Time Intensity First Second Third
#1: aa H 0 8 0,1,2,3,4,5,6,7,8,9,10 0,0,0,0,561464,0,0,0,0,0,0 1 1 1
#2: aa H 3 10 0,1,2,3,4,5,6,7,8,9,10 0,0,0,6548,5464,5616,0,0,0,68716,0 2 3 2
#3: bb Na 1 9 0,1,2,3,4,5,6,7,8,9,10 5658,12,6548,6541,8,5646854,54565,56465,546,65,0 7 7 4
#4: bb Na 1 8 0,1,2,3,4,5,6,7,8,9,10 0,561464,0,0,0,0,0,0,0,0,0 0 0 0
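Since the question mentions data.table, here is a sketch of the same computation done inside data.table, adding the three columns by reference with one group per row (`by = seq_len(nrow(DT))`). It is not the answer's method but an alternative under the same assumptions: Low and High must occur exactly among the Time values, and `longest_run` is a hypothetical helper defined only for this example. Unlike the answer, the points are selected by Time value rather than by position.

```r
library(data.table)

# Example data from the question
DT <- data.table(
  A = c(rep("aa", 2), rep("bb", 2)),
  B = c(rep("H", 2), rep("Na", 2)),
  Low = c(0, 3, 1, 1),
  High = c(8, 10, 9, 8),
  Time = rep("0,1,2,3,4,5,6,7,8,9,10", 4),
  Intensity = c("0,0,0,0,561464,0,0,0,0,0,0",
                "0,0,0,6548,5464,5616,0,0,0,68716,0",
                "5658,12,6548,6541,8,5646854,54565,56465,546,65,0",
                "0,561464,0,0,0,0,0,0,0,0,0")
)

# Hypothetical helper: longest run of TRUE in a logical vector (0 if there is none)
longest_run <- function(flags) {
  r <- rle(flags)
  max(c(0, r$lengths[r$values]))
}

# Work row by row and add the three columns by reference
DT[, c("First", "Second", "Third") := {
  x <- as.numeric(strsplit(Time, ",")[[1]])       # Time values
  y <- as.numeric(strsplit(Intensity, ",")[[1]])  # Intensity values
  y_in <- y[x > Low & x < High]                   # points strictly between the borders
  baseline <- min(y[x == Low], y[x == High])      # lower of the two boundary intensities
  list(longest_run(y_in > 0), sum(y_in > 0), longest_run(y_in > baseline))
}, by = seq_len(nrow(DT))]

print(DT[, .(A, B, Low, High, First, Second, Third)])
```

Grouping by row keeps the per-row logic close to the definitions in the question; for very large tables, reshaping to one row per point first would likely be the more data.table-idiomatic route.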