Binning Data With Two Timestamps


Question

I'm posting because I have found no content on this topic.

My goal is essentially to produce a time-binned graph that plots some aggregated value. Usually this would be a doddle: there is a single timestamp for each value, which makes it relatively straightforward to bin.

However, my problem is that each value has two timestamps, a start and an end, so the plotted data looks like a Gantt chart. I essentially want to bin the values (as an average) over the periods when their timelines fall within each bin. Bin boundaries could be the points where a task starts or ends.

I'm looking for a basic example, or an answer to whether this is even supported in Vega-Lite. My current working example would add nothing to this discussion.

Recommended answer

I lost my old account, but I am the person who posted this question. Here is my solution. The value I aggregate here is the total time that each datapoint's timeline spends inside each bin.

  1. First, use a joinaggregate transform to get the minimum and maximum times your data extends to. You could also hardcode these.

  {
     "type": "joinaggregate",
     "fields": ["startTime", "endTime"],
     "ops": ["min", "max"],
     "as": ["min", "max"]
  }

  • Next, choose a step (the bin width). You can hardcode it later, or compute it with a formula and write it into a new field (the expressions below read it as datum.step).

    Then create two new fields in your data: one holding a sequence from the min to the max, and the other holding the same sequence offset by your step.

    {
       "type": "formula",
       "expr": "sequence(datum.min, datum.max, datum.step)",
       "as": "startBin"
    },
    {
       "type": "formula",
       "expr": "sequence(datum.min + datum.step, datum.max + datum.step, datum.step)",
       "as": "endBin"
    }
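To make the two sequences concrete, here is a minimal Python sketch of the bin edges they produce. It assumes integer timestamps and that Vega's `sequence` excludes its stop value, as Python's `range` does; the function name `bin_edges` is made up for illustration.

```python
def bin_edges(t_min, t_max, step):
    """Mirror sequence(min, max, step) and the same sequence offset by step."""
    starts = list(range(t_min, t_max, step))
    ends = list(range(t_min + step, t_max + step, step))
    return starts, ends


starts, ends = bin_edges(0, 10, 4)
print(list(zip(starts, ends)))  # [(0, 4), (4, 8), (8, 12)]
```

Both lists always have the same length, so they pair up one to one. When the range is not a multiple of the step, the last bin extends past the maximum (here it ends at 12 rather than 10).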
    

  • The new fields are arrays, so applying a flatten transform gives one row per data value per bin.

     {
       "type": "flatten",
       "fields": ["startBin", "endBin"]
     }
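As a rough illustration of what flattening two parallel arrays does, the Python sketch below pairs each row with every (startBin, endBin) pair. The helper name `flatten_bins` and the sample tasks are invented for this example, and the arrays are stepped through in lockstep, which is my understanding of how Vega treats multiple flatten fields.

```python
def flatten_bins(rows, starts, ends):
    """Emit one output row per input row per bin, zipping the two bin arrays."""
    return [
        {**row, "startBin": s, "endBin": e}
        for row in rows
        for s, e in zip(starts, ends)
    ]


tasks = [{"startTime": 1, "endTime": 6}, {"startTime": 5, "endTime": 9}]
flat = flatten_bins(tasks, [0, 4, 8], [4, 8, 12])
print(len(flat))  # 6: two tasks times three bins
```

The row count grows as (number of data values) times (number of bins), so a very small step on a large dataset can get expensive.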
    

  • Then calculate how much time each datum spends inside each specific bin. To do this, clamp both the start time and the end time to the bin's range: a time before the bin start is raised to the bin start, and a time after the bin end is lowered to the bin end. Then take the difference between the clamped end and start times.

     {
       "type": "formula",
       "expr": "if(datum.startTime < datum.startBin, datum.startBin, if(datum.startTime > datum.endBin, datum.endBin, datum.startTime))",
       "as": "startBinTime"
     },
     {
       "type": "formula",
       "expr": "if(datum.endTime < datum.startBin, datum.startBin, if(datum.endTime > datum.endBin, datum.endBin, datum.endTime))",
       "as": "endBinTime"
     },
     {
       "type": "formula",
       "expr": "datum.endBinTime - datum.startBinTime",
       "as": "timeInBin"
     }
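The nested `if` expressions above amount to clamping each timestamp to the bin's range. A small Python sketch of the same arithmetic, with a hypothetical function name and made-up numbers, shows how the overlap comes out:

```python
def time_in_bin(start_time, end_time, start_bin, end_bin):
    """Length of the overlap between [start_time, end_time] and [start_bin, end_bin]."""
    clamped_start = min(max(start_time, start_bin), end_bin)
    clamped_end = min(max(end_time, start_bin), end_bin)
    return clamped_end - clamped_start


# A task running from 1 to 6, measured against three bins of width 4.
print(time_in_bin(1, 6, 0, 4))   # 3
print(time_in_bin(1, 6, 4, 8))   # 2
print(time_in_bin(1, 6, 8, 12))  # 0
```

When a task lies entirely outside a bin, both timestamps clamp to the same edge, so the difference is zero and that row contributes nothing to the bin's total.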
    

  • Finally, aggregate the data by bin and sum these times. Your data is then ready to be plotted.

     {
       "type": "aggregate",
       "groupby": ["startBin", "endBin"],
       "fields": ["timeInBin"],
       "ops": ["sum"],
       "as": ["timeInBin"]
     }
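The grouping step can be sketched in Python as a sum keyed on the bin edges. The rows below are illustrative only: they assume two tasks (1 to 6 and 5 to 9) already flattened against three bins of width 4, with `timeInBin` computed by clamping as described above.

```python
from collections import defaultdict


def sum_time_by_bin(rows):
    """Group rows by (startBin, endBin) and sum their timeInBin values."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["startBin"], row["endBin"])] += row["timeInBin"]
    return dict(totals)


rows = [
    {"startBin": 0, "endBin": 4, "timeInBin": 3},   # task 1 to 6
    {"startBin": 4, "endBin": 8, "timeInBin": 2},
    {"startBin": 8, "endBin": 12, "timeInBin": 0},
    {"startBin": 0, "endBin": 4, "timeInBin": 0},   # task 5 to 9
    {"startBin": 4, "endBin": 8, "timeInBin": 3},
    {"startBin": 8, "endBin": 12, "timeInBin": 1},
]
print(sum_time_by_bin(rows))  # {(0, 4): 3, (4, 8): 5, (8, 12): 1}
```

Each key is one bar in the final chart, and its value is the total task time that fell inside that bin.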
    

  • Although this solution is long, it is relatively easy to implement in the transform section of your data. In my experience it runs fast, and it shows just how versatile Vega can be. Freedom for visualisations!
