为什么在本例中使用序列比使用列表慢得多 [英] Why is using a sequence so much slower than using a list in this example

查看:57
本文介绍了为什么在本例中使用序列比使用列表慢得多的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

背景: 我有一系列连续的,带时间戳的数据. 数据序列中有漏洞,有的很大,有的只是一个缺失值.
每当孔仅是一个缺失值时,我都想使用虚拟值修补孔(较大的孔将被忽略).

Background: I have a sequence of contiguous, time-stamped data. The data-sequence has holes in it, some large, others just a single missing value.
Whenever the hole is just a single missing value, I want to patch the holes using a dummy-value (larger holes will be ignored).

我想使用延迟生成的修补序列,因此使用了Seq.unfold.

I would like to use lazy generation of the patched sequence, and I am thus using Seq.unfold.

我制作了两种方法来修补数据中的漏洞.

I have made two versions of the method to patch the holes in the data.

第一个使用消耗序列中带有空洞的数据的序列,并生成修补的序列.这就是我想要的,但是当输入序列中的元素数超过1000时,这些方法的运行速度会非常慢,输入序列中包含的元素越多,方法就会变得越来越糟.

The first consumes the sequence of data with holes in it and produces the patched sequence. This is what i want, but the methods runs horribly slow when the number of elements in the input sequence rises above 1000, and it gets progressively worse the more elements the input sequence contains.

第二种方法使用带有孔的数据的列表,并生成修补的序列,并且运行速度很快.但是,这不是我想要的,因为这会强制在内存中实例化整个输入列表.

The second method consumes a list of the data with holes and produces the patched sequence and it runs fast. This is however not what I want, since this forces the instantiation of the entire input-list in memory.

我想使用(sequence-> sequence)方法而不是(list-> sequence)方法,以避免将整个输入列表同时存储在内存中.

I would like to use the (sequence -> sequence) method rather than the (list -> sequence) method, to avoid having the entire input-list in memory at the same time.

问题:

1)为什么第一种方法这么慢(随着输入列表的增加,情况变得越来越糟) (我怀疑这与用Seq.skip 1重复创建新序列有关,但我不确定)

1) Why is the first method so slow (getting progressively worse with larger input lists) (I am suspecting that it has to do with repeatedly creating new sequences with Seq.skip 1, but I am not sure)

2)如何在使用输入序列而不是输入列表的同时快速修补数据中的孔?

2) How can I make the patching of holes in the data fast, while using an input sequence rather than an input list?

代码:

open System

// Method 1 (Slow)
let insertDummyValuesWhereASingleValueIsMissing1 (timeBetweenContiguousValues : TimeSpan) (values : seq<(DateTime * float)>) =
    let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal
    (None, values) |> Seq.unfold (fun (prevValue, restOfValues) ->  
        if restOfValues |> Seq.isEmpty then
            None // Reached the end of the input seq
        else
            let currentValue = Seq.hd restOfValues
            if prevValue.IsNone then
                Some(currentValue, (Some(currentValue), Seq.skip 1 restOfValues  )) // Only happens to the first item in the seq
            else
                let currentTime = fst currentValue
                let prevTime = fst prevValue.Value
                let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime)
                if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then
                    let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) // 42 is chosen here for obvious reasons, making this comment superfluous
                    Some(dummyValue, (Some(dummyValue), restOfValues))
                else
                    Some(currentValue, (Some(currentValue), Seq.skip 1 restOfValues))) // Either the two values were contiguous, or the gap between them was too large to patch

// Method 2 (Fast)
let insertDummyValuesWhereASingleValueIsMissing2 (timeBetweenContiguousValues : TimeSpan) (values : (DateTime * float) list) =
    let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal
    (None, values) |> Seq.unfold (fun (prevValue, restOfValues) ->  
        match restOfValues with
        | [] -> None // Reached the end of the input list
        | currentValue::restOfValues -> 
            if prevValue.IsNone then
                Some(currentValue, (Some(currentValue), restOfValues  )) // Only happens to the first item in the list
            else
                let currentTime = fst currentValue
                let prevTime = fst prevValue.Value
                let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime)
                if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then
                    let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) 
                    Some(dummyValue, (Some(dummyValue), currentValue::restOfValues))
                else
                    Some(currentValue, (Some(currentValue), restOfValues))) // Either the two values were contiguous, or the gap between them was too large to patch

// Test data
let numbers = {1.0..10000.0}
let contiguousTimeStamps = seq { for n in numbers -> DateTime.Now.AddMinutes(n)}

let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items

let timeBetweenContiguousValues = (new TimeSpan(0,1,0))

// The fast sequence-patching (method 2)
dataWithOccationalHoles |> List.of_seq |> insertDummyValuesWhereASingleValueIsMissing2 timeBetweenContiguousValues |> Seq.iter (fun pair -> printfn "%f %s" (snd pair) ((fst pair).ToString()))

// The SLOOOOOOW sequence-patching (method 1)
dataWithOccationalHoles |> insertDummyValuesWhereASingleValueIsMissing1 timeBetweenContiguousValues |> Seq.iter (fun pair -> printfn "%f %s" (snd pair) ((fst pair).ToString()))

推荐答案

Seq.skip构造一个新序列.我认为这就是为什么您的原始方法运行缓慢的原因.

Seq.skip constructs a new sequence. I think that is why your original approach is slow.

我的第一个倾向是使用序列表达式和Seq.pairwise.这是快速且易于阅读的.

My first inclination is to use a sequence expression and Seq.pairwise. This is fast and easy to read.

let insertDummyValuesWhereASingleValueIsMissingSeq (timeBetweenContiguousValues : TimeSpan) (values : seq<(DateTime * float)>) =
  let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal
  seq {
    yield Seq.hd values
    for ((prevTime, _), ((currentTime, _) as next)) in Seq.pairwise values do
      let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime)
      if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then
        let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) // 42 is chosen here for obvious reasons, making this comment superfluous
        yield dummyValue
      yield next
  }

这篇关于为什么在本例中使用序列比使用列表慢得多的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆