如何在 Python 中的字符串上实现滑动窗口以在 O(1) 中构造每个新窗口 [英] How to implement a sliding window over the string in Python to have each new window constructed in O(1)

查看:40
本文介绍了如何在 Python 中的字符串上实现滑动窗口以在 O(1) 中构造每个新窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试优化算法的速度.它是在长度为 n 的长字符串上固定大小 m 的滑动窗口.目前它在(n - m)m时间内工作,其中m用于构建长度为的滑动窗口tmp.

I'm trying to optimise the speed of the algorithm. It's a sliding window of fixed size m over the long string of length n. So far it works in (n - m)m time, where m is taken for the construction of sliding window tmp of length m.

for i in range(0, n - m + 1):
  tmp = string[i:i + m]
  # then doing smth with tmp string

我想知道是否可以通过使用 join 将时间减少到 n - m,因为每次我进行一个字符的移位,所以我只需要切断第一个字符并追加下一个

I wonder if I could reduce the time to n - m by using join, because each time I do a one-character shift and so all I need is to cut off the first character and to append the next one

tmp = string[:m]
for i in range(1, n - m + 1):
  tmp = ''.join([tmp[1:], string[i + m - 1]])
  # then doing smth with tmp string

问题是我无法弄清楚如何在恒定时间内执行 tmp[1:].

The problem is I can't figure out how to perform tmp[1:] in a constant time.

你有什么想法吗?

推荐答案

所以我真的不确定结果,所以我运行了几个基准测试.

So I really wasn't sure about the results, so I ran a couple of benchmarks.

我的测试是有一个长度为 10000 的列表,并在其上滑动一个大小为 1000 的窗口.然后在每次迭代中,只需对窗口的所有元素求和,然后对所有迭代的所有结果求和.

My test was to have a list of length 10000 and have a window of size 1000 slide over it. Then in every iteration, just sum all elements of the window, and then sum all results of all iterations.

结果如下:

run_sliced:       296.210 ms/it (Iterations: 7)
run_memoryview:   409.670 ms/it (Iterations: 5)
run_shift:        296.495 ms/it (Iterations: 7)
run_cpp:            3.066 ms/it (Iterations: 653)
run_rust:           1.132 ms/it (Iterations: 1768)

这个基准测试中最奇怪的部分是 memoryview 是迄今为止最慢的.不完全确定该怎么做.

The weirdest part of this benchmark is that memoryview was by far the slowest. Not entirely sure what to make of that.

import time
import array

# Our external c++ and rust functions
import ext_cpp
import ext_rust

# List of size 10000.
# Our moving window is size 1000. That means we need to slice 9000 times.
# We simply take the sum of all numbers in the window,
# and then again the sum of all windows.
# Just to prevent any optimizations from happening.

listSize = 10000
windowSize = 1000

a = list(range(listSize))
numWindows = listSize - windowSize + 1

def time_it(name, func):
    t0 = time.perf_counter()
    iterations = 0

    while True:
        assert(func(a) == 45000499500)
        iterations += 1
        t1 = time.perf_counter()
        if t1 - t0 > 2:
            break

    per_iteration = 1000.0*(t1-t0)/iterations
    print("{:<15}".format(name+":")
          + "{:>10}".format(format(per_iteration, '.3f')) + " ms/it"
          + " (Iterations: " + str(iterations) + ")")

def run_sliced(a):
    r = 0
    for i in range(numWindows):
        window = a[i:i+windowSize]
        for el in window:
            r += el
    return r

def run_memoryview(a):
    r = 0
    a_raw = array.array('l', a)
    a = memoryview(a_raw)
    for i in range(numWindows):
        window = a[i:i+windowSize]
        for el in window:
            r += el
    return r

def run_shift(a):
    r = 0    
    window = a[:windowSize]
    for el in window:
        r += el
    for i in range(1, numWindows):
        window = window[1:]
        window.append(a[i + windowSize - 1])
        for el in window:
            r += el
    return r

def run_cpp(a):
    return ext_cpp.run(a, numWindows, windowSize)

def run_rust(a):
    return ext_rust.run(a, numWindows, windowSize)


time_it('run_sliced', run_sliced)
time_it('run_memoryview', run_memoryview)
time_it('run_shift', run_shift)
time_it('run_cpp', run_cpp)
time_it('run_rust', run_rust)


long run(std::vector<long> a, long windowSize, long numWindows){
    long r = 0;
    for(long i = 0; i < numWindows; i++){
        const long* window = &a[i];
        for(long j = 0; j < windowSize; j++){
            r += window[j];
        }
    }
    return r;
}


fn run(_py: Python, a: Vec<u64>, num_windows: usize, window_size: usize)
    -> PyResult<u64>
{
    let mut r = 0;
    for i in 0..num_windows {
        let window = &a[i..i+window_size];
        for el in window {
            r += el; 
        }
    }
    Ok(r)
}

这篇关于如何在 Python 中的字符串上实现滑动窗口以在 O(1) 中构造每个新窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆