如何在 Python 中的字符串上实现滑动窗口以在 O(1) 中构造每个新窗口 [英] How to implement a sliding window over the string in Python to have each new window constructed in O(1)
问题描述
我正在尝试优化算法的速度.它是在长度为 n
的长字符串上固定大小 m
的滑动窗口.目前它在(n - m)m
时间内工作,其中m
用于构建长度为的滑动窗口
.tmp
米
I'm trying to optimise the speed of the algorithm. It's a sliding window of fixed size m
over the long string of length n
. So far it works in (n - m)m
time, where m
is taken for the construction of sliding window tmp
of length m
.
for i in range(0, n - m + 1):
tmp = string[i:i + m]
# then doing smth with tmp string
我想知道是否可以通过使用 join 将时间减少到 n - m
,因为每次我进行一个字符的移位,所以我只需要切断第一个字符并追加下一个
I wonder if I could reduce the time to n - m
by using join, because each time I do a one-character shift and so all I need is to cut off the first character and to append the next one
tmp = string[:m]
for i in range(1, n - m + 1):
tmp = ''.join([tmp[1:], string[i + m - 1]])
# then doing smth with tmp string
问题是我无法弄清楚如何在恒定时间内执行 tmp[1:]
.
The problem is I can't figure out how to perform tmp[1:]
in a constant time.
你有什么想法吗?
推荐答案
所以我真的不确定结果,所以我运行了几个基准测试.
So I really wasn't sure about the results, so I ran a couple of benchmarks.
我的测试是有一个长度为 10000 的列表,并在其上滑动一个大小为 1000 的窗口.然后在每次迭代中,只需对窗口的所有元素求和,然后对所有迭代的所有结果求和.
My test was to have a list of length 10000 and have a window of size 1000 slide over it. Then in every iteration, just sum all elements of the window, and then sum all results of all iterations.
结果如下:
run_sliced: 296.210 ms/it (Iterations: 7)
run_memoryview: 409.670 ms/it (Iterations: 5)
run_shift: 296.495 ms/it (Iterations: 7)
run_cpp: 3.066 ms/it (Iterations: 653)
run_rust: 1.132 ms/it (Iterations: 1768)
这个基准测试中最奇怪的部分是 memoryview
是迄今为止最慢的.不完全确定该怎么做.
The weirdest part of this benchmark is that memoryview
was by far the slowest.
Not entirely sure what to make of that.
import time
import array
# Our external c++ and rust functions
import ext_cpp
import ext_rust
# List of size 10000.
# Our moving window is size 1000. That means we need to slice 9000 times.
# We simply take the sum of all numbers in the window,
# and then again the sum of all windows.
# Just to prevent any optimizations from happening.
listSize = 10000
windowSize = 1000
a = list(range(listSize))
numWindows = listSize - windowSize + 1
def time_it(name, func):
t0 = time.perf_counter()
iterations = 0
while True:
assert(func(a) == 45000499500)
iterations += 1
t1 = time.perf_counter()
if t1 - t0 > 2:
break
per_iteration = 1000.0*(t1-t0)/iterations
print("{:<15}".format(name+":")
+ "{:>10}".format(format(per_iteration, '.3f')) + " ms/it"
+ " (Iterations: " + str(iterations) + ")")
def run_sliced(a):
r = 0
for i in range(numWindows):
window = a[i:i+windowSize]
for el in window:
r += el
return r
def run_memoryview(a):
r = 0
a_raw = array.array('l', a)
a = memoryview(a_raw)
for i in range(numWindows):
window = a[i:i+windowSize]
for el in window:
r += el
return r
def run_shift(a):
r = 0
window = a[:windowSize]
for el in window:
r += el
for i in range(1, numWindows):
window = window[1:]
window.append(a[i + windowSize - 1])
for el in window:
r += el
return r
def run_cpp(a):
return ext_cpp.run(a, numWindows, windowSize)
def run_rust(a):
return ext_rust.run(a, numWindows, windowSize)
time_it('run_sliced', run_sliced)
time_it('run_memoryview', run_memoryview)
time_it('run_shift', run_shift)
time_it('run_cpp', run_cpp)
time_it('run_rust', run_rust)
long run(std::vector<long> a, long windowSize, long numWindows){
long r = 0;
for(long i = 0; i < numWindows; i++){
const long* window = &a[i];
for(long j = 0; j < windowSize; j++){
r += window[j];
}
}
return r;
}
fn run(_py: Python, a: Vec<u64>, num_windows: usize, window_size: usize)
-> PyResult<u64>
{
let mut r = 0;
for i in 0..num_windows {
let window = &a[i..i+window_size];
for el in window {
r += el;
}
}
Ok(r)
}
这篇关于如何在 Python 中的字符串上实现滑动窗口以在 O(1) 中构造每个新窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!