deque in python pandas

Question

I am using Python's deque() to implement a simple circular buffer:

from collections import deque
import numpy as np

# range() must be wrapped in list() before repeating it on Python 3
test_sequence = np.array(list(range(100))*2).reshape(100,2)
mybuffer = deque(np.zeros(20).reshape((10, 2)))

for i in test_sequence:
    mybuffer.popleft()
    mybuffer.append(i)

    do_something_on(mybuffer)

I was wondering if there's a simple way of obtaining the same thing in Pandas using a Series (or DataFrame). In other words, how can I efficiently add a single row at the end and remove a single row at the beginning of a Series or DataFrame?

I tried this:

import pandas as pd

myPandasBuffer = pd.DataFrame(columns=('A','B'), data=np.zeros(20).reshape((10, 2)))
newpoint = pd.DataFrame(columns=('A','B'), data=np.array([[1,1]]))

for i in test_sequence:
    newpoint[['A','B']] = i
    # .ix has been removed from pandas; .iloc[1:] drops the first row by position
    myPandasBuffer = pd.concat([myPandasBuffer.iloc[1:], newpoint], ignore_index=True)

    do_something_on(myPandasBuffer)

But it is much slower than the deque() method.

Recommended answer

As noted by dorvak, pandas is not designed for queue-like behaviour.

Below I've replicated the simple insert function from deque in pandas dataframes, numpy arrays, and also in hdf5 using the h5py module.

IPython's %timeit magic shows (unsurprisingly) that the collections deque is by far the fastest, followed by numpy and then pandas.

from collections import deque
import pandas as pd
import numpy as np
import h5py

def insert_deque(test_sequence, buffer_deque):
    # drop the oldest item on the left, append the new one on the right
    for item in test_sequence:
        buffer_deque.popleft()
        buffer_deque.append(item)
    return buffer_deque


def insert_df(test_sequence, buffer_df):
    # shift every row up by one position, then overwrite the last row
    for item in test_sequence:
        buffer_df.iloc[0:-1,:] = buffer_df.iloc[1:,:].values
        buffer_df.iloc[-1] = item
    return buffer_df


def insert_arraylike(test_sequence, buffer_arr):
    # same shift-and-overwrite, for numpy arrays and h5py datasets
    for item in test_sequence:
        buffer_arr[:-1] = buffer_arr[1:]
        buffer_arr[-1] = item
    return buffer_arr

test_sequence = np.array(list(range(100))*2).reshape(100,2)

# create buffer arrays
nested_list = [[0]*2]*5
buffer_deque = deque(nested_list)
buffer_df = pd.DataFrame(nested_list, columns=('A','B'))
buffer_arr = np.array(nested_list)

# calculate speed of each process in ipython
print("deque : ")
%timeit insert_deque(test_sequence, buffer_deque)
print("pandas : ")
%timeit insert_df(test_sequence, buffer_df)
print("numpy array : ")
%timeit insert_arraylike(test_sequence, buffer_arr)
print("hdf5 with h5py : ")
with h5py.File("h5py_test.h5", "w") as f:
    f["buffer_hdf5"] = np.array(nested_list)
    %timeit insert_arraylike(test_sequence, f["buffer_hdf5"])

%timeit results:

deque : 34.1 µs per loop

pandas : 48 ms per loop

numpy array : 187 µs per loop

hdf5 with h5py : 31.7 ms per loop
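
The %timeit lines above only run inside IPython. As a rough sketch for a plain Python interpreter, the same comparison can be made with the standard-library timeit module. The repeat count n_runs is an arbitrary value picked for illustration, only the deque and numpy variants are shown, and the absolute numbers will differ from machine to machine:

```python
import timeit
from collections import deque

import numpy as np


def insert_deque(test_sequence, buffer_deque):
    for item in test_sequence:
        buffer_deque.popleft()
        buffer_deque.append(item)
    return buffer_deque


def insert_arraylike(test_sequence, buffer_arr):
    for item in test_sequence:
        buffer_arr[:-1] = buffer_arr[1:]
        buffer_arr[-1] = item
    return buffer_arr


test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)
buffer_deque = deque([[0, 0] for _ in range(5)])
buffer_arr = np.zeros((5, 2), dtype=int)

n_runs = 200  # assumption: arbitrary repeat count for this sketch

# timeit.timeit returns the total time in seconds for `number` calls
t_deque = timeit.timeit(lambda: insert_deque(test_sequence, buffer_deque), number=n_runs)
t_numpy = timeit.timeit(lambda: insert_arraylike(test_sequence, buffer_arr), number=n_runs)

print(f"deque : {t_deque / n_runs * 1e6:.1f} us per loop")
print(f"numpy array : {t_numpy / n_runs * 1e6:.1f} us per loop")
```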

Notes:

My pandas slicing method was only slightly faster than the concat method listed in the question.

The hdf5 format (via h5py) did not show any advantages. I also don't see any advantages of HDFStore, as suggested by Andy.
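
Given these timings, one possible pattern (a sketch, not something proposed in the original answer) is to keep the buffer in a deque and build a DataFrame only at the point where pandas functionality is needed. The example below also uses the maxlen argument of collections.deque, which discards the oldest item automatically on append, so the explicit popleft() is unnecessary. The names ring and snapshot are made up for illustration:

```python
from collections import deque

import numpy as np
import pandas as pd

test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)

# A bounded deque: once it holds maxlen items, each append drops the
# oldest item from the left, which gives circular-buffer behaviour.
ring = deque(np.zeros((10, 2)), maxlen=10)

for row in test_sequence:
    ring.append(row)

# Convert to a DataFrame only when pandas is actually needed.
snapshot = pd.DataFrame(np.array(list(ring)), columns=["A", "B"])
print(snapshot.tail(3))
```

Building the DataFrame copies the buffer contents, so this only pays off when the conversion happens much less often than the appends.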
