How to turn a pandas DataFrame row into an OrderedDict fast
Question
I'm looking for a fast way to turn a row of a pandas DataFrame into an ordered dict without going through a list. Lists are fine, but with large data sets they take too long. I am using the fiona GIS reader, where the rows are OrderedDicts and the schema gives the data types. I use pandas to join the data. In many cases the rows have mixed types, so I was thinking that converting to a numpy array of type string might do the trick.
Recommended answer
Unfortunately, you can't just use apply, since it fits the result back into a DataFrame:
In [1]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
In [2]: df
Out[2]:
   a  b
0  1  2
1  3  4
In [3]: from collections import OrderedDict
In [4]: df.apply(OrderedDict)
Out[4]:
   a  b
0  1  2
1  3  4
But you can use a list comprehension with iterrows:
In [5]: [OrderedDict(row) for i, row in df.iterrows()]
Out[5]: [OrderedDict([('a', 1), ('b', 2)]), OrderedDict([('a', 3), ('b', 4)])]
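Since the question is about speed, here is a possible variant that is not part of the original answer: the pandas documentation notes that iterrows builds a Series for every row and does not preserve dtypes across the row, and that itertuples is generally faster. A minimal sketch, assuming the column labels are unique and reusing the small example frame from above:

```python
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# Column labels are looked up once, outside the loop.
columns = list(df.columns)

# index=False drops the index; name=None yields plain tuples instead of
# namedtuples, so each row is just the values in column order.
records = [
    OrderedDict(zip(columns, values))
    for values in df.itertuples(index=False, name=None)
]
```

Whether this is actually faster for a given data set is worth timing; the gain depends on the number of rows and the mix of column types.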
If whatever you are working with can accept a generator rather than a list, that will usually be more efficient:
In [6]: (OrderedDict(row) for i, row in df.iterrows())
Out[6]: <generator object <genexpr> at 0x10466da50>
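To illustrate why the generator can help, the sketch below shows that rows are converted only when the consumer asks for them, so the full list of OrderedDicts never has to exist at once. It reuses the example frame from above; the names `rows`, `first`, and `remaining` are only for illustration:

```python
from collections import OrderedDict

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# Nothing is converted yet; the generator only describes the work.
rows = (OrderedDict(row) for _, row in df.iterrows())

# Pulling one item converts exactly one row.
first = next(rows)

# Any consumer that accepts an iterable can drain the rest,
# here simply by building a list.
remaining = list(rows)
```

A generator can be consumed only once, so it suits a single pass such as writing the records out, not repeated lookups.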