Python/pandas: data frame from series of dict: optimization
Question
I have a pandas Series of dictionaries, and I want to convert it to a data frame with the same index.
The only way I have found is to go through the Series' to_dict method, which is not very efficient because it drops back to pure Python instead of staying in numpy/pandas/Cython.
Do you have a better suggestion?
Many thanks.
>>> import pandas as pd
>>> flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
>>> flagInfoSeries
0 {'a': 1, 'b': 2}
1 {'a': 10, 'b': 20}
dtype: object
>>> pd.DataFrame(flagInfoSeries.to_dict()).T
a b
0 1 2
1 10 20
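To see why the transpose is needed in the to_dict route, here is a minimal sketch of the intermediate step (the names `flag_info`, `nested`, `wide` and `result` are illustrative, not from the question). `Series.to_dict()` returns a dict keyed by index label, and the DataFrame constructor treats the outer keys of a dict of dicts as columns, so the original index first lands on the columns.

```python
import pandas as pd

flag_info = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))

# Series.to_dict() maps each index label to its element (here, a dict):
# {0: {'a': 1, 'b': 2}, 1: {'a': 10, 'b': 20}}
nested = flag_info.to_dict()

# A dict of dicts becomes a frame whose columns are the outer keys (0, 1)
# and whose rows are the inner keys ('a', 'b').
wide = pd.DataFrame(nested)

# Transposing puts the original index labels back on the rows.
result = wide.T
print(result)
```

Every element passes through a Python-level dict twice here (once in `to_dict`, once in the constructor), which is the overhead the question is trying to avoid.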
Recommended answer
I think you can use a list comprehension:
import pandas as pd
flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
print(flagInfoSeries)
0 {'a': 1, 'b': 2}
1 {'a': 10, 'b': 20}
dtype: object
# original approach: round-trip through a dict of dicts, then transpose
print(pd.DataFrame(flagInfoSeries.to_dict()).T)
a b
0 1 2
1 10 20
# suggested approach: build the frame directly from a list of the dicts
print(pd.DataFrame([x for x in flagInfoSeries]))
a b
0 1 2
1 10 20
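The comprehension `[x for x in flagInfoSeries]` only collects the Series values into a plain list of dicts. As a sketch, `Series.tolist()` builds the same list, so the two spellings below should give identical frames; whether `tolist()` is measurably faster on your data is an assumption you would need to time yourself.

```python
import pandas as pd

flag_info = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))

# Both expressions hand the constructor a Python list of the stored dicts.
via_comprehension = pd.DataFrame([x for x in flag_info])
via_tolist = pd.DataFrame(flag_info.tolist())

# The list-of-dicts constructor turns each dict into one row,
# using the dict keys as column names.
pd.testing.assert_frame_equal(via_comprehension, via_tolist)
print(via_tolist)
```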
Timings:
In [203]: %timeit pd.DataFrame(flagInfoSeries.to_dict()).T
The slowest run took 4.46 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 554 µs per loop
In [204]: %timeit pd.DataFrame([x for x in flagInfoSeries])
The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 361 µs per loop
In [209]: %timeit flagInfoSeries.apply(lambda dict: pd.Series(dict))
The slowest run took 4.76 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 751 µs per loop
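To rerun a comparison like this outside IPython, below is a minimal sketch using the standard-library `timeit` module (the `candidates` dict and its labels are hypothetical). It first checks that the three approaches build the same frame, then times each one. The absolute numbers, and possibly the ranking, will depend on the machine, the pandas version and the length of the Series; a two-element Series mostly measures constructor overhead.

```python
import timeit

import pandas as pd

flag_info = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))

# The three approaches timed above, wrapped as zero-argument callables.
# The lambda parameter is named d to avoid shadowing the builtin dict.
candidates = {
    "to_dict + transpose": lambda: pd.DataFrame(flag_info.to_dict()).T,
    "list comprehension": lambda: pd.DataFrame([x for x in flag_info]),
    "apply(pd.Series)": lambda: flag_info.apply(lambda d: pd.Series(d)),
}

# Sanity check: all approaches should yield the same 2x2 frame.
reference = candidates["list comprehension"]()
for build in candidates.values():
    pd.testing.assert_frame_equal(build(), reference, check_dtype=False)

# Average wall-clock time per call, in microseconds.
runs = 200
for name, build in candidates.items():
    seconds = timeit.timeit(build, number=runs)
    print(f"{name}: {seconds / runs * 1e6:.1f} µs per call")
```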
If you need to keep the index, try adding index=flagInfoSeries.index to the DataFrame constructor:
print(pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index))
Timing:
In [257]: %timeit pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index)
1000 loops, best of 3: 350 µs per loop
Example:
import pandas as pd
flagInfoSeries = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'b': 20}))
# use a non-default index to show that it is preserved
flagInfoSeries.index = [2, 8]
print(flagInfoSeries)
2 {'a': 1, 'b': 2}
8 {'a': 10, 'b': 20}
print(pd.DataFrame(flagInfoSeries.to_dict()).T)
a b
2 1 2
8 10 20
print(pd.DataFrame([x for x in flagInfoSeries], index=flagInfoSeries.index))
a b
2 1 2
8 10 20
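One related behaviour worth knowing, sketched here with made-up data: the list-of-dicts constructor does not require every dict to have the same keys. The columns are the union of all keys, and a key missing from a given dict becomes NaN in that row, which also turns the affected integer columns into float columns.

```python
import pandas as pd

# Hypothetical data: the second dict lacks 'b' and adds 'c'.
flag_info = pd.Series(({'a': 1, 'b': 2}, {'a': 10, 'c': 30}), index=[2, 8])

# Keys are aligned across rows; gaps are filled with NaN.
df = pd.DataFrame(flag_info.tolist(), index=flag_info.index)
print(df)
```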