Reading key-value pairs into Pandas
Question
Pandas makes it really easy to read a CSV file:
pd.read_table('data.txt', sep=',')
Does Pandas have something similar for a file with key-value pairs? I came up with this:
pd.DataFrame([dict([p.split('=') for p in l.split(',')]) for l in open('data.txt')])
If it is not built in, is there perhaps something more idiomatic?
The file of interest looks like this:
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525714498,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525734967,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735585,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525736116,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525740757,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748502,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748952,price=1548.00,quantity=557
It has the exact same keys on every line, in the same order, and there are no null values. The table to be generated is:
   exchange    price  quantity  symbol         timestamp
0    GLOBEX  1548.00     551\n    ESM3  1365428525690751
1    GLOBEX  1548.00     551\n    ESM3  1365428525697183
2    GLOBEX  1548.00     551\n    ESM3  1365428525714498
3    GLOBEX  1548.00     551\n    ESM3  1365428525734967
4    GLOBEX  1548.00     555\n    ESM3  1365428525735567
5    GLOBEX  1548.00     556\n    ESM3  1365428525735585
6    GLOBEX  1548.00     556\n    ESM3  1365428525736116
7    GLOBEX  1548.00     556\n    ESM3  1365428525740757
8    GLOBEX  1548.00     556\n    ESM3  1365428525748502
9    GLOBEX  1548.00     557\n    ESM3  1365428525748952
(I can remove the \n from quantity with an rstrip() after I've brought it in.)
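The stray \n shows up because iterating over a file yields each line with its newline still attached. As a minimal sketch, the same comprehension can strip the newline before splitting, so no rstrip() is needed afterwards. Here io.StringIO stands in for data.txt, and the helper name parse_line is made up for illustration:

```python
import io

import pandas as pd

# Stand-in for open('data.txt'); two lines copied from the sample file.
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)

def parse_line(line):
    """Turn 'k1=v1,k2=v2' into a dict, dropping the trailing newline first."""
    # split('=', 1) splits on the first '=' only, so a value may contain '='.
    pairs = (p.split('=', 1) for p in line.rstrip('\n').split(','))
    return dict(pairs)

# Skip blank lines so an empty final line does not break dict().
df = pd.DataFrame([parse_line(line) for line in sample if line.strip()])
print(df['quantity'].tolist())
```

All values are still strings at this point; only the newline is gone.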
Recommended answer
If you know the key names beforehand, and the names always appear in the same order, you could use a converter to chop off the key names and then use the names parameter to name the columns:
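import pandas as pd

def value(item):
    # Drop the "key=" prefix and keep only the value.
    return item[item.find('=')+1:]

# One converter per column position; names supplies the column labels.
df = pd.read_table('data.txt', header=None, delimiter=',',
                   converters={i:value for i in range(5)},
                   names='symbol exchange timestamp price quantity'.split())
print(df)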
On the data you posted, this yields:
   symbol  exchange         timestamp    price  quantity
0    ESM3    GLOBEX  1365428525690751  1548.00       551
1    ESM3    GLOBEX  1365428525697183  1548.00       551
2    ESM3    GLOBEX  1365428525714498  1548.00       551
3    ESM3    GLOBEX  1365428525734967  1548.00       551
4    ESM3    GLOBEX  1365428525735567  1548.00       555
5    ESM3    GLOBEX  1365428525735585  1548.00       556
6    ESM3    GLOBEX  1365428525736116  1548.00       556
7    ESM3    GLOBEX  1365428525740757  1548.00       556
8    ESM3    GLOBEX  1365428525748502  1548.00       556
9    ESM3    GLOBEX  1365428525748952  1548.00       557
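Because the converter returns strings, the numeric-looking columns are likely to arrive as text rather than numbers. One possible follow-up step, sketched here with io.StringIO in place of data.txt and pd.read_csv (comma-separated by default) in place of read_table, is to convert them with pd.to_numeric. Which columns should be numeric is an assumption based on the sample rows:

```python
import io

import pandas as pd

# Stand-in for data.txt; two lines copied from the sample file.
sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555\n"
)

def value(item):
    # Keep only the text after the first '='.
    return item[item.find('=') + 1:]

names = 'symbol exchange timestamp price quantity'.split()
df = pd.read_csv(sample, header=None, names=names,
                 converters={i: value for i in range(len(names))})

# Assumed numeric columns, judging by the sample data.
for col in ['timestamp', 'price', 'quantity']:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```

After the conversion, arithmetic and aggregation on price and quantity behave as they would for any numeric column.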