Reading key-value pairs into Pandas

Question

Pandas makes it really easy to read a CSV file:

pd.read_table('data.txt', sep=',')

Does Pandas have something similar for a file with key-value pairs? I came up with this:

pd.DataFrame([dict([p.split('=') for p in l.split(',')]) for l in open('data.txt')])

If it isn't built in, is there a more idiomatic way?

The file of interest looks like this:

symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525714498,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525734967,price=1548.00,quantity=551
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735567,price=1548.00,quantity=555
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525735585,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525736116,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525740757,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748502,price=1548.00,quantity=556
symbol=ESM3,exchange=GLOBEX,timestamp=1365428525748952,price=1548.00,quantity=557

It has the exact same keys on every line, and in the same order. There are no null values. The table to be generated is:

  exchange    price quantity symbol         timestamp
0   GLOBEX  1548.00    551\n   ESM3  1365428525690751
1   GLOBEX  1548.00    551\n   ESM3  1365428525697183
2   GLOBEX  1548.00    551\n   ESM3  1365428525714498
3   GLOBEX  1548.00    551\n   ESM3  1365428525734967
4   GLOBEX  1548.00    555\n   ESM3  1365428525735567
5   GLOBEX  1548.00    556\n   ESM3  1365428525735585
6   GLOBEX  1548.00    556\n   ESM3  1365428525736116
7   GLOBEX  1548.00    556\n   ESM3  1365428525740757
8   GLOBEX  1548.00    556\n   ESM3  1365428525748502
9   GLOBEX  1548.00    557\n   ESM3  1365428525748952

(I can remove the \n from quantity with an rstrip() after I've brought it in.)
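A minimal sketch of the question's own approach with the newline handled before splitting: stripping each line first means `quantity` never picks up the trailing `\n`, so no `rstrip()` pass is needed afterwards. The helper name `read_kv` is hypothetical, and `io.StringIO` stands in for `data.txt` so the example runs on its own.

```python
import io

import pandas as pd


def read_kv(lines, item_sep=",", kv_sep="="):
    """Parse 'key=value,key=value' lines into a DataFrame (hypothetical helper)."""
    rows = []
    for line in lines:
        line = line.strip()  # drop the trailing '\n' before splitting
        if not line:
            continue  # skip blank lines, e.g. at the end of the file
        # split(kv_sep, 1) keeps any '=' that appears inside a value
        rows.append(dict(p.split(kv_sep, 1) for p in line.split(item_sep)))
    return pd.DataFrame(rows)


sample = io.StringIO(
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551\n"
)
df = read_kv(sample)
print(df)
```

With the real file, `with open('data.txt') as f: df = read_kv(f)` works the same way. Every value is still a string, and recent pandas versions keep the columns in the order the keys first appear rather than sorting them as in the table above.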

Answer

If you know the key names beforehand and if the names always appear in the same order, then you could use a converter to chop off the key names, and then use the names parameter to name the columns:

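import pandas as pd

def value(item):
    # Keep only the text after the first '=', e.g. 'price=1548.00' -> '1548.00'
    return item[item.find('=')+1:]

# Apply value() to each of the five columns (keyed by position) and name the
# columns explicitly, since the file has no header row.
df = pd.read_table('data.txt', header=None, delimiter=',',
                   converters={i:value for i in range(5)},
                   names='symbol exchange timestamp price quantity'.split())
print(df)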

On the data you posted, this yields:

  symbol exchange         timestamp    price quantity
0   ESM3   GLOBEX  1365428525690751  1548.00      551
1   ESM3   GLOBEX  1365428525697183  1548.00      551
2   ESM3   GLOBEX  1365428525714498  1548.00      551
3   ESM3   GLOBEX  1365428525734967  1548.00      551
4   ESM3   GLOBEX  1365428525735567  1548.00      555
5   ESM3   GLOBEX  1365428525735585  1548.00      556
6   ESM3   GLOBEX  1365428525736116  1548.00      556
7   ESM3   GLOBEX  1365428525740757  1548.00      556
8   ESM3   GLOBEX  1365428525748502  1548.00      556
9   ESM3   GLOBEX  1365428525748952  1548.00      557
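The converter approach above hard-codes the key names, and because `value()` returns strings, every column comes back as text (dtype `object`). The sketch below relies on the question's stated assumption (same keys, same order, on every line): it takes the column names from the first line, strips the `key=` prefixes with vectorised `.str` methods instead of per-cell converters, and then converts the numeric fields with `pd.to_numeric`. Which columns count as numeric is an assumption, and `io.StringIO` stands in for `data.txt`.

```python
import io

import pandas as pd

text = (
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525690751,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525697183,price=1548.00,quantity=551\n"
    "symbol=ESM3,exchange=GLOBEX,timestamp=1365428525714498,price=1548.00,quantity=551\n"
)

# Read every field as a raw 'key=value' string; dtype=str stops any type guessing.
raw = pd.read_csv(io.StringIO(text), sep=",", header=None, dtype=str)

# Same keys in the same order on every line, so the first row names the columns.
raw.columns = raw.iloc[0].str.split("=", n=1).str[0].tolist()

# Keep only the part after the first '=' in every cell.
df = raw.apply(lambda col: col.str.split("=", n=1).str[1])

# Convert the numeric fields (assumed set of numeric columns).
for name in ["timestamp", "price", "quantity"]:
    df[name] = pd.to_numeric(df[name])

print(df.dtypes)
```

For the real file, pass `'data.txt'` instead of `io.StringIO(text)`. If you prefer to keep the original answer's `converters`, the same `pd.to_numeric` loop can be applied to its result.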
