Handling a Variable Number of Columns with Pandas - Python


Problem description

I have a data set that looks like this (at most 5 columns, but there can be fewer):

1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4
....

I am trying to use pandas read_table to read this into a 5-column data frame. I would like to read it in without additional massaging.

If I try

import pandas as pd
my_cols=['A','B','C','D','E']
my_df=pd.read_table(path,sep=',',header=None,names=my_cols)

I get an error: "column names have 5 fields, data has 3 fields".

Is there any way to make pandas fill in NaN for the missing columns while reading the data?

Recommended answer

One way which seems to work (at least in 0.10.1 and 0.11.0.dev-fc8de6d):

>>> !cat ragged.csv
1,2,3
1,2,3,4
1,2,3,4,5
1,2
1,2,3,4
>>> my_cols = ["A", "B", "C", "D", "E"]
>>> pd.read_csv("ragged.csv", names=my_cols, engine='python')
   A  B   C   D   E
0  1  2   3 NaN NaN
1  1  2   3   4 NaN
2  1  2   3   4   5
3  1  2 NaN NaN NaN
4  1  2   3   4 NaN
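
For readers who want to try this without creating a file, here is a minimal self-contained sketch of the same call. It assumes the ragged rows are held in an in-memory string (the variable name `raw` is made up for illustration) and wraps it in `io.StringIO` in place of the `ragged.csv` path. How the values are displayed may vary with the pandas version.

```python
import io

import pandas as pd

# Illustrative stand-in for ragged.csv: rows have between 2 and 5 fields.
raw = "1,2,3\n1,2,3,4\n1,2,3,4,5\n1,2\n1,2,3,4\n"

my_cols = ["A", "B", "C", "D", "E"]

# Supplying all five names up front lets the parser pad short rows with NaN.
# engine="python" mirrors the call shown in the answer.
df = pd.read_csv(io.StringIO(raw), names=my_cols, engine="python")

print(df)
print(df.isna().sum())  # number of missing values per column
```

The per-column NaN counts give a quick check that the short rows were padded and no data was dropped.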

Note that this approach requires that you give names to the columns you want, though. It is not as general as some other ways, but it works well enough when it applies.
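
If the maximum number of columns is not known in advance, one possible workaround is to scan the data once to find the widest row and then generate the names from that width. This is not part of the answer above. It is only a sketch that assumes the data is small enough to read twice, and the names `raw` and `width` are illustrative.

```python
import csv
import io

import pandas as pd

# Illustrative stand-in for the ragged file contents.
raw = "1,2,3\n1,2,3,4\n1,2,3,4,5\n1,2\n1,2,3,4\n"

# First pass: find the widest row using the standard csv module.
width = max(len(row) for row in csv.reader(io.StringIO(raw)))

# Second pass: use integer column labels 0..width-1 as the names.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=list(range(width)),
    engine="python",
)

print(width)
print(df)
```

With a real file, the same two passes would open the path twice instead of rebuilding a `StringIO` object.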
