Pandas read csv where one header is missing

Question

I am trying to read a csv file with Pandas, but the first column contains a first name and a last name separated by a comma. This causes Pandas to think that there are 5 columns instead of 4, so the last column now has no header and cannot be selected.

The file looks like this:

CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
FNAME3,LNAME3,100,2019-01-13 00:00:00.000,FNAME3@AOL.COM
FNAME4,LNAME4,100,2019-01-13 00:00:00.000,FNAME40@GMAIL.COM
FNAME5,LNAME5,100,2019-01-13 00:00:00.000,FNAME5@AOL.COM

What my code looks like now:

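import os

import pandas as pd


def convert_ftp_data():
    file = os.path.join(os.getcwd(), "data.csv")
    # index_col=False stops pandas from using the first column as the index
    data = pd.read_csv(file, index_col=False)

    data["first_name"] = data["CustomerName"].str.split().str[0].str.title()
    data["email"] = data["EmailAddress"]

    # Drop the original columns, keeping only first_name and email
    clean_data = data.drop(columns=["CustomerName", "ClientID", "EmailDate", "EmailAddress"])

    print(clean_data)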

Using my code I get the following output:

first_name  email
0   FNAME1  2019-01-13 00:00:00.000
1   FNAME1  2019-01-13 00:00:00.000
2   FNAME1  2019-01-13 00:00:00.000
3   FNAME1  2019-01-13 00:00:00.000
4   FNAME1  2019-01-13 00:00:00.000

I only need to select the FNAME and EmailAddress fields. What would be the best way to do this?

Recommended answer

Why not just skip the header row and set the column names correctly after import:

data = pd.read_csv(file, index_col=False, header=None, skiprows=1)

data.columns = 'CustomerFirstName,CustomerName,ClientID,EmailDate,EmailAddress'.split(',')
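
As a rough end-to-end sketch of the same idea, the snippet below skips the four-label header, supplies five column names through the names parameter of read_csv, and keeps only the first name and the email address. The column names FirstName and LastName and the helper load_clean are made up for illustration, and io.StringIO stands in for data.csv using two rows copied from the question.

```python
import io

import pandas as pd

# Two sample rows copied from the question; io.StringIO stands in for data.csv.
RAW = """CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
"""

# Hypothetical column names: the file's header has only four labels,
# so five names are supplied here instead.
COLUMNS = ["FirstName", "LastName", "ClientID", "EmailDate", "EmailAddress"]


def load_clean(source):
    """Read the malformed csv and return only first_name and email."""
    data = pd.read_csv(
        source,
        header=None,    # do not treat any row as the header
        skiprows=1,     # skip the original four-label header line
        names=COLUMNS,  # assign the five column names directly
    )
    return pd.DataFrame(
        {
            "first_name": data["FirstName"].str.title(),
            "email": data["EmailAddress"],
        }
    )


clean_data = load_clean(io.StringIO(RAW))
print(clean_data)
```

Passing names at read time is equivalent to assigning data.columns afterwards; either way the email address ends up in a column that has a label and can be selected.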
