pandas.read_csv moves column names over one

Views: 116 | Published: 2020/5/24 3:11:39 | Tags: python, csv, pandas

Question

I am using the ALL.zip file located here. My goal is to create a pandas DataFrame from it. However, if I run

data = pd.read_csv('foo.csv')

the column names do not line up with the data. The first column has no name, the second column is labeled with the first column's name, and the last column is a Series of NaN. So I tried

colnames = [...]  # list of column names
# header=False is what was run at the time; current pandas rejects a bool here
data = pd.read_csv('foo.csv', names=colnames, header=False)

which gave me exactly the same result, so I ran

data = pd.read_csv('foo.csv', names=colnames)

which lined the column names up perfectly, but the header line from the csv (the first line of the file) then appeared as the first row of data. So I ran

data=data[1:]

which did the trick.

So I found a workaround without solving the actual problem. I looked at the read_csv documentation and found it a bit overwhelming, and I could not figure out a way to fix this using only pd.read_csv.

What was the fundamental problem (I am assuming it is either user error or a problem with the file)? Is there a way to fix it with one of the read_csv parameters?

Here are the first 2 lines of the csv file

cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
C00458844,"P60006723","Rubio, Marco","HEFFERNAN, MICHAEL","APO","AE","090960009","INFORMATION REQUESTED PER BEST EFFORTS","INFORMATION REQUESTED PER BEST EFFORTS",210,27-JUN-15,"","","","SA17A","1015697","SA17.796904","P2016",

Recommended answer

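It's not the columns that you're having a problem with, it's the index. Every data row ends with a trailing comma, so it has one more field than the header line, and read_csv then uses the first column as the index. Passing index_col=False tells pandas not to do that.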

import pandas as pd

df = pd.read_csv('P00000001-ALL.csv', index_col=False, low_memory=False)

print(df.head(1))

     cmte_id    cand_id       cand_nm           contbr_nm contbr_city  \
0  C00458844  P60006723  Rubio, Marco  HEFFERNAN, MICHAEL         APO   

  contbr_st contbr_zip                         contbr_employer  \
0        AE  090960009  INFORMATION REQUESTED PER BEST EFFORTS   

                        contbr_occupation  contb_receipt_amt contb_receipt_dt  \
0  INFORMATION REQUESTED PER BEST EFFORTS                210        27-JUN-15   

  receipt_desc memo_cd memo_text form_tp  file_num      tran_id election_tp  
0          NaN     NaN       NaN   SA17A   1015697  SA17.796904       P2016
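
The effect can be reproduced without the original file. The following is a minimal sketch, assuming a made-up three-column CSV (the names a, b, c and the values are hypothetical) in which every data row ends with a trailing comma, as in the sample rows from the question. It reads the same text twice, once with the defaults and once with index_col=False.

```python
import io

import pandas as pd

# Hypothetical sample: the header has 3 names, each data row has 4 fields
# because of the trailing comma (same shape of problem as in the question).
raw = (
    "a,b,c\n"
    "1,2,3,\n"
    "4,5,6,\n"
)

# Defaults: the extra field makes pandas use the first field of each row
# as the index, so the names shift over by one and "c" is left empty.
shifted = pd.read_csv(io.StringIO(raw))

# index_col=False: the first column is not used as the index, and the
# empty trailing field is dropped.
fixed = pd.read_csv(io.StringIO(raw), index_col=False)

print(shifted)
print(fixed)
```

In the first frame the values 1 and 4 end up in the index and column c holds only NaN, which matches the symptoms described in the question. In the second frame each name sits over its own data.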

low_memory=False is there because column 6 has mixed data types. With it, pandas infers each column's type from the whole file instead of chunk by chunk.
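
Another way to avoid mixed type inference is to state the column type explicitly with the dtype parameter of read_csv. The sketch below is illustrative only: it assumes the mixed-type column is contbr_zip (the answer only says "column 6"), and it uses a made-up two-row sample with a subset of the question's column names.

```python
import io

import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
raw = (
    "cand_nm,contbr_zip,contb_receipt_amt\n"
    '"Rubio, Marco","090960009",210,\n'
    '"Rubio, Marco","33139",50,\n'
)

# Assumption: contbr_zip is the mixed-type column. Reading it as str keeps
# every value as text, so leading zeros in ZIP codes are preserved.
df = pd.read_csv(
    io.StringIO(raw),
    index_col=False,             # rows end with a trailing comma
    dtype={"contbr_zip": str},
)

print(df.dtypes)
print(df["contbr_zip"].tolist())
```

Declaring the type up front also saves pandas from having to guess it, which may matter on a file as large as the one in the question.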
