Pandas read_csv alters a column when its values start with 0

Question

I have a script that reads some zip codes from a CSV file. The zip codes are formatted like this:

zipcode
75180
90672
01037
20253
09117
31029
07745
90453
12105
18140
36108
10403
76470
06628
93105
88069
31094
84095
63069

Then I run this script:

import requests
import pandas as pd
import time

file = '/Users/zipcode.csv'
reader = pd.read_csv(file, sep=';', encoding='utf-8-sig')

zipcodes = reader["zipcode"].astype(str)
base_url = "https://api.blabla/?zipcode={zipcode}"
headers = {'Authentication': 'random'}

for zipcode in zipcodes:
    url = base_url.format(zipcode=zipcode)
    r = requests.get(url,
                     headers=headers)
    for r_info in r.json()["data"]:
        print(zipcode, r_info["id"])
    time.sleep(0.5)

However, whenever a zip code starts with 0, the value I get back has only 4 digits, so it no longer matches the actual zip code with its leading 0. I have formatted the column in my CSV as text, but it still doesn't work.

The zip codes I get look like this:

zipcode
75180
90672
1037
20253
9117
31029
7745
90453
12105
18140
36108
10403
76470
6628
93105
88069
31094
84095
63069

Do you know how to solve this?

Recommended answer

You need to pass dtype=str to read_csv:

reader = pd.read_csv(file, sep=';', encoding='utf-8-sig', dtype=str)

This reads those values as strings:

In [152]:
import pandas as pd
import io
t="""zipcode
75180
90672
01037
20253
09117
31029
07745
90453
12105
18140
36108
10403
76470
06628
93105
88069
31094
84095
63069"""
df = pd.read_csv(io.StringIO(t), dtype=str)
df

Out[152]:
   zipcode
0    75180
1    90672
2    01037
3    20253
4    09117
5    31029
6    07745
7    90453
8    12105
9    18140
10   36108
11   10403
12   76470
13   06628
14   93105
15   88069
16   31094
17   84095
18   63069
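
If the file has other columns that should stay numeric, `dtype` also accepts a dict mapping column names to types. The following is a minimal sketch under that assumption; the `population` column and the sample rows are invented for illustration and are not part of the original data.

```python
import io
import pandas as pd

# Hypothetical sample data: the `population` column is invented for illustration.
csv_text = """zipcode;population
01037;1200
75180;5400
09117;830
"""

# Only `zipcode` is forced to str; `population` is still inferred as an integer.
df = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"zipcode": str})

print(df["zipcode"].tolist())   # ['01037', '75180', '09117']
print(df["population"].sum())   # 7430
```

This keeps the leading zeros where they matter without turning every other column into text.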

By default, pandas infers the dtypes; in this case it decides the column is numeric, so you lose the leading zeros.
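
To see the inference in action, the short sketch below reads the same text twice, once with the defaults and once with `dtype=str`. The three sample rows are a shortened copy of the data above.

```python
import io
import pandas as pd

csv_text = "zipcode\n01037\n75180\n09117\n"

inferred = pd.read_csv(io.StringIO(csv_text))            # default type inference
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)  # keep the raw text

print(inferred["zipcode"].tolist())  # [1037, 75180, 9117]
print(as_text["zipcode"].tolist())   # ['01037', '75180', '09117']
```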

You can also do this as a post-processing step by casting to str and then using the vectorised str.zfill:

In [154]:
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
df

Out[154]:
   zipcode
0    75180
1    90672
2    01037
3    20253
4    09117
5    31029
6    07745
7    90453
8    12105
9    18140
10   36108
11   10403
12   76470
13   06628
14   93105
15   88069
16   31094
17   84095
18   63069
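
For a column that has already been parsed as integers, as in the question, the same padding can be sketched as follows. It assumes every zip code is meant to be exactly 5 digits wide and that the column has no missing values (a column with missing values is read as float, so casting to str would produce text such as `1037.0`).

```python
import pandas as pd

# Zip codes that were already parsed as integers, as in the question.
df = pd.DataFrame({"zipcode": [75180, 1037, 9117]})

# Cast to str, then left-pad with zeros to a fixed width of 5.
df["zipcode"] = df["zipcode"].astype(str).str.zfill(5)

print(df["zipcode"].tolist())  # ['75180', '01037', '09117']
```

Reading with `dtype=str` is usually the safer option, since padding afterwards only guesses how many zeros were dropped.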
