How to keep leading zeros in a column when reading CSV with Pandas?

Problem description

I am importing study data into a Pandas data frame using read_csv.

My subject codes are six-digit numbers that encode, among other things, the day of birth. For some of my subjects this results in a code with a leading zero (e.g. "010816").

When I import into Pandas, the leading zero is stripped off and the column is formatted as int64.

Is there a way to import this column unchanged, perhaps as a string?

I tried using a custom converter for the column, but it does not work. It seems as if the custom conversion takes place before Pandas converts to int.

Recommended answer

As indicated in this question/answer by Lev Landau, a simple solution is to use the converters option of the read_csv function for the specific column.

converters={'column_name': lambda x: str(x)}

You can find more options of the read_csv function in the pandas.io.parsers.read_csv documentation.

Let's say I have a CSV file projects.csv like the one below:

project_name,project_id
Some Project,000245
Another Project,000478

For example, the code below strips the leading zeros:

from pandas import read_csv

# Default parsing: project_id is inferred as an integer column
dataframe = read_csv('projects.csv')
print(dataframe)

Result:

me@ubuntu:~$ python test_dataframe.py 
      project_name  project_id
0     Some Project         245
1  Another Project         478
me@ubuntu:~$

Solution code example:

from pandas import read_csv

# The converter keeps each raw project_id field as a string
dataframe = read_csv('projects.csv', converters={'project_id': lambda x: str(x)})
print(dataframe)

Required result:

me@ubuntu:~$ python test_dataframe.py 
      project_name project_id
0     Some Project     000245
1  Another Project     000478
me@ubuntu:~$
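
The comparison above can also be reproduced without a file on disk. The following is a minimal sketch that holds the same CSV content in memory via io.StringIO (the variable names are illustrative, not from the original answer) and passes str directly as the converter, which behaves the same as lambda x: str(x):

```python
import io

import pandas as pd

# Same content as projects.csv above, held in memory so the snippet is self-contained.
csv_text = "project_name,project_id\nSome Project,000245\nAnother Project,000478\n"

# Default parsing: project_id is inferred as an integer column, so the zeros are lost.
default_df = pd.read_csv(io.StringIO(csv_text))

# With a converter: each raw field is passed through str() and kept as text.
converted_df = pd.read_csv(io.StringIO(csv_text), converters={"project_id": str})

print(default_df["project_id"].tolist())    # [245, 478]
print(converted_df["project_id"].tolist())  # ['000245', '000478']
```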

Update, as it may help others:

To have all columns as str, one can do this (from the comment):

import pandas as pd
pd.read_csv('sample.csv', dtype=str)
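
One consequence worth noting is that dtype=str applies to every column, including ones that are genuinely numeric. A small sketch, assuming a hypothetical sample.csv with prefix, serial, and count columns (the data below is made up for illustration):

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv; the column names and values are assumptions.
csv_text = "prefix,serial,count\n007,000123,5\n042,000456,12\n"

# dtype=str disables type inference for all columns.
all_str_df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Every value in the first row is a string, including the numeric-looking count.
print(all_str_df.iloc[0].tolist())  # ['007', '000123', '5']
```

If some columns should stay numeric, a per-column dtype mapping is probably the better fit.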

To have most or only selected columns as str, one can do this:

import pandas as pd

# list of column names that need to be read as strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dict of dtypes
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as the dtype argument
pd.read_csv('sample.csv', dtype=dict_dtypes)
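
To check the effect of a per-column dtype mapping, here is a minimal in-memory sketch. The count column and the sample rows are hypothetical additions for illustration; only prefix and serial come from the snippet above:

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv; the count column and values are assumptions.
csv_text = "prefix,serial,count\n007,000123,5\n042,000456,12\n"

# Only the listed columns are forced to str; the rest are still inferred.
str_cols = ["prefix", "serial"]
mixed_df = pd.read_csv(io.StringIO(csv_text), dtype={col: str for col in str_cols})

print(mixed_df["serial"].tolist())  # ['000123', '000456']
print(mixed_df["count"].sum())      # 17
```

The listed columns keep their leading zeros, while count is still parsed as an integer and can be used in arithmetic.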
