How to keep leading zeros in a column when reading CSV with Pandas?
Question
I am importing study data into a Pandas data frame using read_csv.
My subject codes are 6-digit numbers encoding, among other things, the date of birth. For some of my subjects this results in a code with a leading zero (e.g. "010816").
When I import into Pandas, the leading zero is stripped off and the column is formatted as int64.
Is there a way to import this column unchanged, perhaps as a string?
I tried using a custom converter for the column, but it does not work - it seems as if the custom conversion takes place before Pandas converts to int.
Recommended answer
As indicated in this question/answer by Lev Landau, a simple solution is to use the converters option of the read_csv function for the specific column.
converters={'column_name': lambda x: str(x)}
You can find more options of the read_csv function in the pandas.io.parsers.read_csv documentation.
Let's say I have a CSV file projects.csv like the one below:
project_name,project_id
Some Project,000245
Another Project,000478
For example, the code below strips the leading zeros:
from pandas import read_csv
# Default type inference parses project_id as an integer column
dataframe = read_csv('projects.csv')
print(dataframe)
Result:
me@ubuntu:~$ python test_dataframe.py
project_name project_id
0 Some Project 245
1 Another Project 478
me@ubuntu:~$
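The zeros disappear because read_csv infers a numeric type for the column. A minimal sketch to confirm this is shown here; it assumes an in-memory copy of the sample file (built with io.StringIO) so that it runs without projects.csv on disk, and the variable names are illustrative only.

```python
import io

import pandas as pd

# In-memory stand-in for projects.csv (assumption: same contents as the sample above)
csv_text = (
    "project_name,project_id\n"
    "Some Project,000245\n"
    "Another Project,000478\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# With default type inference, project_id is parsed as an integer column,
# so the leading zeros are no longer present in the values.
print(df.dtypes)
print(df["project_id"].tolist())
```

Checking df.dtypes right after loading is a quick way to spot columns that were silently converted to numbers.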
Solution code example:
from pandas import read_csv
# The converter keeps each project_id value as the string read from the file
dataframe = read_csv('projects.csv', converters={'project_id': lambda x: str(x)})
print(dataframe)
Desired result:
me@ubuntu:~$ python test_dataframe.py
project_name project_id
0 Some Project 000245
1 Another Project 000478
me@ubuntu:~$
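As an alternative to converters, read_csv also accepts a dtype argument that can map a column name to str. The sketch below is not part of the original answer; it again assumes an in-memory copy of the sample data via io.StringIO so it is self-contained.

```python
import io

import pandas as pd

# In-memory stand-in for projects.csv (assumption: same contents as the sample above)
csv_text = (
    "project_name,project_id\n"
    "Some Project,000245\n"
    "Another Project,000478\n"
)

# Ask pandas to read project_id as text instead of inferring a numeric type
df = pd.read_csv(io.StringIO(csv_text), dtype={"project_id": str})

print(df)
print(df["project_id"].tolist())
```

With a real file, the same call would take the path instead of the StringIO object, for example read_csv('projects.csv', dtype={'project_id': str}).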