CSV reader and DictReader turn numeric fields into strings
Problem description
The first row of the CSV file contains the headers. Here is a sample row:
```
2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,KL0602130731,AIRFRANCE
KLM,KLM,KLM,KLM,KL,KLM ROYAL DUTCH AIRLINES,,0602,,KL0602,KL,KLM ROYAL DUTCH
AIRLINES,,,,KL,0602,,,LAX,AMS,,31-7-2013 0:00:00,2013-07-31,2013-07-31,2013-07-31,2013-07-31,
13:55:00,14:39:00,20:55:00,21:39:00,2013-08-01,2013-08-01,2013-08-01,2013-08-01,
09:05:00,09:45:00,07:05:00,07:45:00,2.0,,2,,,LAX,LOS ANGELES INTERNATIONAL AIRPORT,
LAX,LAX,5.0,LAX,LOS ANGELES,US,UNITED STATES OF AMERICA,US,USA,NA8,NORTHERN AMERICA,
AMERICAS,,,,AMS,SCHIPHOL I,F,OFFLINE,I,INDIRECT OFFLINE,14.0,3.0,FRONT,Business,2.0,nan,
PLANNED,3.0,,2.0,2.0,34.0,4.0,400254887nan,1.0,2.0,2.0,2.0,1.0,2.0,6.0,3.0,1.0,3.0,1.0,1.0,
nan,nan,nan,nan,nan,nan,nan,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,
nan,2.0,2.0,2.0,2.0,2.0,7.0,nan,2.0,3.0,3.0,3.0,3.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,
nan,nan,nan,nan,6.0,1.0,nan,nan,nan,nan,nan,2.0,nan,nan,nan,nan,nan,nan,nan,nan,nan,2.0,2.0,
nan,2.0,nan,3.0,nan,,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,nan,13.7885862654653,
0.2, 34273499844164,nan,37.0,Booked,35.0,10.0,2.0,2.0,6.0,35.0,10.0,42.0,nan,nan,LAX,LAX,N
```
If I use either `input_file = csv.DictReader(open("file.csv"))` or `input_file = csv.reader(open('file.csv'))`, all my values are turned into strings.
Part of one row, as printed in Python:
```
'2013-08-31 00:00:00', '', '1.0', '2013.0', '8.0', 'Q3','C', '03J', '', '',
'', '', 'nan', 'nan', '', 'NON-AIRPORT', 'SELF-SERVICE', 'ICI', '', '19.0', '20130819',
'1.0', '19.0', '9.0', '20130901', '2.0', '1.0', '1.0', '1.0', '10.0', '5.0', '5.0', '3.0',
'4.0', '4.0', '2.0', '2.0', '', 'nan', '2.0', '', '24854524', 'nan', 'nan', 'nan', 'nan',
'1.0', 'nan', '5.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan',
'nan', '4.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan',
'nan', 'nan', 'nan', '2.0', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan',
'nan', '3.0', '5.0', '5.0'
```
As you can see, all the dates, strings, floats and integers have been turned into strings. How can I import them with their correct types? There are 400 columns of data, so I cannot define the type of each column manually.
Recommended answer
You're looking at this backwards. It's not that they're being turned into strings; it's that they are strings, in the sense that CSV isn't a format that preserves type information. You didn't do anything to turn them into anything else, and Python isn't going to guess. Is Nan a float, or an affectionate name for one's grandmother? Is 3.0 a float, or the name of an avant-garde nerdcore blues band?
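A quick way to see this for yourself is a minimal check with an in-memory file (`io.StringIO` stands in for a real file here, and the values are borrowed from the question's sample row). Whatever a field looks like, `csv.reader` hands it back as `str`:

```python
import csv
import io

# A tiny in-memory CSV; the values look numeric, but the file itself only stores text.
sample = io.StringIO("1.0,2013.0,nan,21160742\n")

row = next(csv.reader(sample))
print(row)                     # ['1.0', '2013.0', 'nan', '21160742']
print({type(v) for v in row})  # {<class 'str'>}
```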
If you can think of an algorithm to guess the types, then you can apply that, of course:
```python
import csv
import ast
import datetime

def guess_type(x):
    # Try each converter in turn and return the first result that succeeds.
    attempt_fns = [ast.literal_eval,  # ints, floats and other Python literals
                   float,             # also accepts values such as "nan"
                   lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
                   ]
    for fn in attempt_fns:
        try:
            return fn(x)
        except (ValueError, SyntaxError):
            pass
    # Nothing matched: keep the original string.
    return x

# Open in text mode with newline="", as the csv module documentation recommends.
with open("untyped.csv", newline="") as fp:
    reader = csv.reader(fp)
    for row in reader:
        row = [guess_type(x) for x in row]
        print(row)
        print([type(x) for x in row])
```

With the file

```
2013-07-31 00:00:00,,1.0,2013.0,7.0,Q3,21160742,32HHBS1307170203,nan
```

the code above produces

```
[datetime.datetime(2013, 7, 31, 0, 0), '', 1.0, 2013.0, 7.0, 'Q3', 21160742, '32HHBS1307170203', nan]
[<class 'datetime.datetime'>, <class 'str'>, <class 'float'>, <class 'float'>, <class 'float'>, <class 'str'>, <class 'int'>, <class 'str'>, <class 'float'>]
```
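The question also mentions `csv.DictReader`. The same guessing can be applied value by value to each dictionary it yields. The sketch below re-declares a compact `guess_type` so that it runs on its own, and reads an in-memory file whose column names (`date`, `quarter`, `flight_no`, `score`) are hypothetical; the values come from the question's sample row. It also exposes a limitation of guessing: the flight number `0602` is not a valid Python literal (leading zero), so it falls through to `float` and comes back as `602.0`, losing the zero.

```python
import ast
import csv
import datetime
import io

def guess_type(x):
    # Same order as the answer's function: Python literal, then float, then a timestamp.
    converters = (
        ast.literal_eval,
        float,
        lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    )
    for fn in converters:
        try:
            return fn(x)
        except (ValueError, SyntaxError):
            pass
    return x

# Hypothetical header; the values are taken from the question's sample row.
data = io.StringIO(
    "date,quarter,flight_no,score\n"
    "2013-07-31 00:00:00,Q3,0602,13.7885862654653\n"
)

# Convert every value of every DictReader record.
records = [
    {key: guess_type(value) for key, value in row.items()}
    for row in csv.DictReader(data)
]
print(records)
```

If some columns are really identifiers, one option is to skip them by name, e.g. `value if key == "flight_no" else guess_type(value)` (again using the hypothetical `flight_no` column).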
PS: If you're going to be doing serious work with CSV files in Python, I strongly recommend checking out pandas; otherwise you'll waste time reimplementing parts of its functionality.
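As a minimal sketch of what that looks like, `pandas.read_csv` infers a dtype for every column on its own, which matters when there are 400 of them, and treats the string `nan` as a missing value by default. The column names below are hypothetical and the values are borrowed from the question's rows; `parse_dates` asks pandas to convert the named column to datetimes.

```python
import io

import pandas as pd

# Hypothetical column names; the values come from the question's sample rows.
data = io.StringIO(
    "date,quarter,record_id,score\n"
    "2013-07-31 00:00:00,Q3,21160742,nan\n"
    "2013-08-31 00:00:00,Q3,24854524,1.0\n"
)

# read_csv infers a dtype per column; parse_dates converts the "date" column.
df = pd.read_csv(data, parse_dates=["date"])
print(df.dtypes)
```

Columns that should not be inferred (for example flight numbers with leading zeros) can be pinned with the `dtype` argument, e.g. `dtype={"flight_no": str}` for a hypothetical `flight_no` column.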