Method for guessing the type of data currently represented as strings
Problem description
I'm currently parsing CSV tables and need to discover the "data types" of the columns. I don't know the exact format of the values. Obviously, everything that the CSV parser outputs is a string. The data types I am currently interested in are:
- integer
- floating point
- date
- boolean
- string
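The thread never covers booleans. One minimal way to detect them is to assume they appear as a small, fixed vocabulary of words. The token sets and `parse_bool` below are hypothetical; adjust them to what your files actually contain. `1` and `0` are deliberately left out so that such columns are classified as integers instead.

```python
# Hypothetical vocabularies: adjust them to what your CSV files actually contain.
# "1" and "0" are deliberately excluded so such columns are classified as integers.
TRUE_TOKENS = {"true", "t", "yes", "y"}
FALSE_TOKENS = {"false", "f", "no", "n"}


def parse_bool(s):
    """Return True/False if s is a recognised boolean token, else None."""
    token = s.strip().lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    return None


print(parse_bool("TRUE"), parse_bool(" no "), parse_bool("1"))  # True False None
```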
My current thoughts are to test a sample of rows (maybe several hundred?) in order to determine the types of data present through pattern matching.
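You can sketch the sampling idea with the standard `csv` module. Read up to `sample_size` rows, then give each column the most specific type that every non-empty sampled value satisfies. This is a minimal sketch under several assumptions: the first row is a header; `guess_column_types`, `DATE_FORMATS`, and `BOOL_TOKENS` are hypothetical names; and the date and boolean checks recognize only the few spellings listed.

```python
import csv
import io
import itertools
import math
from datetime import datetime

# Assumptions: extend both to the idioms your files actually use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")
BOOL_TOKENS = {"true", "false", "yes", "no"}


def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False


def is_float(s):
    try:
        return math.isfinite(float(s))  # reject "nan" and "inf"
    except ValueError:
        return False


def is_bool(s):
    return s.strip().lower() in BOOL_TOKENS


def is_date(s):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(s.strip(), fmt)
            return True
        except ValueError:
            pass
    return False


# Ordered from most specific to most general; "string" accepts anything.
CHECKS = [
    ("integer", is_int),
    ("floating point", is_float),
    ("boolean", is_bool),
    ("date", is_date),
    ("string", lambda s: True),
]


def guess_column_types(f, sample_size=500):
    """Guess a type name for each column from the first sample_size data rows."""
    reader = csv.reader(f)
    header = next(reader)
    rows = list(itertools.islice(reader, sample_size))
    types = {}
    for col, name in enumerate(header):
        # Skip blank cells and short rows; they say nothing about the type.
        values = [r[col] for r in rows if col < len(r) and r[col].strip()]
        if not values:
            types[name] = "string"
            continue
        types[name] = next(t for t, check in CHECKS if all(check(v) for v in values))
    return types


sample = io.StringIO(
    "id,price,active,joined,name\n"
    "1,2.5,yes,2020-01-31,Ann\n"
    "2,3,no,2021-12-01,Bob\n"
)
print(guess_column_types(sample))
# {'id': 'integer', 'price': 'floating point', 'active': 'boolean', 'joined': 'date', 'name': 'string'}
```

The integer check must run before the floating-point check, because every integer string also parses as a float. A column that mixes `3` and `2.5` therefore falls through to floating point. Blank cells are skipped, so a column that is entirely empty in the sample defaults to string.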
I am particularly concerned about the date data type: is there a Python module for parsing common date idioms (obviously I will not be able to detect them all)?
What about integers and floats?
Recommended answer
Use dateutil (its `dateutil.parser.parse` function) for parsing dates.
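With dateutil installed (`pip install python-dateutil`), `dateutil.parser.parse(s)` accepts a wide range of formats and raises `ValueError` (or a subclass of it) for strings it cannot parse. You may prefer to stay in the standard library, or to decide exactly which idioms count as dates. In that case, one alternative, swapped in here rather than taken from the answer, is to try a fixed list of `datetime.strptime` formats. This is a minimal sketch; `DATE_FORMATS` and `parse_date` are hypothetical names, and the formats listed are assumptions.

```python
from datetime import datetime

# Hypothetical list of accepted idioms; order decides ambiguous cases,
# so "03/04/2020" is read day-first because "%d/%m/%Y" comes before "%m/%d/%Y".
DATE_FORMATS = (
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y",
    "%m/%d/%Y",
    "%d %b %Y",  # e.g. "12 Mar 2021"; %b depends on the current locale
)


def parse_date(s):
    """Return a datetime if s matches one of DATE_FORMATS, else None."""
    s = s.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue  # try the next format
    return None


print(parse_date("2020-01-31"))  # 2020-01-31 00:00:00
print(parse_date("03/04/2020"))  # 2020-04-03 00:00:00
print(parse_date("dsa"))         # None
```

Looking at a whole column sample helps with ambiguity. A value such as `12/31/2020` fits only the month-first format, which suggests reading the rest of that column month-first as well.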
For integers and floats, you can always attempt the conversion inside a try/except block:
```python
>>> f = "2.5"
>>> i = "9"
>>> ci = int(i)
>>> ci
9
>>> cf = float(f)
>>> cf
2.5
>>> g = "dsa"
>>> cg = float(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'dsa'
>>> try:
...     cg = float(g)
... except ValueError:
...     print("g is not a float")
...
g is not a float
>>>
```
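You can wrap the same try/except idea in a helper that returns the converted value, or `None` when the string is not numeric. The order of the attempts matters: `int("2.5")` raises `ValueError` while `float("9")` succeeds, so trying `int` first keeps integers from being classified as floats. `to_number` is a hypothetical name. Rejecting `nan`/`inf` is also an assumption, since Python's `float()` accepts both.

```python
import math


def to_number(s):
    """Return s as an int or float, or None if it is not a finite number."""
    try:
        return int(s)  # "9" -> 9; "2.5" raises ValueError and falls through
    except ValueError:
        pass
    try:
        value = float(s)  # also accepts "1e3", " 2.5 ", "nan", "inf"
    except ValueError:
        return None
    # Assumption: treat "nan"/"inf" as text rather than numbers.
    return value if math.isfinite(value) else None


for s in ["9", "2.5", "1e3", "nan", "dsa"]:
    print(repr(s), "->", to_number(s))
```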