Method for guessing the type of data currently represented as strings
Problem description
I'm currently parsing CSV tables and need to discover the "data types" of the columns. I don't know the exact format of the values. Obviously, everything that the CSV parser outputs is a string. The data types I am currently interested in are:
- integer
- floating point
- date
- boolean
- string
My current thoughts are to test a sample of rows (maybe several hundred?) in order to determine the types of data present through pattern matching.
I am particularly concerned about the date data type - is there a Python module for parsing common date idioms (obviously I will not be able to detect them all)?
What about integers and floats?
Dateutil comes to mind for parsing dates.
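If adding a dependency is not an option, a rough standard-library stand-in is to try a short list of candidate formats with datetime.strptime. This is only a sketch under stated assumptions: the format list and the helper name try_parse_date are made up for illustration, and it is a swapped-in technique rather than what dateutil does internally; dateutil's parser accepts far more variants than this.

```python
from datetime import datetime

# Hypothetical list of formats to try, in priority order; extend it to match your data.
CANDIDATE_FORMATS = (
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%m/%d/%Y",
    "%Y-%m-%d %H:%M:%S",
)


def try_parse_date(text):
    """Return a datetime if text matches one of the candidate formats, else None."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


print(try_parse_date("2009-03-14"))
print(try_parse_date("12/31/2008"))
print(try_parse_date("dsa"))
```

Note that the order of the list matters: an ambiguous value such as "01/02/2009" is read with whichever matching format comes first, here day/month/year.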
For integers and floats, you could always try a cast inside a try/except block:
>>> f = "2.5"
>>> i = "9"
>>> ci = int(i)
>>> ci
9
>>> cf = float(f)
>>> cf
2.5
>>> g = "dsa"
>>> cg = float(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'dsa'
>>> try:
...     cg = float(g)
... except ValueError:
...     print("g is not a float")
...
g is not a float
>>>
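The same try/except idea can be wrapped in a helper and applied to a sample of values from one column. The following is a minimal sketch, not a definitive implementation: the names guess_type and guess_column_type, the accepted boolean spellings, and the rule that a mixed integer/float column counts as float are all assumptions made for illustration. Dates are left out to keep it short.

```python
from collections import Counter


def guess_type(value):
    """Guess the type of one string: 'boolean', 'integer', 'float' or 'string'."""
    text = value.strip()
    if text.lower() in ("true", "false"):  # assumed boolean spellings
        return "boolean"
    # Try the strictest cast first: every valid int string is also a valid float.
    for name, cast in (("integer", int), ("float", float)):
        try:
            cast(text)
            return name
        except ValueError:
            pass
    return "string"


def guess_column_type(values):
    """Pick one type for a column from a sample of its string values."""
    # Empty cells carry no type information, so they are skipped.
    kinds = Counter(guess_type(v) for v in values if v.strip())
    if not kinds:
        return "string"  # nothing but empty cells
    if len(kinds) == 1:
        return next(iter(kinds))
    if set(kinds) == {"integer", "float"}:
        return "float"  # assumption: integers widen to floats
    return "string"  # any other mixture falls back to string


print(guess_column_type(["1", "2.5", "3"]))
print(guess_column_type(["9", "dsa"]))
```

One caveat of leaning on the built-in casts: float() also accepts strings such as "nan" and "inf", so a column containing those words would be reported as float unless they are filtered out first.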