Determine the "type of value" from a string in Python


Question

I'm trying to write a function in Python that determines what type of value a string contains. For example:

If the string is 1, 0, True, or False, the value is BIT.

If the string consists only of digits (0-9*), the value is INT.

If the string has the form 0-9+.0-9+, the value is FLOAT.

If the string is anything else (text, etc.), the value is TEXT.

So far I have something like:

def dataType(string):

 odp=''
 patternBIT=re.compile('[01]')
 patternINT=re.compile('[0-9]+')
 patternFLOAT=re.compile('[0-9]+\.[0-9]+')
 patternTEXT=re.compile('[a-zA-Z0-9]+')
 if patternTEXT.match(string):
     odp= "text"
 if patternFLOAT.match(string):
     odp= "FLOAT"
 if patternINT.match(string):
     odp= "INT"
 if patternBIT.match(string):
     odp= "BIT"

 return odp 

But I'm not very skilled with regexes in Python. Could you please tell me what I'm doing wrong? For example, it fails for 2010-00-10, which should be TEXT but comes out as INT, and for 20.90, which should be FLOAT but also comes out as INT.

Recommended answer

Before you go too far down the regex route, have you considered using ast.literal_eval?

Example:

In [35]: ast.literal_eval('1')
Out[35]: 1

In [36]: type(ast.literal_eval('1'))
Out[36]: int

In [38]: type(ast.literal_eval('1.0'))
Out[38]: float

In [40]: type(ast.literal_eval('[1,2,3]'))
Out[40]: list

May as well use Python to parse it for you!
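The calls above only show inputs that parse cleanly. As a small illustration (the helper name `try_literal` is made up for this sketch and is not part of the original answer), the snippet below also feeds `ast.literal_eval` a few strings that are not simple literals. On a recent Python 3, a bare word such as `abc` should raise `ValueError` and a malformed number such as `35..` should raise `SyntaxError`, which is why a classifier built on `literal_eval` needs to catch both.

```python
import ast

def try_literal(text):
    """Return the type name of the parsed literal, or the name of the exception raised."""
    try:
        return type(ast.literal_eval(text)).__name__
    except (ValueError, SyntaxError) as exc:
        return type(exc).__name__

# A mix of valid literals and strings that are not literals at all.
for sample in ['1', '1.0', 'True', '[1,2,3]', '10,2', 'abc', '35..']:
    print(repr(sample), '->', try_literal(sample))
```

Note that `'10,2'` is a valid literal too (a tuple), so a successful parse does not by itself mean the value is a number.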

OK, here is a bigger example (written for Python 2, hence the use of long and the print statement):

import ast, re
def dataType(str):
    str=str.strip()
    if len(str) == 0: return 'BLANK'
    try:
        t=ast.literal_eval(str)

    except ValueError:
        return 'TEXT'
    except SyntaxError:
        return 'TEXT'

    else:
        if type(t) in [int, long, float, bool]:
            if t in set((True,False)):
                return 'BIT'
            if type(t) is int or type(t) is long:
                return 'INT'
            if type(t) is float:
                return 'FLOAT'
        else:
            return 'TEXT' 



testSet=['   1  ', ' 0 ', 'True', 'False',   #should all be BIT
         '12', '34l', '-3','03',              #should all be INT
         '1.2', '-20.4', '1e66', '35.','-   .2','-.2e6',      #should all be FLOAT
         '10-1', 'def', '10,2', '[1,2]','35.9.6','35..','.']

for t in testSet:
    print "{:10}:{}".format(t,dataType(t))

Output:

   1      :BIT
 0        :BIT
True      :BIT
False     :BIT
12        :INT
34l       :INT
-3        :INT
03        :INT
1.2       :FLOAT
-20.4     :FLOAT
1e66      :FLOAT
35.       :FLOAT
-   .2    :FLOAT
-.2e6     :FLOAT
10-1      :TEXT
def       :TEXT
10,2      :TEXT
[1,2]     :TEXT
35.9.6    :TEXT
35..      :TEXT
.         :TEXT
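The example above depends on Python 2 (`long` and the `print` statement no longer exist in Python 3). What follows is only a rough sketch of how the same idea might be adapted to Python 3; the name `data_type` and the exact rules are assumptions of this sketch, not the answer's own code. Two differences are worth noting. First, `'34l'` and `'03'` are not valid Python 3 literals, so they would come out as TEXT here. Second, this sketch treats `1.0` and `0.0` as FLOAT, whereas the set-membership test in the original classifies them as BIT because `1.0 == True`.

```python
import ast

def data_type(value):
    """Classify a string as BLANK, BIT, INT, FLOAT or TEXT (Python 3 sketch)."""
    value = value.strip()
    if not value:
        return 'BLANK'
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a Python literal at all, so treat it as plain text.
        return 'TEXT'
    # bool is a subclass of int, so it has to be checked first.
    if isinstance(parsed, bool):
        return 'BIT'
    if isinstance(parsed, int):
        return 'BIT' if parsed in (0, 1) else 'INT'
    if isinstance(parsed, float):
        return 'FLOAT'
    # Lists, tuples, strings, None, complex numbers and so on.
    return 'TEXT'

for sample in ['   1  ', 'True', '12', '-3', '1.2', '1e66', '35.', 'def', '[1,2]', '34l', '03']:
    print("{:10}:{}".format(sample, data_type(sample)))
```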

And if you positively MUST have a regex solution that produces the same results, here it is:

def regDataType(str):
    str=str.strip()
    if len(str) == 0: return 'BLANK'

    if re.match(r'True$|^False$|^0$|^1$', str):
        return 'BIT'
    if re.match(r'([-+]\s*)?\d+[lL]?$', str): 
        return 'INT'
    if re.match(r'([-+]\s*)?[1-9][0-9]*\.?[0-9]*([Ee][+-]?[0-9]+)?$', str): 
        return 'FLOAT'
    if re.match(r'([-+]\s*)?[0-9]*\.?[0-9][0-9]*([Ee][+-]?[0-9]+)?$', str): 
        return 'FLOAT'

    return 'TEXT' 
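The trailing `$` in each pattern above is the piece the patterns in the question are missing. `re.match` anchors only at the start of the string, so the question's `[0-9]+` happily matches the leading `2010` of `2010-00-10` and the leading `20` of `20.90`; and because the INT check runs after the FLOAT check without an `elif`, it overwrites the earlier result. The short sketch below (variable names are chosen here for illustration) contrasts the prefix match with a whole-string match using `re.fullmatch`, which is available in Python 3.4 and later.

```python
import re

# The INT and FLOAT patterns exactly as written in the question (no anchors).
int_pattern = re.compile(r'[0-9]+')
float_pattern = re.compile(r'[0-9]+\.[0-9]+')

for sample in ['2010-00-10', '20.90', '42']:
    prefix_int = bool(int_pattern.match(sample))        # matches a leading run of digits
    whole_int = bool(int_pattern.fullmatch(sample))     # requires the entire string to be digits
    whole_float = bool(float_pattern.fullmatch(sample)) # requires digits, a dot, digits
    print(sample, prefix_int, whole_int, whole_float)
```

With whole-string matching, the order of the checks stops mattering for these inputs, since at most one of the numeric patterns can match a given string.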

However, I cannot recommend the regex over the ast version; just let Python decide what these data types are rather than reimplementing that interpretation with regexes.
