How to retrieve Python format code from number represented as string?

Problem description

I have numeric data stored in ASCII txt files, i.e. values of different parameters with a column for each parameter. The format might differ between columns but does not change within a column. I load that data into Python, process it and write it back to ASCII files. The thing is: the format of the numbers should not change, meaning that decimal places should stay the same, exponential notation should stay exponential notation and so on. So what I need is a function that returns a format code for each string that represents a number (which I can then store alongside the numbers during processing). Note: parameter types won't change during processing; i.e. integers will stay integers, floats stay floats etc. (otherwise, keeping the format code wouldn't make much sense).

My idea would be to use regex to analyse the string, to determine if it is an int, float, float in exponential notation etc.:

import re
string = '3.142'
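# a successful full match classifies the string as a plain decimal float
# ('[+-]?' allows one optional sign; '[+|-]*' would also accept '|' and repeated signs)
match = re.fullmatch(r'[+-]?[0-9]+[.][0-9]*', string.strip())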

Following this general classification, I'd parse the string to determine e.g. decimal places. For example

string = '3.142' # I know from above that it is a float and not exp notation...
lst = string.strip().split('.')
if not lst[1]: # trailing zeros are hidden
    result = '{:+g}' if '+' in lst[0] else '{:g}'
else:
    result = '{0:+.' if '+' in lst[0] else '{0:.'
    result += str(len(lst[1])) + 'f}'

print(result) # gives... '{0:.3f}'
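
To illustrate how such a format code would be used afterwards, here is a minimal sketch: the code is stored next to the parsed value and reapplied once processing is done. The doubling step and the variable names are placeholders chosen for illustration, not part of the original workflow.

```python
# Format code as derived above, kept alongside the parsed number.
fmt = '{0:.3f}'
value = float('3.142')

# Placeholder processing step (assumption): double the value.
processed = value * 2

# Reapplying the stored code restores the original layout (3 decimal places).
text = fmt.format(processed)
print(text)  # 6.284
```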

Q: This seems like a rather clumsy approach. Does anybody have a better solution?

Recommended answer

My answer to my own question, after thinking about the issue for some time: it is essentially an impossible inversion, because the string does not carry enough information.

Example: suppose you read the string '-5.5'. You already cannot tell whether the number has one digit of precision or whether trailing zeros are simply hidden. Another (non-numeric) issue is that you don't know whether it is a 'signed' value, i.e. whether it would be written '+5.5' if it were positive. Want more? Take '1.2E+1': this could have been the integer 12. Although unlikely, you can't be sure.
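
The ambiguity is easy to reproduce. The short sketch below (the list of candidate codes is my own selection, not exhaustive) shows several different format codes rendering -5.5 identically, and an integer and a float becoming indistinguishable in exponential notation.

```python
value = -5.5

# Five different format codes, one identical output string.
codes = ['{:.1f}', '{:g}', '{:+.1f}', '{:+g}', '{}']
rendered = {code.format(value) for code in codes}
print(rendered)  # {'-5.5'}

# The integer 12 and the float 12.0 look the same in exponential notation.
as_int = '{:.1E}'.format(12)
as_float = '{:.1E}'.format(12.0)
print(as_int, as_float)  # 1.2E+01 1.2E+01
```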

Besides that, there are some minor limitations on the Python side. As far as I know, '{:E}'.format() always generates a signed, zero-padded exponent of at least two digits, i.e. '...E+01', although you might want '...E+1'. Another aspect of number formatting is hidden leading and trailing zeros, which I asked about in a separate question. Removing leading/trailing zeros does not seem to be covered by the normal string formatting options; you need additional helpers like .lstrip("0").
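
As a sketch of what such helpers could look like: `short_exponent` below is a hypothetical function (not part of the standard library) that removes the zero padding from an exponent after formatting, and the last lines show `.lstrip("0")` hiding a leading zero. Note that the `.lstrip` trick only works for unsigned values.

```python
import re

def short_exponent(text):
    """Drop zero padding from an exponent, e.g. 'E+01' -> 'E+1'.

    Hypothetical helper: strips zeros directly after the exponent sign,
    but always leaves at least one exponent digit in place.
    """
    return re.sub(r'([eE][+-]?)0+(?=\d)', r'\1', text)

padded = '{:.1E}'.format(12.0)
print(padded)                  # 1.2E+01
print(short_exponent(padded))  # 1.2E+1

# Hiding a leading zero of an unsigned fixed-point number.
no_leading_zero = '{:.3f}'.format(0.5).lstrip('0')
print(no_leading_zero)         # .500
```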

What I came up with does at least a decent job of returning format codes to go from string to number and back to string. It uses a little bit of regex for a general classification and then simple .split() calls etc.

import re
class NumStr():
    def analyse_format(self, s, dec_sep='.'):
        """
        INPUT: 
            s, string, representing a number
        INPUT, optional: 
            dec_sep, string, decimal separator
        WHAT IT DOES:
            1) analyse the string to achieve a general classification
                (decimal, no decimal, exp notation)
            2) pass the string and the general class to an appropriate
                parsing function.
        RETURNS: 
            the result of the parsing function:
                tuple with
                    format code to be used in '{}.format()'
                    suited Python type for the number, int or float.
        """
        # 1. format definitions. key = general classification.
        redct = {'dec': '[+-]?[0-9]+['+dec_sep+'][0-9]*|[+-]?[0-9]*['+dec_sep+'][0-9]+',
                 'no_dec': '[+-]?[0-9]+',
                 'exp_dec': '[+-]?[0-9]+['+dec_sep+'][0-9]*[eE][+-]?[0-9]+',
                 'exp_no_dec': '[+-]?[0-9]+[eE][+-]?[0-9]+'}
        # 2. analyse the format to find the general classification.
        gen_class, s = [], s.strip()
        for k, v in redct.items():
            test = re.fullmatch(v, s)
            if test:
                gen_class.append(k)
        if not gen_class:
            raise ValueError(f"unknown format --> {s!r}")
        elif len(gen_class) > 1:
            raise ValueError(f"ambiguous result --> {s!r} {gen_class}")
        # 3. based on the general classification, call string parsing function
        method_name = 'parse_' + str(gen_class[0])
        method = getattr(self, method_name, lambda *args: "Undefined Format!")
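        # pass the separator as one argument; '*dec_sep' would split a
        # multi-character separator into several positional arguments
        return method(s, dec_sep)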

    def parse_dec(self, s, dec_sep):
        lst = s.split(dec_sep)
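        # '#.0f' keeps the bare decimal point, so '45.' is written back as '45.'
        # (a plain '{:f}' would produce six decimal places, e.g. '45.000000')
        result = '{:#.0f}' if len(lst[1]) == 0 else '{:.'+str(len(lst[1]))+'f}'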
        result = result.replace(':', ':+') if '+' in lst[0] else result
        return (result, float)

    def parse_no_dec(self, s, *dec_sep):
        result = '{:+d}' if '+' in s else '{:d}'
        return (result, int)

    def parse_exp_dec(self, s, dec_sep):
        lst_dec = s.split(dec_sep)
        lst_E = lst_dec[1].upper().split('E')
        result = '{:.'+str(len(lst_E[0]))+'E}'
        result = result.replace(':', ':+') if '+' in lst_dec[0] else result
        return (result, float)

    def parse_exp_no_dec(self, s, *dec_sep):
        lst_E = s.upper().split('E')
        result = '{:+E}' if '+' in lst_E[0] else '{:E}'
        return (result, float)

And for testing:

valid = ['45', '45.', '3E5', '4E+5', '3E-3', '2.345E+7', '-7',
         '-45.3', '-3.4E3', ' 12 ', '8.8E1', '+5.3', '+4.',
         '+10', '+2.3E121', '+4e-3','-204E-9668','.7','+.7']
invalid = ['tesT', 'Test45', '7,7E2', '204-100', '.']
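
A minimal way to exercise these lists is sketched below. It checks only the classification step: `PATTERNS` and `classify` are stand-ins modelled on the regular expressions in the class above, with '.' hard-coded as the separator and the exponent sign restricted to at most one character (my assumption). Every valid string should fall into exactly one class and every invalid string into none.

```python
import re

# Stand-in patterns modelled on the answer's regexes ('.' as separator).
PATTERNS = {
    'dec': r'[+-]?[0-9]+[.][0-9]*|[+-]?[0-9]*[.][0-9]+',
    'no_dec': r'[+-]?[0-9]+',
    'exp_dec': r'[+-]?[0-9]+[.][0-9]*[eE][+-]?[0-9]+',
    'exp_no_dec': r'[+-]?[0-9]+[eE][+-]?[0-9]+',
}

def classify(s):
    """Return the names of all patterns that fully match the stripped string."""
    s = s.strip()
    return [name for name, pat in PATTERNS.items() if re.fullmatch(pat, s)]

valid = ['45', '45.', '3E5', '4E+5', '3E-3', '2.345E+7', '-7',
         '-45.3', '-3.4E3', ' 12 ', '8.8E1', '+5.3', '+4.',
         '+10', '+2.3E121', '+4e-3', '-204E-9668', '.7', '+.7']
invalid = ['tesT', 'Test45', '7,7E2', '204-100', '.']

for s in valid:
    assert len(classify(s)) == 1, s   # exactly one class per valid string
for s in invalid:
    assert classify(s) == [], s       # no class for invalid strings

for s in valid:
    print(repr(s), '->', classify(s)[0])
```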

If you have any ideas for improvement, I'm happy to include them! I guess other people have already come across this issue.
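
One possible simplification, offered as a sketch rather than a replacement for the class above: for plain decimal strings, the standard library's `decimal` module already records how many digits follow the decimal point, so the `.split()` step could be avoided. The function name `decimal_places` is hypothetical. This does not help with exponential notation, because `Decimal` folds the exponent into the same field.

```python
from decimal import Decimal

def decimal_places(s):
    """Count digits after the decimal point in a plain decimal string.

    Hypothetical helper; only meaningful for strings without an exponent.
    """
    exponent = Decimal(s.strip()).as_tuple().exponent
    return max(0, -exponent)

for s in ['3.142', '45.', '-5.50', '+.7', ' 12 ']:
    places = decimal_places(s)
    print(repr(s), '->', '{:.' + str(places) + 'f}')
```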
