Python: extracting variables from string templates
Problem description
I am familiar with inserting variables into a string using string.Template, like this:
from string import Template
Template('value is between $min and $max').substitute(min=5, max=10)
What I now want to know is if it is possible to do the reverse. I want to take a string, and extract the values from it using a template, so that I have some data structure (preferably just named variables, but a dict is fine) that contains the extracted values. For example:
>>> string = 'value is between 5 and 10'
>>> d = Backwards_template('value is between $min and $max').extract(string)
>>> print(d)
{'min': '5', 'max': '10'}
Is this possible?
This is what regular expressions are for:
import re
string = 'value is between 5 and 10'
m = re.match(r'value is between (.*) and (.*)', string)
print(m.group(1), m.group(2))
Output:
5 10
Update 1. Names can be given to groups:
m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
print(m.group('min'), m.group('max'))
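Named groups also make it possible to approximate the Backwards_template idea from the question. The following is only a minimal sketch, not an existing library feature: the function name extract_with_template is hypothetical, and it assumes the default `$name` / `${name}` placeholder syntax of string.Template. It reuses Template.pattern (the compiled regex that string.Template itself uses to find placeholders) to turn the template into a regex with one named group per placeholder.

```python
import re
from string import Template


def extract_with_template(template, text):
    """Hypothetical helper: return a dict of placeholder values,
    or None if text does not fit the template."""
    parts = []
    pos = 0
    for m in Template.pattern.finditer(template):
        # Literal text between placeholders must match exactly.
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group('named') or m.group('braced')
        if name:
            # One lazy named group per $name / ${name} placeholder.
            parts.append('(?P<%s>.+?)' % name)
        else:
            # '$$' escape (or a stray '$'): match a literal dollar sign.
            parts.append(re.escape('$'))
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    match = re.fullmatch(''.join(parts), text)
    return match.groupdict() if match else None


d = extract_with_template('value is between $min and $max',
                          'value is between 5 and 10')
print(d)  # {'min': '5', 'max': '10'}
```

The groups are lazy (`.+?`) and the whole string must match, which is one design choice among several. A template that uses the same placeholder twice would make re raise an error for the duplicate group name, so this sketch only covers templates where each name appears once.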
Named groups are not used that often, though, because a more important question usually causes enough trouble: how to capture exactly what you want. In this particular case it's not a big deal, but even here: what if the string is 'value is between 1 and 2 and 3'? Should the string be accepted, and if so, what are min and max?
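To make that ambiguity concrete, here is a small comparison of three ways to write the pattern. The variable names greedy, lazy and strict are just labels for this illustration; which behavior is "right" depends on what the input is supposed to look like.

```python
import re

text = 'value is between 1 and 2 and 3'

# Greedy: the first group grabs as much as it can.
greedy = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', text)
# Lazy first group: stops at the first ' and '.
lazy = re.match(r'value is between (?P<min>.*?) and (?P<max>.*)', text)
# Strict: digits only, and the whole string must match.
strict = re.fullmatch(r'value is between (?P<min>\d+) and (?P<max>\d+)', text)

print(greedy.groupdict())  # {'min': '1 and 2', 'max': '3'}
print(lazy.groupdict())    # {'min': '1', 'max': '2 and 3'}
print(strict)              # None
```

The original pattern silently accepts the string and splits it at the last ' and ', while the strict version rejects it outright.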
Update 2. Rather than making a precise regex, it's sometimes easier to combine regular expressions and "regular" code like this:
m = re.match(r'value is between (?P<min>.*) and (?P<max>.*)', string)
try:
    value_min = float(m.group('min'))
    value_max = float(m.group('max'))
except (AttributeError, ValueError):  # no match or failed conversion
    value_min = None
    value_max = None
This combined approach is especially worth remembering when your text consists of many chunks to be processed (such as phrases in different kinds of quotes). In tricky cases, a single regex that handles both the delimiters and the contents of the chunks is harder to write than a sequence of steps: text.split(), optional merging of chunks, and independent processing of each chunk (using regexes and other means).
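As a rough illustration of that step-by-step style, the sketch below splits a text into chunks first and then handles each chunk independently with a simple regex plus ordinary conversion code. The function name parse_ranges, the ';' delimiter and the sample text are all assumptions made up for this example.

```python
import re


def parse_ranges(text):
    """Hypothetical helper: split text on ';' and parse each chunk on its own."""
    results = []
    for chunk in text.split(';'):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty chunks, e.g. after a trailing ';'
        m = re.fullmatch(
            r'(?P<name>\w+) is between (?P<min>.*) and (?P<max>.*)', chunk)
        try:
            results.append((m.group('name'),
                            float(m.group('min')),
                            float(m.group('max'))))
        except (AttributeError, ValueError):  # no match or failed conversion
            results.append(None)
    return results


print(parse_ranges('width is between 5 and 10; height is between 2.5 and x; nonsense'))
# [('width', 5.0, 10.0), None, None]
```

Each chunk fails or succeeds on its own, so one malformed chunk does not prevent the rest of the text from being parsed.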