Parse measurements (multiple dimensions) from a given string in Python 3

Views: 43 Posted: 2021/6/14 19:35:05 Tags: regex python-3.x parsing units-of-measurement ner


Problem description


I'm aware of this post and this library, but they didn't help me with the specific cases below. How can I parse measurements like these?



I have strings like these:
"Square 10 x 3 x 5 mm"
"Round 23/22; 24,9 x 12,2 x 12,3"
"Square 10x2"
"Straight 10x2mm"
I'm looking for a Python package or some other way to get results like this:
>>> a = amazing_parser.parse("Square 10 x 3 x 5 mm")
>>> print(a)
10 x 3 x 5 mm
Likewise:
>>> a = amazing_parser.parse("Round 23/22; 24,9x12,2")
>>> print(a)
24,9 x 12,2
I also tried named entity recognition with the "ner_ontonotes_bert_mult" model, but the results looked like this:
>>> from deeppavlov import configs, build_model
>>> ner_model = build_model(configs.ner.ner_ontonotes_bert_mult, download=True)
>>> print(ner_model(["Round 23/22; 24,9 x 12,2 x 12,3"]))
<class 'list'>: [[['Round', '23', '/', '22', ';', '24', ',', '9', 'x', '12', ',', '2', 'x', '12', ',', '3']], [['O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL', 'O', 'B-CARDINAL']]]
I have no idea how to extract those measurements from this list properly.

I also found this regex:
>>> re.findall(r"(\d+(?:,\d+)?) x (\d+(?:,\d+)?)(?: x (\d+(?:,\d+)?))?", "Straight 10 x 2 mm")
<class 'list'>: [('10', '2', '')]

But it leaves an empty value in the resulting list if the input contains only 2 dimensions, and it doesn't work if there is no whitespace between the numbers and the "x". I'm not good with regex...
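Both symptoms can be reproduced in a few lines. The sketch below only reruns the regex quoted above; the variable names are chosen here for illustration. The empty string appears because re.findall returns one tuple slot per capture group, and an optional group that does not take part in the match is reported as ''.

```python
import re

# The regex from the question, written as a raw string.
found = r"(\d+(?:,\d+)?) x (\d+(?:,\d+)?)(?: x (\d+(?:,\d+)?))?"

# Two dimensions: the optional third group does not take part in the match,
# so findall fills its slot with an empty string.
two_dims = re.findall(found, "Straight 10 x 2 mm")
print(two_dims)  # [('10', '2', '')]

# No spaces around "x": the literal " x " in the pattern cannot match.
no_spaces = re.findall(found, "Square 10x2")
print(no_spaces)  # []
```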
Solution
For the given examples, you might use:
(?<!\S)\d+(?:,\d+)? ?x ?\d+(?:,\d+)?(?: ?x ?\d+(?:,\d+)?)*
In parts:

(?<!\S) Negative lookbehind: assert that what is directly to the left is not a non-whitespace character
\d+(?:,\d+)? Match 1+ digits, optionally followed by a , and 1+ digits
 ?x ? Match x between optional spaces
\d+(?:,\d+)? Match 1+ digits, optionally followed by a , and 1+ digits
(?: Non-capturing group
   ?x ?\d+ Match x between optional spaces, then 1+ digits
  (?:,\d+)? Optionally match a , and 1+ digits
)* Close the non-capturing group and repeat it 0+ times
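The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE. This is a sketch of one way to write it, not part of the original answer; the name DIMENSIONS is made up here. Because verbose mode ignores unescaped whitespace, the optional space is written as [ ]? instead of a bare space followed by ?.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE so each
# part can carry its own comment. DIMENSIONS is a name chosen for this sketch.
DIMENSIONS = re.compile(
    r"""
    (?<!\S)            # not preceded by a non-whitespace character
    \d+(?:,\d+)?       # first number, with an optional decimal comma
    [ ]?x[ ]?          # x between optional single spaces
    \d+(?:,\d+)?       # second number
    (?:                # any further dimensions
        [ ]?x[ ]?
        \d+(?:,\d+)?
    )*
    """,
    re.VERBOSE,
)

print(DIMENSIONS.findall("Round 23/22; 24,9 x 12,2 x 12,3"))
# ['24,9 x 12,2 x 12,3']
```

Since the pattern contains no capturing groups, findall returns each whole match as a single string, which avoids the empty tuple slot from the regex in the question.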



For example
import re

regex = r"(?<!\S)\d+(?:,\d+)? ?x ?\d+(?:,\d+)?(?: ?x ?\d+(?:,\d+)?)*"
test_str = ("Square 10 x 3 x 5 mm\n"
    "Round 23/22; 24,9 x 12,2 x 12,3\n"
    "Square 10x2\n"
    "Straight 10x2mm\n"
    "Round 23/22; 24,9x12,2")
result = re.findall(regex, test_str)
print(result)
Output
['10 x 3 x 5', '24,9 x 12,2 x 12,3', '10x2', '10x2', '24,9x12,2']
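The question asks for normalized output such as "10 x 3 x 5 mm" and "24,9 x 12,2", while the result above keeps the spacing of the input and drops the unit. A minimal sketch of a wrapper that does both follows. The function name parse, the optional unit group, and the unit list (mm, cm, m) are assumptions made for this example and are not part of the answer's regex.

```python
import re

NUMBER = r"\d+(?:,\d+)?"

# Assumption: the unit, if present, follows the last number and is one of
# mm, cm or m. Extend the alternation for other data.
PATTERN = re.compile(
    rf"(?<!\S)({NUMBER}(?: ?x ?{NUMBER})+)(?: ?(mm|cm|m)\b)?"
)

def parse(text):
    """Return normalized dimension strings such as '10 x 3 x 5 mm'."""
    results = []
    for match in PATTERN.finditer(text):
        # Pull the individual numbers out of the matched dimension run
        # and rejoin them with a uniform " x " separator.
        numbers = re.findall(NUMBER, match.group(1))
        normalized = " x ".join(numbers)
        unit = match.group(2)  # None when no unit was found
        results.append(f"{normalized} {unit}" if unit else normalized)
    return results

print(parse("Square 10 x 3 x 5 mm"))    # ['10 x 3 x 5 mm']
print(parse("Round 23/22; 24,9x12,2"))  # ['24,9 x 12,2']
print(parse("Straight 10x2mm"))         # ['10 x 2 mm']
```

The function returns a list because a single string may contain more than one measurement; take the first element if only one is expected.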


                        