Regular expression for extracting a numeric dimension


Question

I'm using Python regular expressions to extract dimensional information from a database. The entries in that column look like this:

23 cm
43 1/2 cm

20cm
15 cm x 30 cm

All I need from this is the width of the entry (so for the entries with an 'x', only the first number), but as you can see, the values are formatted inconsistently.

From what I understood in the documentation, you can access the groups in a match by their position, so I was thinking I could determine the type of the entry based on how many groups are returned and what is found at each index.

The expression I have used so far is ^(\d{2})\s?(x\s?(\d{2}))?(\d+/\d+)?$, but it is not perfect and it returns a number of useless groups. Is there a more efficient and appropriate approach?
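To make the "useless groups" complaint concrete, here is a minimal sketch that runs the pattern from the question against a few inputs. It assumes the unit has already been stripped from the entries, since the pattern itself contains no 'cm'; that assumption is not stated in the question.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'^(\d{2})\s?(x\s?(\d{2}))?(\d+/\d+)?$')

# Assumption: the unit was stripped beforehand (hypothetical inputs).
for entry in ['23', '43 1/2', '15 x 30']:
    # groups() returns every capture group, with None for groups
    # that did not take part in the match.
    print(entry, '->', pattern.match(entry).groups())
# 23 -> ('23', None, None, None)
# 43 1/2 -> ('43', None, None, '1/2')
# 15 x 30 -> ('15', 'x 30', '30', None)

# With the unit still attached, the trailing $ cannot match.
print(pattern.match('23 cm'))  # None
```

Every match returns four groups regardless of the entry type, so the number of groups cannot be used to tell the entry types apart; only which positions are None differs.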

Edit: I need the number from every line. When there is only one number, it is implied that only the width was measured (including any fractional component, as in line 2). When there are two numbers, the height was also measured, but I only need the width, which is the first number (as in the last line).

Recommended answer

Try the regex below. It captures the first run of digits, plus an optional fraction that follows it, before the first 'cm'.

import re

# Loose version: capture from the first digits up to the first 'cm'.
# This works for all of your example data.
regex = re.compile(r'(\d+.*?)\s?cm')

# Stricter alternative: whatever follows the first digit group
# must be a fraction such as 1/2.
regex = re.compile(r'(\d+(?:\s+\d+/\d+)?)\s?cm')

>>> regex.match('23 cm').group(1)
'23'
>>> regex.match('43 1/2 cm').group(1)
'43 1/2'
>>> regex.match('20cm').group(1)
'20'
>>> regex.match('15 cm x 30 cm').group(1)
'15'
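If the width is needed as a number instead of a string, the captured pieces can be converted with the standard library's fractions module. The sketch below is one possible way to do that, not part of the original answer; the helper name parse_width and the variant of the pattern with separate groups for the whole part, numerator, and denominator are assumptions made for illustration.

```python
import re
from fractions import Fraction

# Hypothetical variant of the stricter pattern: whole number, then an
# optional "numerator/denominator" fraction, then the unit.
WIDTH_RE = re.compile(r'(\d+)(?:\s+(\d+)/(\d+))?\s*cm')

def parse_width(entry):
    """Return the width of an entry as a Fraction, or None if it does not parse."""
    m = WIDTH_RE.match(entry)
    if m is None:
        return None  # e.g. an empty or malformed entry
    whole, num, den = m.groups()
    width = Fraction(int(whole))
    if num is not None:
        if int(den) == 0:
            return None  # "1/0" is not a usable fraction
        width += Fraction(int(num), int(den))
    return width

for entry in ['23 cm', '43 1/2 cm', '20cm', '15 cm x 30 cm']:
    print(repr(entry), '->', parse_width(entry))
# '23 cm' -> 23
# '43 1/2 cm' -> 87/2
# '20cm' -> 20
# '15 cm x 30 cm' -> 15
```

Returning None for entries that do not match avoids the AttributeError that regex.match(...).group(1) raises when there is no match, which matters if the column contains empty rows. Call float() on the result if a decimal value is preferred.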

regex101 demo
