如何检查正则表达式是否完全匹配字符串,即字符串不包含任何额外的字符? [英] How to check that a regular expression has matched a string completely, i.e. - the string did not contain any extra character?

查看:126
本文介绍了如何检查正则表达式是否完全匹配字符串,即字符串不包含任何额外的字符?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个问题:



1)我有一个正则表达式([AZ] [az] {0,2})( \d *),我使用Python的 re.finditer()来匹配适当的字符串。我的问题是,我想只匹配不包含额外字符的字符串,否则我想引发异常。



我想抓住以下模式:
- 大写字母,后跟0,1或2个小写字母,后跟0个或更多个数字。 p>

模式表示化学公式,即原子后跟数字的发生。我想将原子放入一个字典,它的数目是发生的,所以我需要分离原子(大写字母后跟0,1或2个小写字母)和数字,但请记住,它们属于一起。



示例:

  C6H5Fe2I#此字符串应该成功匹配。结果:C6 H5 Fe2 I 
H2TeO4#这个字符串应该匹配成功结果:H2 Te O4
H3PoooO5#异常应该被提高
C2tH6#异常应该被提高

2)第二个问题是如果输入字符串错误,我应该提供什么样的异常。



谢谢Tomas

解决方案

这里有几种不同的方法可以使用:



比较长度




  • 查找原始字符串的长度。 / li>
  • 将匹配的字符串的长度相加。

  • 如果两个数字不同,则有未使用的字符。



请注意,您也可以合并这个方法用你现有的代码,而不是做一个额外的步骤,如果你想避免解析字符串两次。



整个字符串的正则表达式



您可以检查此正则表达式是否与整个字符串相匹配:

  ^([AZ] [az] {0,2} \d *)* $ 

(Rubular)



令牌化



您可以使用以下正则表达式对原始字符串进行标记:

  [AZ] [^ AZ] * 

然后检查每个令牌,看看是否它符合您原来的正则表达式。


I have two questions:

1) I have a regular expression ([A-Z][a-z]{0,2})(\d*) and I am using Python's re.finditer() to match appropriate strings. My problem is, that I want to match only strings that contain no extra characters, otherwise I want to raise an exception.

I want to catch a following pattern: - capital letter, followed by 0, 1 or 2 small letters, followed by 0 or more numbers.

The pattern represents a chemical formula, i.e. atom followed by number of it's occurences. I want to put the atom into a dictionary with it's number of occurences, so I need to separate atoms (capital letter followed by 0, 1 or 2 small letters) and numbers, but remember that they belong together.

Example:

C6H5Fe2I   # this string should be matched successfully. Result: C6 H5 Fe2 I
H2TeO4     # this string should be matched successfully Result: H2 Te O4
H3PoooO5   # exception should be raised
C2tH6      # exception should be raised

2) second question is what kind of Exception should I raise in case the input string is wrong.

Thank you, Tomas

解决方案

Here's a few different approaches you could use:

Compare lengths

  • Find the length of the original string.
  • Sum the length of the matched strings.
  • If the two numbers differ there were unused characters.

Note that you can also combine this method with your existing code rather than doing it as an extra step if you want to avoid parsing the string twice.

Regular expression for entire string

You can check if this regular expression matches the entire string:

^([A-Z][a-z]{0,2}\d*)*$

(Rubular)

Tokenize

You can use the following regular expression to tokenize the original string:

[A-Z][^A-Z]*

Then check each token to see if it matches your original regular expression.

这篇关于如何检查正则表达式是否完全匹配字符串,即字符串不包含任何额外的字符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆