How to extract text between a phrase and a semicolon using regex

Question

I have multiple text rows in a text file from which I need to extract particular values. I have just started learning RegEx and am trying my hand at it for this situation. The values to extract are numbers, which can be either integers or decimals with a varying number of decimal places.

Two examples of the text rows are shown below.

settings parameterName1 = 15.0;
settings parameterName2 = 75.0; # Increase 25% from 50.0;

The RegEx string below works for the first text row but not for the second.

(?<=\bsettings.*\=\s).*(?=\;)\b

The results that I get from the RegEx string are shown below. The second row does not output only the numeric value I was looking for: I expected 15.0 for the first row and 75.0 for the second, without the # comment text.

15.0;
75.0; # Increase 25% from 50.0;

Any help would be much appreciated.

Recommended answer

"The results that I get from the RegEx string are shown below"

This is because .* is greedy. When it has the option to stop matching or to continue, it matches as many characters as possible, so it runs to the last semicolon on the line rather than the first.
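The greedy behaviour can be reproduced with a short Python sketch. Two assumptions to note: the question does not say which regex engine is in use, and Python's built-in re module rejects the variable-width lookbehind (?<=\bsettings.*\=\s), so the sketch swaps it for an ordinary capture group. The variable names are illustrative only.

```python
import re

rows = [
    "settings parameterName1 = 15.0;",
    "settings parameterName2 = 75.0; # Increase 25% from 50.0;",
]

# Assumption: a capture group replaces the question's lookbehind, because
# Python's re module only accepts fixed-width lookbehind patterns.
# "=" and ";" are not regex metacharacters, so they are left unescaped.
greedy = re.compile(r"\bsettings.*=\s(.*)(?=;)")

# Group 1 holds whatever the greedy .* captured before the final ";".
greedy_values = [greedy.search(row).group(1) for row in rows]
for value in greedy_values:
    print(value)
```

For the second row, the captured text runs all the way to the last semicolon and so includes the comment, which is the problem described in the question.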

An easy fix is to make .* reluctant (lazy) by appending the quantifier modifier ? to it, i.e.

(?<=\bsettings.*\=\s).*?(?=\;)\b

A better fix would be to replace . with [^;]. The negated character class cannot match past the first semicolon, so the engine also avoids backtracking:

(?<=\bsettings.*\=\s)[^;]*(?=\;)\b
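The two fixes can be compared side by side in a small Python sketch. As an assumption for illustration, a capture group stands in for the lookbehind, since Python's re module does not accept variable-width lookbehind; in an engine that does support it (the question does not name one), the patterns above can be used as written.

```python
import re

rows = [
    "settings parameterName1 = 15.0;",
    "settings parameterName2 = 75.0; # Increase 25% from 50.0;",
]

# Assumption: capture groups replace the original lookbehind so that the
# patterns compile under Python's re module.
lazy = re.compile(r"\bsettings.*=\s(.*?)(?=;)")        # reluctant .*?
negated = re.compile(r"\bsettings.*=\s([^;]*)(?=;)")   # negated class [^;]*

lazy_values = [lazy.search(row).group(1) for row in rows]
negated_values = [negated.search(row).group(1) for row in rows]

for row, a, b in zip(rows, lazy_values, negated_values):
    print(f"{row!r}: lazy={a!r}, negated={b!r}")
```

Both variants stop at the first semicolon, so each row yields only its numeric value.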
