Using regex to get text between multiple HTML tags
Question
Using regex, I want to get the text between multiple DIV tags. For instance, the following:
<div>first html tag</div>
<div>another tag</div>
should output:
first html tag
another tag
The regex pattern I am using only matches my last div tag and misses the first one. Code:
static void Main(string[] args)
{
    string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
    string pattern = "(<div.*>)(.*)(<\\/div>)";
    MatchCollection matches = Regex.Matches(input, pattern);
    Console.WriteLine("Matches found: {0}", matches.Count);
    if (matches.Count > 0)
        foreach (Match m in matches)
            Console.WriteLine("Inner DIV: {0}", m.Groups[2]);
    Console.ReadLine();
}
Output:
Matches found: 1
Inner DIV: This is ANOTHER test
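Answer
Replace your pattern with a non-greedy match. In the original pattern, the greedy .* inside <div.*> runs on to the last > that still lets the rest of the pattern match, so the first group swallows the whole first div together with the second opening tag. The lazy .*? stops as early as possible instead: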
static void Main(string[] args)
{
    string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
    // .*? is lazy: it matches as little as possible, so each match ends at the nearest </div>
    string pattern = "<div.*?>(.*?)<\\/div>";
    MatchCollection matches = Regex.Matches(input, pattern);
    Console.WriteLine("Matches found: {0}", matches.Count);
    if (matches.Count > 0)
        foreach (Match m in matches)
            Console.WriteLine("Inner DIV: {0}", m.Groups[1]);
    Console.ReadLine();
}
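The same idea can be wrapped in a small reusable helper. The following is only a sketch, not part of the original answer: the class name DivText and the method name Extract are hypothetical, and the pattern is a slightly tightened variant of the non-greedy one (\b so that a longer tag name such as <divider> is not picked up, [^>]* so the attribute part cannot run past the opening tag). It assumes the div elements are not nested.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DivText
{
    // Hypothetical helper: returns the inner text of every non-nested <div> in the input.
    public static List<string> Extract(string html)
    {
        // <div\b   : the tag name "div" followed by a word boundary
        // [^>]*    : any attributes, never crossing the closing '>' of the opening tag
        // (.*?)    : lazy capture, ends at the nearest following </div>
        const string pattern = @"<div\b[^>]*>(.*?)</div>";

        // Singleline lets '.' match newlines, so inner text spanning several lines is captured;
        // IgnoreCase also accepts <DIV>...</DIV>.
        RegexOptions options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, options))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Calling DivText.Extract(input) with the input string from the question should return two entries, one per div, which can then be printed in a foreach loop as in the answer above.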