Regular Expression to Parse HTML

Published: 2019/6/4 23:58:45, tag: visual-basic-net

Problem Description

Does anyone have a regex pattern to parse HTML from a stream?

I have a well structured file, where each line is of the form

<sometag someattribute='attr'>text</sometag>

for example

<SPAN CLASS='myclass'>A bit of text</SPAN>, or
Just some text, without tags

What I would like to be able to do is parse each line so that I get an array
like this

SPAN
CLASS
myclass
A bit of text

or

Just some text, without tags

The array bit should follow, but I don't profess to be a regex expert (or
any kind of expert for that matter). Can anyone help with a suitable
pattern?

TIA

Charles
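A minimal Python sketch of the kind of pattern Charles asks for, assuming each tagged line holds one element with exactly one attribute, quoted with ' or ", and closed on the same line. The sample line <SPAN CLASS='myclass'>A bit of text</SPAN> is inferred from the desired array, and LINE_RE, parse_line and parse_stream are hypothetical names. A line with no tag comes back as a one-element list.

```python
import io
import re

# Hypothetical pattern for one line of the form <tag attr='value'>text</tag>.
# Assumes one element per line, one attribute quoted with ' or ", and a
# closing tag that repeats the opening tag name (compared case-insensitively).
LINE_RE = re.compile(
    r"""^\s*<(?P<tag>\w+)\s+(?P<attr>\w+)\s*=\s*(?P<q>['"])(?P<value>.*?)(?P=q)\s*>"""
    r"""(?P<text>.*?)</(?P=tag)\s*>\s*$""",
    re.IGNORECASE,
)


def parse_line(line):
    """Return [tag, attribute, value, text] for a tagged line, else [text]."""
    m = LINE_RE.match(line)
    if m is None:
        # No recognisable element: treat the whole line as plain text.
        return [line.strip()]
    return [m.group("tag"), m.group("attr"), m.group("value"), m.group("text")]


def parse_stream(stream):
    """Parse every non-blank line of a text stream (open file, StringIO, ...)."""
    return [parse_line(line) for line in stream if line.strip()]


sample = io.StringIO(
    "<SPAN CLASS='myclass'>A bit of text</SPAN>\n"
    "Just some text, without tags\n"
)
for fields in parse_stream(sample):
    print(fields)
# ['SPAN', 'CLASS', 'myclass', 'A bit of text']
# ['Just some text, without tags']
```

Porting this to VB.NET's System.Text.RegularExpressions.Regex should only need the group syntax rewritten in the .NET form: (?P<name>...) becomes (?<name>...) and (?P=name) becomes \k<name>. Nested elements or several attributes on one tag fall outside what this pattern handles.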

Solution

Is this useful for you?

http://regexplib.com/REDetails.aspx?regexp_id=520

Galin Iliev
MCSD, MCAD.NET

Maybe it's easier to use the HTML Agility Pack:

.NET Html Agility Pack: How to use malformed HTML just like it was
well-formed XML...
<URL:http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>

Download:

<URL:http://www.codefluent.com/smourier/download/htmlagilitypack.zip>

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://classicvb.org/petition/>
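Herfried's suggestion is to let a tolerant HTML parser do the tokenizing instead of a regex. The HTML Agility Pack is a .NET library, so this Python sketch swaps in the standard library's html.parser module as a stand-in to build the same kind of array; it is not the Agility Pack API. LineParser and parse_line_html are hypothetical names, and html.parser reports tag and attribute names in lower case.

```python
from html.parser import HTMLParser


class LineParser(HTMLParser):
    """Collect tag, attribute/value pairs and text from one line of markup."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lower-cased;
        # attrs is a list of (name, value) tuples in source order.
        self.fields.append(tag)
        for name, value in attrs:
            self.fields.extend([name, value])

    def handle_data(self, data):
        # Character references such as &amp; are already decoded here.
        if data.strip():
            self.fields.append(data.strip())


def parse_line_html(line):
    """Hypothetical helper: feed one line and return the collected fields."""
    parser = LineParser()
    parser.feed(line)
    parser.close()
    return parser.fields


print(parse_line_html("<SPAN CLASS='myclass'>A bit of text</SPAN>"))
# ['span', 'class', 'myclass', 'A bit of text']
print(parse_line_html("Just some text, without tags"))
# ['Just some text, without tags']
```

Unlike a single regex, this copes with several attributes, either quote style and unquoted values; end tags are simply ignored. It does not check that the markup is well formed.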

Hi Galin

Thanks for the link. It looks like it ought to work, but when I test it
against even a simple tag it returns no matches. I tried verifying the
expression with Expresso and it gives the following error.

Reference to undefined group number 5.

Even when I test it using the facility on the web site it fails. Any idea
how to correct it?

Charles

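The regexplib.com pattern itself is not quoted in the thread, so its actual defect cannot be confirmed here. In .NET, "Reference to undefined group number 5" is typically reported when a pattern contains a numbered backreference such as \5 while defining fewer than five capturing groups, so the pattern is rejected at construction time rather than at match time. A minimal Python sketch of the same class of error (Python words it as "invalid group reference 5"); check_pattern is a hypothetical helper:

```python
import re


def check_pattern(pattern):
    """Return None if the pattern compiles, else the compiler's error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# One capturing group, but a backreference to group 5: rejected at compile time.
print(check_pattern(r"<(\w+)>.*?</\5>"))
# Referring to the group that actually exists compiles fine.
print(check_pattern(r"<(\w+)>.*?</\1>"))
# None
```

In VB.NET the corresponding check would wrap New Regex(pattern) in a Try/Catch for ArgumentException, the exception type .NET raises for an invalid pattern.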