Regular Expression to Parse HTML


Problem description


Does anyone have a regex pattern to parse HTML from a stream?

I have a well structured file, where each line is of the form

<sometag someattribute='attr'>text</sometag>

for example

<SPAN CLASS='myclass'>A bit of text</SPAN>, or
Just some text, without tags

What I would like to be able to do is parse each line so that I get an array
like this

SPAN
CLASS
myclass
A bit of text

or

Just some text, without tags

The array bit should follow, but I don't profess to be a regex expert (or
any kind of expert for that matter). Can anyone help with a suitable
pattern?

TIA

Charles

Solution

Is this useful for you?

http://regexplib.com/REDetails.aspx?regexp_id=520

Galin Iliev
MCSD, MCAD.NET






Maybe it's easier to use the HTML Agility Pack:

.NET Html Agility Pack: How to use malformed HTML just like it was
well-formed XML...
<URL:http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>

Download:

<URL:http://www.codefluent.com/smourier/download/htmlagilitypack.zip>

--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://classicvb.org/petition/>
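The parser-based approach suggested above can be sketched without regular expressions. The example below is only an illustration: it uses Python's standard-library `html.parser` module rather than the HTML Agility Pack, and the names `LineCollector` and `parse_line_with_parser` are made up for this sketch. It assumes one element per line, as in the question. Note that this parser lowercases tag and attribute names.

```python
from html.parser import HTMLParser


class LineCollector(HTMLParser):
    """Collects tag name, attribute names/values and text into a flat list."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased from HTMLParser
        self.parts.append(tag)
        for name, value in attrs:
            self.parts.append(name)
            if value is not None:  # attributes without a value yield None
                self.parts.append(value)

    def handle_data(self, data):
        # skip whitespace-only text between tags
        if data.strip():
            self.parts.append(data.strip())


def parse_line_with_parser(line):
    """Return the pieces of one line as a list (hypothetical helper)."""
    collector = LineCollector()
    collector.feed(line)
    collector.close()  # flush any buffered text
    return collector.parts


print(parse_line_with_parser("<SPAN CLASS='myclass'>A bit of text</SPAN>"))
print(parse_line_with_parser("Just some text, without tags"))
```

The design trade-off is the usual one: a parser tolerates extra attributes, odd spacing and unquoted values, at the cost of losing the original letter case of the tag.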


Hi Galin

Thanks for the link. It looks like it ought to work, but when I test it
against even a simple tag it returns no matches. I tried verifying the
expression with Expresso and it gives the following error.

Reference to undefined group number 5.

Even when I test it using the facility on the web site it fails. Any idea
how to correct it?

Charles
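For the exact line format described in the question, a small hand-written pattern may be enough. What follows is a minimal sketch in Python, not the pattern from the linked page (whose contents are not shown in this thread). It assumes each tagged line has exactly one quoted attribute and no nested tags; `LINE_RE` and `parse_line` are names invented for the example. It uses named groups and named backreferences, which avoids the kind of miscounted numeric backreference that an error such as "Reference to undefined group number 5" usually points to.

```python
import re
from typing import List

# Assumed line shape: <tag attr='value'>text</tag>, one attribute, no nesting.
LINE_RE = re.compile(
    r"""^\s*<(?P<tag>\w+)                  # opening tag name
        \s+(?P<attr>[\w-]+)                # a single attribute name
        \s*=\s*
        (?P<q>['"])(?P<value>.*?)(?P=q)    # quoted value, same quote on both sides
        \s*>
        (?P<text>.*?)                      # element text
        </(?P=tag)\s*>                     # closing tag must repeat the opening name
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_line(line: str) -> List[str]:
    """Split a tagged line into [tag, attribute, value, text].

    Lines that do not match the assumed shape are returned as a
    one-element list containing the stripped line.
    """
    m = LINE_RE.match(line)
    if m is None:
        return [line.strip()]
    return [m["tag"], m["attr"], m["value"], m["text"]]


print(parse_line("<SPAN CLASS='myclass'>A bit of text</SPAN>"))
# ['SPAN', 'CLASS', 'myclass', 'A bit of text']
print(parse_line("Just some text, without tags"))
# ['Just some text, without tags']
```

Anything outside the assumed shape (no attribute, several attributes, nested tags) falls through to the plain-text branch, so real-world HTML is better handled by a parser.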



