Regular Expression to Parse HTML
Does anyone have a regex pattern to parse HTML from a stream?
I have a well structured file, where each line is of the form
<sometag someattribute='attr'>text</sometag>
for example
<SPAN CLASS='myclass'>A bit of text</SPAN>, or
Just some text, without tags
What I would like to be able to do is parse each line so that I get an array
like this
SPAN
CLASS
myclass
A bit of text
or
Just some text, without tags
The array bit should follow, but I don't profess to be a regex expert (or
any kind of expert for that matter). Can anyone help with a suitable
pattern?
TIA
Charles
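As a rough illustration of the kind of pattern being asked for, here is a minimal sketch in Python (an assumption on our part: the thread is about .NET, and this is not code from any of the posters). It assumes every tagged line has exactly one attribute with a quoted value and a closing tag that repeats the opening tag name; LINE_RE and parse_line are hypothetical names. Named groups are used so the closing tag can refer back to the opening tag by name rather than by number:

```python
import re

# Hypothetical pattern for lines of the form <TAG ATTR='value'>text</TAG>.
# Named groups: tag, attr, value, text. (?P=tag) requires the closing tag
# to repeat the opening tag name exactly.
LINE_RE = re.compile(
    r"""^<(?P<tag>\w+)\s+(?P<attr>\w+)\s*=\s*['"](?P<value>[^'"]*)['"]\s*>"""
    r"""(?P<text>.*)</(?P=tag)\s*>$"""
)


def parse_line(line: str) -> list[str]:
    """Split one line into [tag, attribute, value, text], or [text] if untagged."""
    line = line.strip()
    m = LINE_RE.match(line)
    if m is None:
        # No element in the assumed one-attribute form: keep the line as text.
        return [line]
    return [m.group("tag"), m.group("attr"), m.group("value"), m.group("text")]
```

With the sample line from the question, parse_line returns SPAN, CLASS, myclass and A bit of text, and an untagged line comes back as a one-element list. A line with two attributes or an unquoted value does not match and is returned whole as plain text, and nested markup would end up inside the text field, which is one reason a real HTML parser is often the safer choice.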
Is this useful for you?
http://regexplib.com/REDetails.aspx?regexp_id=520
Galin Iliev
MCSD, MCAD.NET

"Charles Law" <bl***@nowhere.com> wrote:
> Does anyone have a regex pattern to parse HTML from a stream?
Maybe it's easier to use the HTML Agility Pack:
.NET Html Agility Pack: How to use malformed HTML just like it was
well-formed XML...
<URL:http://blogs.msdn.com/smourier/archive/2003/06/04/8265.aspx>
Download:
<URL:http://www.codefluent.com/smourier/download/htmlagilitypack.zip>
--
M S Herfried K. Wagner
M V P <URL:http://dotnet.mvps.org/>
V B <URL:http://classicvb.org/petition/>
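Herfried's point is that a real HTML parser copes with input that a single regex will not. The HTML Agility Pack is a .NET library, so purely as a stand-in, here is a minimal sketch using Python's standard html.parser.HTMLParser (a swapped-in library, not the Agility Pack's API). LineParser and parse_line_with_parser are hypothetical names. One difference from the requested output: HTMLParser lowercases tag and attribute names, so SPAN and CLASS come back as span and class; get_starttag_text() returns the raw start tag if the original case matters.

```python
from html.parser import HTMLParser


class LineParser(HTMLParser):
    """Collects the tag name, attribute names/values and text of one line."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names lowercased, and
        # attribute values with their surrounding quotes removed.
        self.parts.append(tag)
        for name, value in attrs:
            self.parts.append(name)
            if value is not None:  # None for a bare attribute such as "disabled"
                self.parts.append(value)

    def handle_data(self, data):
        # Keep non-blank text, whether it sits inside a tag or on its own.
        if data.strip():
            self.parts.append(data.strip())


def parse_line_with_parser(line: str) -> list[str]:
    """Hypothetical helper: parse one line into a flat list of parts."""
    parser = LineParser()
    parser.feed(line)
    parser.close()  # flush any text the parser may still be buffering
    return parser.parts
```

Unlike the one-attribute regex approach, this accepts any number of attributes and either quoting style; a tag with two attributes simply contributes two name/value pairs to the list.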
Hi Galin
Thanks for the link. It looks like it ought to work, but when I test it
against even a simple tag it returns no matches. I tried verifying the
expression with Expresso and it gives the following error.
Reference to undefined group number 5.
Even when I test it using the facility on the web site it fails. Any idea
how to correct it?
Charles
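The error Expresso reports appears to be the message the .NET regex parser gives when a pattern contains a backreference, such as \5, to a capturing group that the pattern never defines. Pattern 520 itself is not reproduced in this thread, so the following Python sketch only illustrates that general class of mistake: Python's re module rejects such a pattern as soon as it is compiled. The names compiles, BROKEN and FIXED are hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Only one capturing group is defined, but \2 refers to group 2, so
# compilation fails (Python reports an invalid group reference).
BROKEN = r"<(\w+)>.*?</\2>"

# Referring back to the group that actually exists compiles and matches.
FIXED = r"<(\w+)>.*?</\1>"
```

Renumbering the backreference, or switching to named groups, is usually the fix, but which group was meant can only be decided from the original pattern.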