HTML Tag Parsing
Problem description
How can I parse the Name: and Value text from within the tags below with DIHtmlParser? I tried doing it with TCLHtmlParser from Clever Components, but it failed. My second question: can DIHtmlParser parse individual tags, for example by looping through a tag's sub-tags? It's a total nightmare for such a simple problem.
    <div class="tvRow tvFirst hasLabel tvFirst" title="example1">
      <label class="tvLabel">Name:</label>
      <span class="tvValue">Value</span>
      <div class="clear"></div>
    </div>
    <div class="tvRow tvFirst hasLabel tvFirst" title="example2">
      <label class="tvLabel">Name:</label>
      <span class="tvValue">Value</span>
      <div class="clear"></div>
    </div>
Solution
You could use the IHTMLDocument2 DOM to parse whatever elements you need from the HTML:
    uses ActiveX, MSHTML;

    const
      HTML =
        '<div class="tvRow tvFirst hasLabel tvFirst" title="example1">' +
        '<label class="tvLabel">Name:</label>' +
        '<span class="tvValue">Value</span>' +
        '<div class="clear"></div>' +
        '</div>';

    procedure TForm1.Button1Click(Sender: TObject);
    var
      doc: OleVariant;
      el: OleVariant;
      i: Integer;
    begin
      // Create an in-memory MSHTML document and load the markup into it.
      doc := coHTMLDocument.Create as IHTMLDocument2;
      doc.write(HTML);
      doc.close;
      ShowMessage(doc.body.innerHTML);
      // body.all is a flat collection of every element under <body>.
      for i := 0 to doc.body.all.length - 1 do
      begin
        el := doc.body.all.item(i);
        // MSHTML reports tag names in upper case.
        if (el.tagName = 'LABEL') and (el.className = 'tvLabel') then
          ShowMessage(el.innerText);
        if (el.tagName = 'SPAN') and (el.className = 'tvValue') then
          ShowMessage(el.innerText);
      end;
    end;
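To address the second question (looping through a tag's sub-tags), the same late-bound DOM can be walked one row at a time. What follows is only a minimal, untested sketch. The procedure name ShowRowPairs is hypothetical, and the code assumes that every row of interest is a DIV whose class string contains tvRow. It uses the children collection that MSHTML exposes on each element, which holds only direct child elements.

```pascal
uses ActiveX, MSHTML, Variants, SysUtils, Dialogs;

// Hypothetical helper (not part of the original answer): walks each
// "tvRow" div and shows its label/value pair together with its title.
procedure ShowRowPairs(const AHtml: string);
var
  doc, row, child: OleVariant;
  i, j: Integer;
  rowTitle, rowClass, labelText, valueText: string;
begin
  doc := coHTMLDocument.Create as IHTMLDocument2;
  doc.write(AHtml);
  doc.close;
  for i := 0 to doc.body.all.length - 1 do
  begin
    row := doc.body.all.item(i);
    if row.tagName <> 'DIV' then
      Continue;
    rowClass := row.className;
    // Assumption: a substring match on "tvRow" is enough to identify a row.
    if Pos('tvRow', rowClass) = 0 then
      Continue;
    rowTitle := row.title;
    labelText := '';
    valueText := '';
    // children contains only the direct child elements of this row.
    for j := 0 to row.children.length - 1 do
    begin
      child := row.children.item(j);
      if child.tagName = 'LABEL' then
        labelText := child.innerText
      else if child.tagName = 'SPAN' then
        valueText := child.innerText;
    end;
    ShowMessage(rowTitle + ': ' + labelText + ' ' + valueText);
  end;
end;
```

Calling ShowRowPairs(HTML) from a button handler should show one message per row. Grouping by row this way keeps each label paired with its own value, which the flat body.all loop does not guarantee.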
I wanted to mention another very nice HTML parser I found today: htmlp (Delphi Dom HTML Parser and Converter). It's obviously not as flexible as IHTMLDocument2, but it's very easy to work with, fast, free, and supports Unicode on older Delphi versions.
Sample usage:
    uses HtmlParser, DomCore;

    // Returns the <body> element of the parsed document, or nil if there is none.
    function GetDocBody(HtmlDoc: TDocument): TElement;
    var
      i: Integer;
      node: TNode;
    begin
      Result := nil;
      for i := 0 to HtmlDoc.documentElement.childNodes.length - 1 do
      begin
        node := HtmlDoc.documentElement.childNodes.item(i);
        if node.nodeName = 'body' then
        begin
          Result := node as TElement;
          Break;
        end;
      end;
    end;

    procedure THTMLForm.Button2Click(Sender: TObject);
    var
      HtmlParser: THtmlParser;
      HtmlDoc: TDocument;
      i: Integer;
      body, el: TElement;
      node: TNode;
    begin
      HtmlParser := THtmlParser.Create;
      try
        // HTML is the constant declared in the IHTMLDocument2 example.
        HtmlDoc := HtmlParser.parseString(HTML);
        try
          body := GetDocBody(HtmlDoc);
          if Assigned(body) then
            for i := 0 to body.childNodes.length - 1 do
            begin
              node := body.childNodes.item(i);
              if (node is TElement) then
              begin
                el := node as TElement;
                // The class attribute is compared as a whole string here.
                if (el.tagName = 'div') and
                   (el.GetAttribute('class') = 'tvRow tvFirst hasLabel tvFirst') then
                begin
                  // iterate el.childNodes here...
                  ShowMessage(IntToStr(el.childNodes.length));
                end;
              end;
            end;
        finally
          HtmlDoc.Free;
        end;
      finally
        HtmlParser.Free;
      end;
    end;
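The "iterate el.childNodes here" step could look roughly like the sketch below. It is untested and rests on assumptions: ShowRowChildren is a hypothetical helper, not part of htmlp, and the code assumes that DomCore's TNode exposes the standard DOM members firstChild and nodeValue, and that the text of a label or span is stored in its first child node.

```pascal
uses HtmlParser, DomCore, Dialogs;

// Hypothetical helper (not part of htmlp): shows the text of each
// <label> and <span> that is a direct child of the given row element.
procedure ShowRowChildren(row: TElement);
var
  i: Integer;
  node: TNode;
  child: TElement;
begin
  for i := 0 to row.childNodes.length - 1 do
  begin
    node := row.childNodes.item(i);
    // Skip anything that is not an element, such as text nodes between tags.
    if not (node is TElement) then
      Continue;
    child := node as TElement;
    if (child.tagName = 'label') or (child.tagName = 'span') then
      // Assumption: the first child is the text node holding the content.
      if Assigned(child.firstChild) then
        ShowMessage(child.tagName + ' = ' + child.firstChild.nodeValue);
  end;
end;
```

With this in place, the placeholder comment in Button2Click could be replaced by a call to ShowRowChildren(el). Checking the class attribute of each child, as the IHTMLDocument2 example does, would make the match stricter if the rows ever contain other labels or spans.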