HTML Tag Parsing

Question

How can I parse the Name: and Value text from within the tags using DIHtmlParser? I tried TCLHtmlParser from Clever Components, but it failed. A second question: can DIHtmlParser parse individual tags, for example by looping through a tag's sub-tags? It's a total nightmare for such a simple problem.

<div class="tvRow tvFirst hasLabel tvFirst" title="example1">
  <label class="tvLabel">Name:</label>
  <span class="tvValue">Value</span>
<div class="clear"></div></div>

<div class="tvRow tvFirst hasLabel tvFirst" title="example2">
  <label class="tvLabel">Name:</label>
  <span class="tvValue">Value</span>
<div class="clear"></div></div>

Solution

You could use the IHTMLDocument2 DOM to parse whatever elements you need from the HTML:

uses ActiveX, MSHTML;

const
  HTML =
  '<div class="tvRow tvFirst hasLabel tvFirst" title="example1">' +
  '<label class="tvLabel">Name:</label>' +
  '<span class="tvValue">Value</span>' +
  '<div class="clear"></div>' +
  '</div>';

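procedure TForm1.Button1Click(Sender: TObject);
var
  doc: OleVariant; // late-bound, so doc.write can take a plain string
  el: OleVariant;
  i: Integer;
begin
  // Create an in-memory MSHTML document and load the markup into it.
  doc := coHTMLDocument.Create as IHTMLDocument2;
  doc.write(HTML);
  doc.close;
  ShowMessage(doc.body.innerHTML);
  // Walk every element under <body>; the tag names are compared in upper case.
  for i := 0 to doc.body.all.length - 1 do
  begin
    el := doc.body.all.item(i);
    // Show the label text ("Name:") ...
    if (el.tagName = 'LABEL') and (el.className = 'tvLabel') then
      ShowMessage(el.innerText);
    // ... and the value text that follows it.
    if (el.tagName = 'SPAN') and (el.className = 'tvValue') then
      ShowMessage(el.innerText);
  end;
end;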
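
If you want the label and value text kept together rather than shown one message at a time, the same loop can collect them into a list. This is only a minimal sketch: ExtractPairs, Markup and Pairs are hypothetical names that are not part of the answer above, and it assumes COM is already initialized on the calling thread, as it normally is in a VCL GUI application. It uses only the late-bound members already used above (all, item, tagName, className, innerText).

```pascal
uses ActiveX, MSHTML, Classes, SysUtils;

// Hypothetical helper: adds one 'label=value' entry to Pairs for each
// tvLabel/tvValue pair found in Markup.
procedure ExtractPairs(const Markup: string; Pairs: TStrings);
var
  doc, el: OleVariant;
  i: Integer;
  lastLabel, valueText: string;
begin
  doc := coHTMLDocument.Create as IHTMLDocument2;
  doc.write(Markup);
  doc.close;
  lastLabel := '';
  for i := 0 to doc.body.all.length - 1 do
  begin
    el := doc.body.all.item(i);
    if (el.tagName = 'LABEL') and (el.className = 'tvLabel') then
      // Remember the most recent label until its value span shows up.
      lastLabel := el.innerText
    else if (el.tagName = 'SPAN') and (el.className = 'tvValue') then
    begin
      // Copy the variant into a string first so Trim resolves cleanly.
      valueText := el.innerText;
      Pairs.Add(Trim(lastLabel) + '=' + Trim(valueText));
    end;
  end;
end;
```

You would call it with a TStringList that you create and free yourself, for example ExtractPairs(HTML, List). Because the label is remembered until the next value span arrives, this relies on each label appearing before its value, as it does in the sample markup.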


I wanted to mention another very nice HTML parser I found today: htmlp (Delphi Dom HTML Parser and Converter). It is obviously not as flexible as IHTMLDocument2, but it is very easy to work with, fast, free, and supports Unicode in older Delphi versions.

Sample usage:

uses HtmlParser, DomCore;

function GetDocBody(HtmlDoc: TDocument): TElement;
var
  i: integer;
  node: TNode;
begin
  Result := nil;
  for i := 0 to HtmlDoc.documentElement.childNodes.length - 1 do
  begin
    node := HtmlDoc.documentElement.childNodes.item(i);
    if node.nodeName = 'body' then
    begin
      Result := node as TElement;
      Break;
    end;
  end;
end;

procedure THTMLForm.Button2Click(Sender: TObject);
var
  HtmlParser: THtmlParser;
  HtmlDoc: TDocument;
  i: Integer;
  body, el: TElement;
  node: TNode;
begin
  HtmlParser := THtmlParser.Create;
  try
    HtmlDoc := HtmlParser.parseString(HTML);
    try
      body := GetDocBody(HtmlDoc);
      if Assigned(body) then
        for i := 0 to body.childNodes.length - 1 do
        begin
          node := body.childNodes.item(i);
          if (node is TElement) then
          begin
            el := node as TElement;
            if (el.tagName = 'div') and (el.GetAttribute('class') = 'tvRow tvFirst hasLabel tvFirst') then
            begin
              // iterate el.childNodes here...
              ShowMessage(IntToStr(el.childNodes.length));
            end;
          end;
        end;
    finally
      HtmlDoc.Free;
    end;
  finally
    HtmlParser.Free;
  end;
end;
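
The "iterate el.childNodes here" step could look roughly like the sketch below. ShowRowText is a hypothetical helper that is not part of the answer above. The childNodes, item, nodeName and TElement members are the ones already used in the sample; firstChild and nodeValue are assumed to be available on TNode in DomCore as in the standard DOM, so check them against your copy of htmlp before relying on this.

```pascal
uses HtmlParser, DomCore, Dialogs;

// Hypothetical helper: shows the text of the label and span children
// of one tvRow element.
procedure ShowRowText(Row: TElement);
var
  i: Integer;
  child: TNode;
begin
  for i := 0 to Row.childNodes.length - 1 do
  begin
    child := Row.childNodes.item(i);
    // Skip whitespace text nodes and any other non-element children.
    if not (child is TElement) then
      Continue;
    // Node names are compared in lower case, as in the sample above.
    if (child.nodeName = 'label') or (child.nodeName = 'span') then
      // Assumption: the element's text sits in its first child text node.
      if Assigned(child.firstChild) then
        ShowMessage(child.firstChild.nodeValue);
  end;
end;
```

Inside Button2Click you would then call ShowRowText(el) where the comment stands. Note that this only reads the first child node, so nested markup inside a label or span would need a recursive walk instead.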
