Delphi: Some tip to parse this html table?

Problem description

For some time I have been trying to get the data out of this HTML table. I have tried both paid and free components, and some coding of my own, without results. I have a class that loads HTML tables directly into a ClientDataSet, but it does not work with this table. Does anyone have any tips on how to get the data in this HTML table, or a way to convert it to txt / xls / csv or xml? The code I use to get the table follows:

  WebBrowser1.Navigate('http://site2.aesa.pb.gov.br/aesa/monitoramentoPluviometria.do?metodo=listarMesesChuvasMensais');
  WebBrowser1.OleObject.Document.All.Tags('select').Item(0).Value:= '2013';
  WebBrowser1.OleObject.Document.All.Tags('select').Item(1).Value:= '7';
  WebBrowser1.OleObject.Document.All.Tags('input').Item(1).click;
  Memo1.Text:= WebBrowser1.OleObject.Document.All.Tags('table').Item(10).InnerHTML;
  Memo1.Lines.SaveToFile('table.html');

Solution

The following will extract the data from the HTML table on your target page and load it into a ClientDataSet.

It's fairly long-winded, which perhaps demonstrates that, as David said, Delphi may not be the best tool for the job.

On my Form1, I have a TEdit, edValue, into which I key the value of a cell in the first data row of the HTML table. I use this as a way to find the table in the HTML document. I dare say there are better methods, but at least my method should be more robust than hard-coding assumptions about the layout of the document in which the table is embedded, which might not survive a change by the page's author.

Broadly, the code works by first finding the HTML table cell using the contents of my edValue.Text, then finding the table to which the cell belongs, and then populating the CDS's Fields and data from the table.
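A comment in the listing further down notes that the header text might need cleaning up before it is used as a field name. A minimal sketch of one way to do that follows. CleanFieldName and its fallback name 'Col<n>' are hypothetical, not part of the original answer, and the choice of underscore as the replacement character is an assumption.

```delphi
// Hypothetical helper, not part of the original answer.
// Requires SysUtils (Trim, IntToStr).
function CleanFieldName(const Raw: string; Index: Integer): string;
var
  I: Integer;
  S: string;
  PrevWasGap: Boolean;
begin
  S := Trim(Raw);
  Result := '';
  PrevWasGap := False;
  for I := 1 to Length(S) do
    if S[I] <= ' ' then begin
      // collapse a run of spaces, tabs or line breaks into one underscore
      if not PrevWasGap then
        Result := Result + '_';
      PrevWasGap := True;
    end
    else begin
      Result := Result + S[I];
      PrevWasGap := False;
    end;
  // a blank header cell would otherwise give an empty field name
  if Result = '' then
    Result := 'Col' + IntToStr(Index);
end;
```

It could be called where the field name is assigned, for example Field.FieldName := CleanFieldName(FieldName, I). It does not deal with two header cells that clean up to the same name; those would still need handling separately.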

The CDS fields are set to 255 characters by default; maybe there's a specification for the data published on the web page that would allow you to use a smaller value for some, if not all, fields. They're all assumed to be of type ftString, to avoid the code choking on unexpected cell contents.
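If no specification for the data is available, one option is to measure the column contents before creating the fields. The following is a rough sketch only. MaxCellTextLength is a hypothetical helper, and it assumes the table is rectangular (every data row has a cell at index Col) and that row 0 is the header row, as the listing below also assumes.

```delphi
// Hypothetical helper, not part of the original answer.
// Requires Variants (VarToStr). vTable is the late-bound HTML table object.
function MaxCellTextLength(const vTable: OleVariant; Col: Integer): Integer;
var
  I: Integer;
  S: string;
begin
  Result := 0;
  // row 0 is assumed to hold the column headings, so start at row 1
  for I := 1 to vTable.Rows.Length - 1 do begin
    // VarToStr turns a Null or empty variant into an empty string
    S := VarToStr(vTable.Rows.Item(I).Cells.Item(Col).InnerText);
    if Length(S) > Result then
      Result := Length(S);
  end;
end;
```

The result could then be used in place of cMaxFieldSize when setting Field.Size, keeping a minimum of 1 so that a column of empty cells does not produce a zero-length string field.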

Btw, at the bottom is a utility function for saving the HTML page locally, to save having to keep clicking the button for selecting a year + month. To reload the WebBrowser from the saved file, just use the file's name as the URL to load.

// Units needed by the code below: SysUtils, Forms, DB, DBClient, SHDocVw, MSHTML, ActiveX

TForm1 = class(TForm)
[ ... ]
public
  { Public declarations }
  Doc : IHtmlDocument2;

procedure TForm1.btnFindValueClick(Sender: TObject);
var
  Table : IHTMLTable;
begin
  Doc := WebBrowser1.Document as IHTMLDocument2;
  Table := FindTableByCellValue(edValue.Text);
  Assert(Table <> Nil);
  LoadCDSFromHTMLTable(CDS, Table);
end;

procedure TForm1.LoadCDSFromHTMLTable(DestCDS : TClientDataSet; Table : IHTMLTable);
var
  I,
  J : Integer;
  vTable : OleVariant;
  FieldName,
  FieldValue : String;
  Field : TField;
const
  cMaxFieldSize = 255;
  scIDFieldName = 'ID';
begin
  //  Use OleVariant instead of IHTMLTable because it's less fiddly for doing what follows
  vTable := Table;
  Assert(not DestCDS.Active and (DestCDS.FieldCount = 0));

  //  First create an AutoInc field
  Field := TAutoIncField.Create(Self);
  Field.FieldName := scIDFieldName;
  Field.DataSet := DestCDS;


  // Next create CDS fields from the names in the cells in the first row of the table
  for I := 0 to (vTable.Rows.Item(0).Cells.Length - 1) do begin
    FieldName := vTable.Rows.Item(0).Cells.Item(I).InnerText;
    Field := TStringField.Create(Self);
    // At this point, we might want to clean up the FieldName by removing embedded spaces, etc
    Field.FieldName := FieldName;
    Field.Size := cMaxFieldSize;
    Field.DataSet := DestCDS;
  end;

  DestCDS.DisableControls;
  try
    DestCDS.IndexFieldNames := scIDFieldName;
    DestCDS.CreateDataSet;

    //  Next load the HTML table data into the CDS
    for I := 1 to (vTable.Rows.Length - 1) do begin
      DestCDS.Insert;
      for J := 0 to vTable.Rows.Item(0).Cells.Length - 1 do begin
        FieldValue := vTable.Rows.Item(I).Cells.Item(J).InnerText;
        // the J + 1 is because Fields[0] is the autoinc one
        DestCDS.Fields[J + 1].AsString := FieldValue;
      end;
      DestCDS.Post;
    end;
    DestCDS.First;
  finally
    DestCDS.EnableControls;
  end;
end;

function TForm1.FindTableCellByTagValue(Doc : IHtmlDocument2; const AValue : String) : IHTMLTableCell;
var
  All: IHTMLElementCollection;
  Value: String;
  I,
  Len: Integer;
  E: OleVariant;
  iE : IHTMLElement;
  iC : IHTMLTableCell;
begin
  Result := Nil;
  All := Doc.All;
  if All = Nil then Exit;
  Len := All.Length;

  for I := 0 to Len - 1 do begin
    E := All.Item(I, varEmpty);
    iE := IDispatch(E) as IHTMLElement;
    if Supports(iE, IHTMLTableCell, iC) then begin
      Value := Trim(iE.Get_InnerText);
      if Pos(Trim(AValue), Value) = 1 then begin
        Result := iC;
        Break;
      end
    end
    else
      Continue;
  end;
end;

function TForm1.FindTableByCellValue(Value : String): IHTMLTable;
var
  Node : IHtmlElement;
  iTable : IHTMLTable;
  iCell : IHTMLTableCell;
begin
  Result := Nil;
  //  search for the Value passed in, rather than reading edValue.Text again
  iCell := FindTableCellByTagValue(Doc, Value);
  if iCell = Nil then
    Exit;
  Node := IDispatch(iCell) as IHtmlElement;

  //  if we found a Node with the cell text we were looking for,
  //  we can now find the HTML table to which it belongs

  while Node <> Nil do begin
    Node := Node.parentElement;
    if Supports(Node, IHTMLTable, iTable) then begin
      Result := iTable;
      Break;
    end;
  end;
end;

procedure TForm1.SaveFileLocally(const FileName : String);
var
  PFile: IPersistFile;  // declared in ActiveX unit
begin
  PFile := Doc as IPersistFile;
  //  cast via WideString so no BSTR is left unfreed, as it would be with StringToOleStr
  PFile.Save(PWideChar(WideString(FileName)), False);
end;
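As a usage sketch for the utility function above, the two handlers below save the current page once and later reload it from disk. The button handlers and the file path are hypothetical. The wait loop is a common idiom for TWebBrowser, not something taken from the original answer.

```delphi
// Hypothetical handlers and file name, for illustration only.
// Requires SHDocVw (READYSTATE_COMPLETE), MSHTML and Forms (Application).
const
  scSavedPage = 'C:\Temp\chuvas.html';  // hypothetical path

procedure TForm1.btnSaveClick(Sender: TObject);
begin
  // SaveFileLocally works on the Doc field, so refresh it first
  Doc := WebBrowser1.Document as IHTMLDocument2;
  SaveFileLocally(scSavedPage);
end;

procedure TForm1.btnReloadClick(Sender: TObject);
begin
  // the saved file's name is used as the URL to load
  WebBrowser1.Navigate(scSavedPage);
  // Navigate returns before the page has loaded, so pump messages until it is ready
  while WebBrowser1.ReadyState <> READYSTATE_COMPLETE do
    Application.ProcessMessages;
  Doc := WebBrowser1.Document as IHTMLDocument2;
end;
```

After btnReloadClick has run, btnFindValueClick can be used against the local copy exactly as against the live page.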
