Delphi: Some tips to parse this HTML table?
Problem description
For some time I have been trying to get data from this HTML table. I have tried both paid and free components, and I also tried writing some code myself, with no results. I have a class that loads HTML tables directly into a ClientDataSet, but it does not work with this table. Does anyone have any tips on how to get the data out of this HTML table, or a way to convert it to txt / xls / csv or xml? The code I use to fetch the table follows:
WebBrowser1.Navigate('http://site2.aesa.pb.gov.br/aesa/monitoramentoPluviometria.do?metodo=listarMesesChuvasMensais');
WebBrowser1.OleObject.Document.All.Tags('select').Item(0).Value:= '2013';
WebBrowser1.OleObject.Document.All.Tags('select').Item(1).Value:= '7';
WebBrowser1.OleObject.Document.All.Tags('input').Item(1).click;
Memo1.Text:= WebBrowser1.OleObject.Document.All.Tags('table').Item(10).InnerHTML;
Memo1.Lines.SaveToFile('table.html');
The following will extract the data from the HTML table on your target page and load it into a ClientDataSet.
It's fairly long-winded, which perhaps demonstrates that, as David said, Delphi may not be the best tool for the job.
On my Form1, I have a TEdit, edValue, into which I key the value from the first data row of the HTML table. I use this as a way to find the table in the HTML document. I dare say there are better methods, but mine should at least be more robust than hard-coding assumptions about the layout of the document in which the table is embedded, which might not survive a change by the page's author.
Broadly, the code works by first finding the HTML table cell using the contents of my edValue.Text, then finding the table to which the cell belongs, and then populating the CDS's Fields and data from the table.
The CDS fields are set to 255 characters by default; maybe there's a specification for the data published on the web page that would allow you to use a smaller value for some, if not all, fields. They're all assumed to be of type ftString, to avoid the code choking on unexpected cell contents.
Btw, at the bottom is a utility function for saving the HTML page locally, to save having to keep clicking the button for selecting a year + month. To reload the WebBrowser from the saved file, just use the file's name as the URL to load.
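As a minimal sketch of that reload step: the handler name btnReloadClick and the path in cSavedPage are assumptions made for illustration, and the file is assumed to have been written earlier by SaveFileLocally.

```pascal
// Sketch only: btnReloadClick and cSavedPage are hypothetical names.
// Assumes the page was saved earlier with SaveFileLocally(cSavedPage).
procedure TForm1.btnReloadClick(Sender : TObject);
const
  cSavedPage = 'C:\Temp\chuvas.html'; // hypothetical location of the saved page
begin
  // A local file name is accepted in place of an http URL
  WebBrowser1.Navigate(cSavedPage);
end;
```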
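// Units needed in the uses clause for the code below:
// SysUtils, DB, DBClient, SHDocVw, MSHTML, ActiveX
TForm1 = class(TForm)
[ ... ]
public
{ Public declarations }
Doc : IHtmlDocument2;
end;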
procedure TForm1.btnFindValueClick(Sender: TObject);
var
Table : IHTMLTable;
begin
Doc := WebBrowser1.Document as IHTMLDocument2;
Table := FindTableByCellValue(edValue.Text);
Assert(Table <> Nil);
LoadCDSFromHTMLTable(CDS, Table);
end;
procedure TForm1.LoadCDSFromHTMLTable(DestCDS : TClientDataSet; Table : IHTMLTable);
var
I,
J : Integer;
vTable : OleVariant;
iRow : IHTMLTableRow;
FieldName,
FieldValue : String;
Field : TField;
const
cMaxFieldSize = 255;
scIDFieldName = 'ID';
begin
// Use OleVariant instead of IHTMLTable because it's less fiddly for doing what follows
vTable := Table;
Assert(not DestCDS.Active and (DestCDS.FieldCount = 0));
// First create an AutoInc field
Field := TAutoIncField.Create(Self);
Field.FieldName := scIDFieldName;
Field.DataSet := DestCDS;
// Next create CDS fields from the names in the cells in the first row of the table
for I := 0 to (vTable.Rows.Item(0).Cells.Length - 1) do begin
FieldName := vTable.Rows.Item(0).Cells.Item(I).InnerText;
Field := TStringField.Create(Self);
// At this point, we might want to clean up the FieldName by removing embedded spaces, etc
Field.FieldName := FieldName;
Field.Size := cMaxFieldSize;
Field.DataSet := DestCDS;
end;
DestCDS.DisableControls;
try
DestCDS.IndexFieldNames := scIDFieldName;
DestCDS.CreateDataSet;
// Next load the HTML table data into the CDS
for I := 1 to (vTable.Rows.Length - 1) do begin
DestCDS.Insert;
for J := 0 to vTable.Rows.Item(0).Cells.Length - 1 do begin
FieldValue := vTable.Rows.Item(I).Cells.Item(J).InnerText;
// the J + 1 is because Fields[0] is the autoinc one
DestCDS.Fields[J + 1].AsString := FieldValue;
end;
DestCDS.Post;
end;
DestCDS.First;
finally
DestCDS.EnableControls;
end;
end;
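The field-name clean-up mentioned in the comment above is not implemented in the answer. One possible sketch follows; CleanFieldName is a hypothetical helper, and the rule it applies (keep ASCII letters and digits, collapse everything else into single underscores) is an assumption rather than anything the target page requires. Accented characters would also be replaced under this rule.

```pascal
// Hypothetical helper, not part of the original answer.
// Keeps ASCII letters and digits and collapses every run of other
// characters (spaces, brackets, accented letters) into one underscore.
function CleanFieldName(const RawName : String) : String;
var
  I : Integer;
  Ch : Char;
begin
  Result := '';
  for I := 1 to Length(RawName) do begin
    Ch := RawName[I];
    if ((Ch >= 'A') and (Ch <= 'Z')) or
       ((Ch >= 'a') and (Ch <= 'z')) or
       ((Ch >= '0') and (Ch <= '9')) then
      Result := Result + Ch
    else if (Result <> '') and (Result[Length(Result)] <> '_') then
      Result := Result + '_';
  end;
  // Drop a trailing underscore left by characters at the end of the name
  if (Result <> '') and (Result[Length(Result)] = '_') then
    SetLength(Result, Length(Result) - 1);
  // Fall back to a placeholder if nothing usable was left
  if Result = '' then
    Result := 'Field';
end;
```

It could be called as Field.FieldName := CleanFieldName(FieldName); note that two header cells that clean to the same name would still need to be made unique.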
function TForm1.FindTableCellByTagValue(Doc : IHtmlDocument2; const AValue : String) : IHTMLTableCell;
var
All: IHTMLElementCollection;
Value: String;
I,
Len: Integer;
E: OleVariant;
iE : IHTMLElement;
iT : IHTMLTextElement;
iC : IHTMLTableCell;
begin
Result := Nil;
All := Doc.All;
if All = Nil then Exit;
Len := All.Length;
for I := 0 to Len - 1 do begin
E := All.Item(I, varEmpty);
iE := IDispatch(E) as IHTMLElement;
if Supports(iE, IHTMLTableCell, iC) then begin
Value := Trim(iE.Get_InnerText);
if Pos(Trim(AValue), Value) = 1 then begin
Result := iC;
Break;
end;
end;
end;
end;
function TForm1.FindTableByCellValue(Value : String): IHTMLTable;
var
Node : IHtmlElement;
iTable : IHTMLTable;
iCell : IHTMLTableCell;
begin
Result := Nil;
// Search for the value passed in, rather than reading edValue.Text again
iCell := FindTableCellByTagValue(Doc, Value);
if iCell = Nil then
Exit;
Node := IDispatch(iCell) as IHtmlElement;
// if we found a Node with the cell text we were looking for,
// we can now find the HTML table to which it belongs
while Node <> Nil do begin
Node := Node.parentElement;
if Supports(Node, IHTMLTable, iTable) then begin
Result := iTable;
Break;
end;
end;
end;
procedure TForm1.SaveFileLocally(const FileName : String);
var
PFile: IPersistFile; // declared in ActiveX unit
begin
PFile := Doc as IPersistFile;
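// Pass a pointer to a temporary WideString; a BSTR returned by
// StringToOleStr would never be freed
PFile.Save(PWideChar(WideString(FileName)), False);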
end;
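The question also asks about converting the table to xml or csv. Once the CDS has been filled by LoadCDSFromHTMLTable, something along the following lines might do. SaveCDSAsXML and SaveCDSAsCSV are hypothetical method names that would need declaring in TForm1, and the comma delimiter and double-quote character are assumptions; Classes, DB and DBClient are needed in the uses clause.

```pascal
// Sketch only: both method names are hypothetical additions to TForm1.
// Assumes CDS is active and already populated.
procedure TForm1.SaveCDSAsXML(const FileName : String);
begin
  // TClientDataSet can write its own XML data packet
  CDS.SaveToFile(FileName, dfXML);
end;

procedure TForm1.SaveCDSAsCSV(const FileName : String);
var
  Lines,
  Row : TStringList;
  I : Integer;
begin
  Lines := TStringList.Create;
  Row := TStringList.Create;
  try
    Row.Delimiter := ',';
    Row.QuoteChar := '"';
    // Header line built from the field names
    for I := 0 to CDS.FieldCount - 1 do
      Row.Add(CDS.Fields[I].FieldName);
    Lines.Add(Row.DelimitedText);
    CDS.DisableControls;
    try
      CDS.First;
      while not CDS.Eof do begin
        Row.Clear;
        for I := 0 to CDS.FieldCount - 1 do
          Row.Add(CDS.Fields[I].AsString);
        // DelimitedText quotes values that contain the delimiter or quote character
        Lines.Add(Row.DelimitedText);
        CDS.Next;
      end;
    finally
      CDS.EnableControls;
    end;
    Lines.SaveToFile(FileName);
  finally
    Row.Free;
    Lines.Free;
  end;
end;
```

The XML written by SaveToFile is the ClientDataSet's own data-packet format, so it can be read back with CDS.LoadFromFile; a different XML layout would need a transformation step.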