How to get HTML element coordinates using C#?


Question

I am planning to develop a web crawler that extracts the coordinates of HTML elements from web pages. I have found that element coordinates can be obtained through the "mshtml" assembly. Is it possible to fetch only the necessary information (HTML, CSS) from a web page and then use the appropriate mshtml classes to get the correct coordinates of all HTML elements? If so, how?

Thank you!

Recommended answer

I use these C# functions to determine element positions. You need to pass in a reference to the HTML element in question.

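// Returns the horizontal position of obj by summing offsetLeft
// along the offsetParent chain. obj must not be null.
public static int findPosX( mshtml.IHTMLElement obj )
{
  int curleft = 0;

  // Walk up until we reach an element that has no offset parent.
  while (obj.offsetParent != null)
  {
    curleft += obj.offsetLeft;
    obj = obj.offsetParent;
  }

  return curleft;
}

// Returns the vertical position of obj by summing offsetTop
// along the offsetParent chain. obj must not be null.
public static int findPosY( mshtml.IHTMLElement obj )
{
  int curtop = 0;

  // Walk up until we reach an element that has no offset parent.
  while (obj.offsetParent != null)
  {
    curtop += obj.offsetTop;
    obj = obj.offsetParent;
  }

  return curtop;
}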
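
To see what the loop adds up, here is a minimal sketch that runs the same walk without IE or mshtml. `FakeElement`, `OffsetWalk` and `BuildSample` are hypothetical names made up for this illustration. They only mimic the three members (`offsetLeft`, `offsetTop`, `offsetParent`) that the functions above read, and the offsets are invented numbers.

```csharp
using System;

// Hypothetical stand-in for mshtml.IHTMLElement (an assumption for this
// sketch): it carries only the three members the position functions read.
public sealed class FakeElement
{
    public int offsetLeft;
    public int offsetTop;
    public FakeElement offsetParent;
}

public static class OffsetWalk
{
    // Same loop as findPosX: add offsetLeft at each level while a parent exists.
    public static int FindPosX(FakeElement obj)
    {
        int curleft = 0;
        while (obj.offsetParent != null)
        {
            curleft += obj.offsetLeft;
            obj = obj.offsetParent;
        }
        return curleft;
    }

    // Same loop as findPosY: add offsetTop at each level while a parent exists.
    public static int FindPosY(FakeElement obj)
    {
        int curtop = 0;
        while (obj.offsetParent != null)
        {
            curtop += obj.offsetTop;
            obj = obj.offsetParent;
        }
        return curtop;
    }

    // Builds a three-level chain with invented offsets:
    // body (root) -> div at (10, 20) -> span at (5, 7).
    public static FakeElement BuildSample()
    {
        var body = new FakeElement();
        var div = new FakeElement { offsetLeft = 10, offsetTop = 20, offsetParent = body };
        var span = new FakeElement { offsetLeft = 5, offsetTop = 7, offsetParent = div };
        return span;
    }
}
```

For the sample chain, `FindPosX` gives 5 + 10 = 15 and `FindPosY` gives 7 + 20 = 27. The root element's own offset is never added, because the loop stops as soon as `offsetParent` is null. The result is therefore relative to the outermost offset parent, and scrolling is not taken into account.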



I get HTML elements from the current document like so:

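// Requires COM references to SHDocVw (Microsoft Internet Controls) and mshtml,
// plus: using System.Threading;

// Start an instance of IE
SHDocVw.InternetExplorerClass ie = new SHDocVw.InternetExplorerClass();
ie.Visible = true;

// Load a URL (url is a string holding the address of the page)
object flags = null, targetFrameName = null, postData = null, headers = null;
ie.Navigate( url, ref flags, ref targetFrameName, ref postData, ref headers );

// Wait until the page has finished loading
while( ie.Busy || ie.ReadyState != SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE )
{
  Thread.Sleep( 500 );
}

// Get an element from the loaded document (null if the id is not found)
mshtml.HTMLDocumentClass document = (mshtml.HTMLDocumentClass)ie.Document;
mshtml.IHTMLElement element = document.getElementById( "myelementsid" );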
