C# - Get JavaScript variable value using HTMLAgilityPack


Problem description



I currently have two JavaScript variables whose values I need to retrieve. The HTML consists of a series of nested DIVs with no id/name attributes. Is it possible to retrieve the data from these variables using HTMLAgilityPack? If so, how would I go about doing it? If not, what would be required instead: regular expressions? If the latter, please help me create a regular expression that would allow me to do this. Thank you.

<div style="margin: 12px 0px;" align="left">
<script type="text/javascript">
variable1 = "var1";
variable2 = "var2";
</script>
</div>

Solution

I'm assuming you are trying to scrape this information from a website, most likely one you don't have direct control over? There are several ways to do this; I'll go from easy to hard (at least as I see them):

  1. Ask the owner of the site. Most of the time they can give you direct access to the information, and if you ask nicely, they might just let you have it for free.

  2. You can use the WebBrowser control, run the JavaScript, and then parse the values from the DOM afterwards. As opposed to HttpWebRequest, which only downloads the raw markup, this lets the page's scripts run, so all the proper values are loaded on the page before you scrape them.

  3. Steal the source with Firebug. Inspect the website with Firebug to see which URLs are called in the background. Most likely, it's using an asynchronous request to retrieve the updated information from a web service. In Firebug, you can view this under Net -> XHR. Look at the request and the values returned; you can then retrieve the values yourself and parse the contents from the source rather than scrape the page.
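
As a rough illustration of option 3, once the Net -> XHR tab has revealed the URL the page calls, the same request can be issued from C#. This is only a sketch: the URL below is a placeholder, the `EndpointFetcher` class name is made up for the example, and the `X-Requested-With` header is an assumption that some endpoints check for it, not something known about this site.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class EndpointFetcher
{
    // A single HttpClient instance is reused for all requests.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the raw response body from the endpoint seen under Net -> XHR.
    public static async Task<string> FetchAsync(string endpointUrl)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, endpointUrl))
        {
            // Assumption: some endpoints only answer requests that look like XHR calls.
            request.Headers.Add("X-Requested-With", "XMLHttpRequest");

            using (HttpResponseMessage response = await Client.SendAsync(request))
            {
                // Throws HttpRequestException on a non-success status code.
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }

    public static async Task Main()
    {
        // Hypothetical URL: replace it with the one the page actually calls.
        string body = await FetchAsync("https://example.com/api/values");
        Console.WriteLine(body);
    }
}
```

What comes back (JSON, XML, or a script fragment) depends on the site, so the parsing step is left out here.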

I think this might be the information you were looking for, but if not, let me know and I can clarify or fix the answer.
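
For the narrower case shown in the question, where the values are literal string assignments inside an inline script, HTMLAgilityPack can locate the script elements and a regular expression can pull out the assignments. The following is a minimal sketch under that assumption. The class and method names are invented for the example, the pattern only handles simple `name = "value";` assignments (no escaped quotes, no computed values), and it requires the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ScriptVariableReader
{
    // Matches simple assignments such as: variable1 = "var1";
    // An optional leading "var" and single or double quotes are accepted.
    private static readonly Regex Assignment = new Regex(
        @"(?:\bvar\s+)?\b(?<name>[A-Za-z_$][\w$]*)\s*=\s*(?<quote>[""'])(?<value>.*?)\k<quote>\s*;",
        RegexOptions.Compiled);

    // Returns a name -> value map for every string assignment found in the script text.
    public static Dictionary<string, string> ExtractFromScript(string scriptText)
    {
        var values = new Dictionary<string, string>();
        foreach (Match m in Assignment.Matches(scriptText))
        {
            values[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return values;
    }

    // Finds every <script> element in the HTML and collects its string assignments.
    public static Dictionary<string, string> ExtractFromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new Dictionary<string, string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts == null)
        {
            return values;
        }

        foreach (HtmlNode script in scripts)
        {
            foreach (KeyValuePair<string, string> pair in ExtractFromScript(script.InnerText))
            {
                values[pair.Key] = pair.Value;
            }
        }
        return values;
    }

    public static void Main()
    {
        // The snippet from the question, used as sample input.
        const string html = @"<div style=""margin: 12px 0px;"" align=""left"">
<script type=""text/javascript"">
variable1 = ""var1"";
variable2 = ""var2"";
</script>
</div>";

        foreach (KeyValuePair<string, string> pair in ExtractFromHtml(html))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Because the `//script` XPath does not depend on id or name attributes, the nesting of the surrounding DIVs does not matter. If the values are computed at runtime rather than written literally in the source, this approach will not see them, and option 2 or 3 above is the better fit.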
