Parsing HTML to get script variable value
Question
I'm trying to find a way to access the data between script tags in a document returned by a server I am making HTTP requests to. The document has multiple script tags, but only one of them has JavaScript code between its opening and closing tags; the rest are included from files. I want to access the code inside that script tag.
An example of the code is:
<html>
// Some HTML
<script>
var spect = [['temper', 'init', []],
['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]];
</script>
// More HTML
</html>
I'm looking for an ideal way to grab the data assigned to 'spect' and parse it. Sometimes there is a space between 'spect' and the '=', and sometimes there isn't. I don't know why, and I have no control over the server.
I know this question may have been asked before, but the responses suggest using something like HTMLAgilityPack, and I'd rather avoid using a library for this task, as I only need to get the JavaScript from the DOM once.
Recommended answer
Here is a very simple example of how this could be done using HtmlAgilityPack to read the document and the Jurassic library to evaluate the script. The snippet assumes the System and System.Linq namespaces are imported:
var html = @"<html>
// Some HTML
<script>
var spect = [['temper', 'init', []],
['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
[""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
</script>
// More HTML
</html>";
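// Grab the content of the first inline script element.
// Scripts included from files carry a src attribute and have no body, so skip them.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var script = doc.DocumentNode.Descendants()
.Where(n => n.Name == "script" && n.Attributes["src"] == null)
.First().InnerText;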
// Wrap the script in a function that returns spect, evaluate it,
// and stringify the result into a proper JSON string
var engine = new Jurassic.ScriptEngine();
var result = engine.Evaluate("(function() { " + script + " return spect; })()");
var json = Jurassic.Library.JSONObject.Stringify(engine, result);
Console.WriteLine(json);
Console.ReadKey();
Output:
[["temper","init",[]],["fw/lib","init",[{"staticRoot":"//site.com/js/"}]],["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]]
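Once the value is available as a JSON string, it can be read with any JSON parser. The following is only a minimal sketch and not part of the original answer: it assumes .NET Core 3.0 or later, where System.Text.Json is available, and the class name SpectReader is made up for illustration. The json literal is copied from the output above.

```csharp
using System;
using System.Text.Json;

class SpectReader
{
    static void Main()
    {
        // JSON copied from the output above (in real code this would be the
        // string produced by the Stringify call).
        string json = "[[\"temper\",\"init\",[]],[\"fw/lib\",\"init\",[{\"staticRoot\":\"//site.com/js/\"}]],[\"cap\",\"dm\",[{\"tackmod\":\"profile\",\"xMod\":\"timed\"}]]]";

        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // Each entry has the shape [name, method, [arguments...]]
            foreach (JsonElement entry in doc.RootElement.EnumerateArray())
            {
                var name = entry[0].GetString();
                var method = entry[1].GetString();
                int argCount = entry[2].GetArrayLength();
                Console.WriteLine($"{name}.{method} ({argCount} argument(s))");
            }
        }
    }
}
```

For the sample data this prints one line per entry, starting with temper.init (0 argument(s)).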
Note: I am not accounting for errors or anything else; this merely serves as an example of how to grab the script and evaluate it for the value of spect.
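As a rough illustration of what accounting for errors could look like, the sketch below wraps the same steps in a helper that returns null instead of throwing when no suitable script is found or the script fails to evaluate. The names SpectExtractor and TryGetSpectJson are hypothetical, and choosing the script by checking whether its text mentions spect is an assumption about the page, not something the original answer does.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using Jurassic;
using Jurassic.Library;

static class SpectExtractor
{
    // Hypothetical helper: returns the JSON text of spect, or null on failure.
    public static string TryGetSpectJson(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the inline script of interest is the first one whose
        // text mentions spect. FirstOrDefault yields null if there is none.
        HtmlNode node = doc.DocumentNode.Descendants("script")
            .FirstOrDefault(n => n.InnerText.Contains("spect"));
        if (node == null)
        {
            return null;
        }

        try
        {
            var engine = new ScriptEngine();
            object result = engine.Evaluate(
                "(function() { " + node.InnerText + " return spect; })()");
            return JSONObject.Stringify(engine, result);
        }
        catch (JavaScriptException ex)
        {
            // Raised by Jurassic for script errors in the evaluated code.
            Console.Error.WriteLine("Script evaluation failed: " + ex.Message);
            return null;
        }
    }
}
```

A caller would then check the return value for null before using it. Keep in mind that this still executes script text received from a server, so it is only appropriate when that source is trusted.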
There are a few other libraries for executing/evaluating JavaScript as well.
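If pulling in an HTML parser is undesirable, the raw text of the spect literal can also be located with a regular expression from the base class library. This is a sketch under stated assumptions rather than a robust parser: it assumes the assignment is written as var spect, ends with a semicolon, and that no string inside the literal contains a closing bracket followed by a semicolon. The names SpectRegexSketch and ExtractSpect are made up for illustration. The pattern allows any amount of whitespace, including none, around the equals sign.

```csharp
using System;
using System.Text.RegularExpressions;

class SpectRegexSketch
{
    // \s* on both sides of '=' tolerates "spect=" as well as "spect = ".
    // The lazy .*? stops at the first ']' that is followed by ';'.
    // Singleline lets '.' match the newlines inside the literal.
    static readonly Regex SpectPattern = new Regex(
        @"var\s+spect\s*=\s*(?<value>\[.*?\])\s*;",
        RegexOptions.Singleline);

    // Hypothetical helper: returns the literal's text, or null if not found.
    static string ExtractSpect(string html)
    {
        Match m = SpectPattern.Match(html);
        return m.Success ? m.Groups["value"].Value : null;
    }

    static void Main()
    {
        string html = @"<html>
<script>
var spect = [['temper', 'init', []],
['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
[""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
</script>
</html>";

        // Prints the array literal exactly as it appears in the page.
        Console.WriteLine(ExtractSpect(html));
    }
}
```

The captured text is a JavaScript array literal, not strict JSON: it uses single-quoted strings and an unquoted staticRoot key, which a strict JSON parser will reject. It therefore still has to go through a JavaScript evaluator such as Jurassic, or a parser that tolerates those forms, before it can be consumed as data.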