HTML node parsing with classic ASP
Problem description
I spent a day trying to find an answer: is it possible with classic ASP, using MSXML2.ServerXMLHTTP.6.0, to parse HTML code and extract the content of an HTML node by a given ID? For example:
Remote HTML file:
<html>
.....
<div id="description">
some important notes here
</div>
.....
</html>
ASP code:
<%
...
Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
objHTTP.Open "GET", url_of_remote_html, False
objHTTP.Send
...
%>
I have read a lot of documentation saying that the response can be accessed as source text (objHTTP.responseText) and as a structure (objHTTP.responseXML). But how can I use that XML response to access the content of that div? I have read and tried many examples, but could not find anything clear enough to solve this.
Solution
First, perform the GET request as in your original code snippet:
Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
http.Open "GET", url_of_remote_html, False
http.Send
Next, create a regular expression object and set the pattern to match the inner HTML of the element with the desired id:
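Set regEx = New RegExp
' [\s\S] matches any character including line breaks; "." stops at a newline,
' so (.*?) would miss a div whose content spans several lines
regEx.Pattern = "<div id=""description"">([\s\S]*?)</div>"
regEx.IgnoreCase = True
regEx.Global = True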
Lastly, pull out the content from the first submatch within the first match:
On Error Resume Next
contents = regEx.Execute(http.responseText)(0).Submatches(0)
On Error Goto 0
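If anything goes wrong, for example the matching element isn't found in the document, contents is never assigned and stays Empty, assuming it held no earlier value (test it with IsEmpty(contents); IsNull returns False here). If all went to plan, contents should hold the data you're looking for.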
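Putting the three steps together, a minimal sketch might look like the following. The function name ExtractDivById, the HTTP status check, and the looser attribute matching are illustrative additions rather than part of the original answer. It tests Matches.Count instead of relying on On Error Resume Next. It assumes the id value is quoted in the markup, contains only letters and digits (it is not regex-escaped), and that the target div has no nested div inside it.

```vbscript
<%
' Hypothetical helper: returns the inner HTML of <div id="..."> or "" if not found.
Function ExtractDivById(url, id)
    Dim http, regEx, matches
    ExtractDivById = ""

    ' Fetch the remote page synchronously
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    http.Open "GET", url, False
    http.Send
    If http.Status <> 200 Then Exit Function

    Set regEx = New RegExp
    regEx.IgnoreCase = True
    regEx.Global = False   ' only the first match is needed
    ' Allow other attributes around id and either quote style;
    ' [\s\S]*? lazily matches any characters, including line breaks
    regEx.Pattern = "<div[^>]*\sid\s*=\s*[""']" & id & "[""'][^>]*>([\s\S]*?)</div>"

    Set matches = regEx.Execute(http.responseText)
    If matches.Count > 0 Then
        ExtractDivById = matches(0).SubMatches(0)
    End If
End Function

' Usage: url_of_remote_html is the same placeholder as in the question
Response.Write Server.HTMLEncode(ExtractDivById(url_of_remote_html, "description"))
%>
```

Because the match is lazy, it ends at the first closing div tag, so a regular expression of this kind is only dependable for simple, predictable markup.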