HTML node parsing with classic ASP


Question

I spent a day trying to find an answer: is it possible in classic ASP, using MSXML2.ServerXMLHTTP.6.0, to parse HTML code and extract the content of an HTML node by a given ID? For example:

Remote HTML file:

<html>
.....
<div id="description">
some important notes here
</div>
.....
</html>

ASP code:

<%    
    ...
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    objHTTP.Open "GET", url_of_remote_html, False
    objHTTP.Send
    ...
%>

Now, I have read in many docs that the HTML can be accessed both as source (objHTTP.responseText) and as a structure (objHTTP.responseXML). But how in the world can I use that XML response to access the content of that div? I have read and tried so many examples, but cannot find anything clear enough to solve this.

Solution

First up, perform the GET request as in your original code snippet:

Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
http.Open "GET", url_of_remote_html, False
http.Send

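Next, create a regular expression object and set the pattern to match the inner HTML of the element with the desired id. (The responseXML property is generally only populated when the response is well-formed XML, which typical HTML pages are not, so the raw responseText is the more reliable input.)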

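Set regEx = New RegExp
' "." does not match line breaks in VBScript's RegExp, so [\s\S] is used
' to capture content that spans several lines, as in the example page
regEx.Pattern = "<div id=""description"">([\s\S]*?)</div>"
regEx.IgnoreCase = True
regEx.Global = False   ' only the first match is needed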

Lastly, pull out the content from the first submatch within the first match:

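contents = Empty   ' start from a known value
On Error Resume Next
' Indexing (0) raises an error when nothing matched, so the assignment is skipped
contents = regEx.Execute(http.responseText)(0).SubMatches(0)
On Error Goto 0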

If anything goes wrong, for example if the matching element isn't found in the document, contents will be Empty (not Null), which you can test with IsEmpty(contents). If all went to plan, contents should hold the data you're looking for.
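The three steps above can be folded into one reusable function. This is a minimal sketch, not part of the original answer: the names GetInnerHtmlById and EscapeRegex are hypothetical, and it assumes the id attribute is quoted (single or double quotes) and that the target div contains no nested div, because the lazy match stops at the first closing </div>. It checks Matches.Count instead of relying on On Error Resume Next.

```vbscript
' Hypothetical helper: escape characters that are special in a RegExp pattern
Function EscapeRegex(text)
    Dim specials, result, i, ch
    specials = "\.^$*+?()[]{}|"
    result = ""
    For i = 1 To Len(text)
        ch = Mid(text, i, 1)
        If InStr(specials, ch) > 0 Then
            result = result & "\" & ch
        Else
            result = result & ch
        End If
    Next
    EscapeRegex = result
End Function

' Hypothetical helper: return the inner HTML of the first <div> whose id
' attribute equals elementId, or Empty if no such div is found.
' Assumes a quoted id and no nested <div> inside the target element.
Function GetInnerHtmlById(html, elementId)
    Dim re, matches
    Set re = New RegExp
    ' Other attributes may appear before or after id; [\s\S] spans line breaks
    re.Pattern = "<div\b[^>]*\bid\s*=\s*[""']" & EscapeRegex(elementId) & _
                 "[""'][^>]*>([\s\S]*?)</div>"
    re.IgnoreCase = True
    re.Global = False
    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        GetInnerHtmlById = matches(0).SubMatches(0)
    Else
        GetInnerHtmlById = Empty
    End If
End Function
```

In the ASP page this would be called as contents = GetInnerHtmlById(http.responseText, "description"). For markup with nested elements a real HTML parser is the safer choice; a regular expression only works on predictable pages like the one in the question.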

