Screen scraping using ColdFusion

Problem description

I am trying to screen-scrape another application using the code below in ColdFusion.

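<!--- Request the report page, passing credentials for HTTP authentication --->
<cfhttp url="https://intra.att.com/itscmetrics/EM2/LTMR.cfm" method="get" username="uvwxyz" password="abcdef">
    <!--- Sent as a query-string parameter (?LTMX=...) --->
    <cfhttpparam type="url" name="LTMX" value="Andre Fuetsch / Shelly K Lazzaro">
</cfhttp>

<!--- The raw HTML of the response --->
<cfset myDocument = cfhttp.fileContent>

<cfoutput>
    #myDocument#
</cfoutput>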

Now when I run my .cfm page with the code above, I am able to access the destination page, which shows the report as an HTML table.

Part of that page's source code is below.

<table border="1" width=99% style="border-collapse:collapse;">
    <thead>
    <td colspan="12" class="drpmainheader1_2">LTM Detail Report for Andre Fuetsch / Shelly K Lazzaro</td>
    <tr align="center">
      <th class="ptitles">Liaison Name</th>
      <th class="ptitles">Application Acronym</th>
      <th class="ptitles">MOTS ID</th>
      <th class="ptitles">Priority</th> 
      <th class="ptitles">MC</th>
      <th class="ptitles">DR Exercise</th>
      <th class="ptitles">ARM/SRM Maintenance</th>
      <th class="ptitles">ARM/SRM Creation</th>             
      <th class="ptitles">Backup & Recovery Certification</th>
      <th class="ptitles">Interface Certification</th>
      <th class="ptitles">AIA Compliance</th>   
    </tr>
    </thead>

    <tbody>
    <tr>
    <td class="drpdetailtablerowdetailleft">Lynette M Acosta</td>
    <td class="drpdetailtablerowdetailleft">AABA</td>
    <td class="drpdetailtablerowdetail"><a href="http://ebiz.sbc.com/mots/detail.cfm?appl_id=9710" target="_blank" style="color:blue;">9710</a></td>
    <td class="drpdetailtablerowdetail">5</td>
    <td class="drpdetailtablerowdetail">NMC</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    </tr>
    </tbody>

    <tbody>
    <tr>
    <td class="drpdetailtablerowdetailleft">Lynette M Acosta</td>
    <td class="drpdetailtablerowdetailleft">ABS RECON+</td>
    <td class="drpdetailtablerowdetail"><a href="http://ebiz.sbc.com/mots/detail.cfm?appl_id=13999" target="_blank" style="color:blue;">13999</a></td>
    <td class="drpdetailtablerowdetail">3</td>
    <td class="drpdetailtablerowdetail">NMC</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    <td class="drpdetailtablerowdetail">Compliant</td>
    </tr>
    </tbody>

I am not good with regular expressions in ColdFusion. Can anyone guide me or give me a starting point for extracting the data from this HTML table using ColdFusion? I do not have access to the database. I hope this is clear.

Solution

Parsing HTML with regex? You'll have more options if you use the jsoup HTML parser with ColdFusion. jsoup uses jQuery-like CSS selectors and can quickly convert the HTML table data into arrays.
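As a minimal sketch of that approach, assuming the jsoup jar is already on ColdFusion's classpath and a recent Adobe ColdFusion or Lucee with script-based functions, the hypothetical function `parseReportTable` below reads the column names from the `<th>` cells in `<thead>` and turns each `<tr>` under `<tbody>` into a struct keyed by those names. Because the report wraps every data row in its own `<tbody>`, the selector `tbody tr` collects the rows from all of them.

```cfml
<cfscript>
// Minimal sketch: convert the report table into an array of structs.
// Assumes the jsoup jar is on ColdFusion's classpath; parseReportTable is a hypothetical name.
function parseReportTable(required string html) {
    var jsoup = createObject("java", "org.jsoup.Jsoup");
    // jsoup parses malformed markup (e.g. the <td> placed directly in <thead>) without failing
    var doc = jsoup.parse(javaCast("string", arguments.html));
    var headerCells = doc.select("thead th");   // column titles
    var rowElements = doc.select("tbody tr");   // data rows from every <tbody>
    var headers = [];
    var rows = [];
    var cells = "";
    var record = {};
    var i = 0;
    var r = 0;
    var c = 0;

    // Collect the header text, e.g. "Liaison Name", "MOTS ID", ...
    for (i = 0; i < headerCells.size(); i++) {
        arrayAppend(headers, headerCells.get(javaCast("int", i)).text());
    }

    // One struct per row; CF arrays are 1-based while jsoup's Elements list is 0-based
    for (r = 0; r < rowElements.size(); r++) {
        cells = rowElements.get(javaCast("int", r)).select("td");
        record = {};
        for (c = 0; c < cells.size() && c < arrayLen(headers); c++) {
            // text() returns the visible text, so the MOTS ID cell yields just the link's number
            record[headers[c + 1]] = cells.get(javaCast("int", c)).text();
        }
        arrayAppend(rows, record);
    }
    return rows;
}
</cfscript>
```

After the `cfhttp` call from the question, `report = parseReportTable(cfhttp.fileContent)` would return one struct per report row; for the excerpt shown above, `report[1]["MOTS ID"]` would be `9710`. If the real page contains other tables, narrowing the selectors with the class names visible in its source (for example `th.ptitles`) keeps their cells out of the result.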

http://jsoup.org/
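jsoup is distributed as a single jar from the site above, and `createObject("java", "org.jsoup.Jsoup")` can only find it if that jar is on ColdFusion's classpath. A minimal sketch, assuming Adobe ColdFusion 10 or later and that the jar has been copied into a `lib` folder next to `Application.cfc` (the folder and the application name are hypothetical):

```cfml
// Application.cfc: minimal sketch that puts the jsoup jar on this application's classpath.
// Assumes Adobe ColdFusion 10+; the "lib" folder and application name are hypothetical.
component {
    this.name = "reportScraper";
    this.javaSettings = {
        // Directory next to this file that holds the jsoup jar downloaded from jsoup.org
        loadPaths = [ getDirectoryFromPath(getCurrentTemplatePath()) & "lib" ]
    };
}
```

On versions without `this.javaSettings`, copying the jar into the server's lib directory and restarting ColdFusion is the usual alternative.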

