Fetch href from webpage after selecting from combobox

Question

I'm trying to scrape data from "https://beacon.schneidercorp.com/" and need to achieve:


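  1. Set "Iowa" in the State combobox and "Adair County, IA" in the County/City/Area combobox

  2. Bring up the Property Search button

  3. Click the Property Search button to go to the next page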

After all this, the browser gets to "https://beacon.schneidercorp.com/Application.aspx?AppID=1034&LayerID=22042&PageTypeID=2&PageID=9328" which is my main goal.

I filled the comboboxes (tagname="option"), but the following problems came up:

a. The Property Search button that I need to click to reach the next page doesn't appear until I manually click and select an option in the County/City/Area combobox.

This is the routine that fills the comboboxes

Sub extraccionCondados2()
   Dim IE As New SHDocVw.InternetExplorer
   Dim htmlDoc As MSHTML.HTMLDocument
   Dim htmlElementos As MSHTML.IHTMLElementCollection
   Dim htmlElemento As MSHTML.IHTMLElement
   
   IE.Visible = True
   IE.navigate "https://beacon.schneidercorp.com/"
    
   Do While IE.readyState <> READYSTATE_COMPLETE
      DoEvents
   Loop
   
   Set htmlDoc = IE.document
   Set htmlElementos = htmlDoc.getElementsByClassName("form-control input-lg")
   htmlElementos(0).Value = "Iowa" 'POPULATES THE STATE COMBOBOX
   htmlElementos(1).Value = "1034" 'POPULATES THE COUNTY/CITY/AREA WITH THE RIGHT VALUE
   htmlElementos(1).Click 'IN THIS CASE THIS LINE DOESN'T DO ANYTHING
   'I'VE TRIED WORKING WITH htmlElementos CHILDREN BUT DIDN'T FIND A WAY TO DO IT
End Sub

b. The href I'm looking for doesn't appear until the Property Search is brought into view.

The element with id="quickstartList" is empty before the Property Search is shown.

After the Property Search is shown, id="quickstartList" gains new children, including my target URL.

How do I bring up the Property Search button, or better, fetch the href shown in the second image?

Recommended answer

Some advice on using MSXML2.ServerXMLHTTP objects to automate web scraping, using your target website as an example.

Firstly, you can get to the page you wanted in the question like this:

Sub Example1()

Dim con As New MSXML2.ServerXMLHTTP60 ' A web request object - must add project reference to "Microsoft XML, V6.0" in Tools > References

    ' Opens a new synchronous GET request (no hidden info) for the url
    con.Open "GET", "https://beacon.schneidercorp.com/Application.aspx?AppID=1034&PageTypeID=2", False
    con.send ' Send the request; a GET carries no body, so nothing is passed

    MsgBox con.responseText

End Sub

Note in the URL I've only had to include AppID=1034 for Adair county and PageTypeID=2 for property search (I think pagetypeId 1 was map). You can get the full list of AppID from the main page just by looking at the HTML (I guess you've figured out how to do this already). The MsgBox just shows that the con object has returned the response as an html document.
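Since only AppID and PageTypeID vary between counties and page types, the URL can be assembled by a small helper. This is a minimal sketch: `BuildBeaconUrl` is a hypothetical name, and it assumes the two query parameters described above are all the site needs.

```vb
' Hypothetical helper: builds the Application.aspx URL from an AppID and a PageTypeID.
Function BuildBeaconUrl(appId As Long, pageTypeId As Long) As String
    BuildBeaconUrl = "https://beacon.schneidercorp.com/Application.aspx" & _
                     "?AppID=" & CStr(appId) & "&PageTypeID=" & CStr(pageTypeId)
End Function
```

Calling `BuildBeaconUrl(1034, 2)` produces the same URL that Example1 requests for Adair County's property search.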

While working on your project and to help debug and look at html, if you want to view any response from a request at leisure, I use the below function to save a string as a text file:

' Saves the string s to a text file at path n
Sub WriteToFile(s As String, n As String)
    Dim fso As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    Dim oFile As Object
    Set oFile = fso.CreateTextFile(n)
    oFile.WriteLine s
    oFile.Close
    Set fso = Nothing
    Set oFile = Nothing
End Sub

So for the above code I'd call that function at the end to save my response as text files which I can view as HTML using notepad++. You can just view the html in the F12 dev tool too without saving it.

I've also included below an HTMLdocument object, which I put the response into.

Sub Example2()

Dim con As New MSXML2.ServerXMLHTTP60 ' A web request object - must add project reference to "Microsoft XML, V6.0" in Tools > References
Dim html As New HTMLDocument ' An html document to hold responses, used to parse info - add reference to "Microsoft HTML Object Library"

    ' Opens a new synchronous GET request (no hidden info) for the url
    con.Open "GET", "https://beacon.schneidercorp.com/Application.aspx?AppID=1034&PageTypeID=2", False
    con.send ' Send the request; a GET carries no body, so nothing is passed

    WriteToFile con.responseText, "C:\Users\JamHeadArt\Documents\responseText.txt"
    html.body.innerHTML = con.responseText ' responseText is the string form; responseBody is a byte array

End Sub

With the html document populated, you can then use things like getElementByID to help parse results etc. It's just another form of XML so you can traverse nodes and find things by child/parent relationships etc.
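As a rough illustration of that kind of parsing, the sketch below collects the href of every link inside a container element. `ExtractHrefs`, `DemoExtractHrefs`, the `demoList` id and the example.com links are all made up for the demo; whether the real page's static HTML contains the links you need is something to check in the saved response.

```vb
' Hypothetical helper: returns the href of every <a> inside the element with the given id.
' Requires a reference to "Microsoft HTML Object Library".
Function ExtractHrefs(html As MSHTML.HTMLDocument, containerId As String) As Collection
    Dim result As New Collection
    Dim container As Object   ' late bound so getElementsByTagName is available
    Dim link As Object

    Set container = html.getElementById(containerId)
    If Not container Is Nothing Then
        For Each link In container.getElementsByTagName("a")
            result.Add CStr(link.getAttribute("href"))
        Next link
    End If
    Set ExtractHrefs = result
End Function

Sub DemoExtractHrefs()
    Dim html As New MSHTML.HTMLDocument
    Dim href As Variant

    ' Made-up markup standing in for a real response
    html.body.innerHTML = "<ul id=""demoList"">" & _
        "<li><a href=""https://example.com/one"">One</a></li>" & _
        "<li><a href=""https://example.com/two"">Two</a></li></ul>"

    For Each href In ExtractHrefs(html, "demoList")
        Debug.Print href
    Next href
End Sub
```

One thing to watch for: when a document is filled in this way, relative links may come back with an "about:" prefix, so you may need to swap that prefix for the site's base URL.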

Using the F12 dev tool

I can figure out this stuff using the F12 developer tool, under network. Before clicking a search button or whatever, just clear the network traffic and then when you click a search, you'll see a bunch of requests. The first one is usually the one you want to check out and basically mimic (the rest of the requests will be javascript firing, css, images, general stuff). Any request has a URL and sometimes a BODY if it's a post request.
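If the request you want to mimic turns out to be a POST, the same request object can send a body. The sketch below is only an outline: `PostForm` is a hypothetical name, and the URL and form fields you pass in have to be copied from whatever the network tab shows, since none are given here.

```vb
' Hypothetical helper: sends a form-encoded POST and returns the response as text.
' Requires a reference to "Microsoft XML, V6.0".
Function PostForm(url As String, body As String) As String
    Dim con As New MSXML2.ServerXMLHTTP60

    con.Open "POST", url, False   ' False = synchronous
    con.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    con.send body                 ' body looks like "field1=value1&field2=value2"
    PostForm = con.responseText
End Function
```

For a GET request there is no body to copy, so only the URL and its query parameters matter.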

Without going in to TOO much detail, you can usually skip a whole bunch of search steps and pages, and get the info you need by knowing the structure and parameters of that final search, making literally one call to the website, with the return info parsed directly into Excel. No browsers used, much much faster.

After selecting Iowa, did you find the HTML for the drop-down list, which contains all the option values?

<optgroup label="Iowa">
    <option value="1034">Adair County,  IA</option>
    <option value="78">Allamakee County, IA</option>
    <option value="165">Ames, IA</option>
    <option value="96">Audubon County, IA</option>
    <option value="83">Benton County, IA</option>
    <option value="84">Boone County, IA</option>
    <option value="330">Bremer County, IA</option>
    <option value="1015">Buena Vista County,  IA</option>
    <option value="215">Cass County, IA</option>
    <option value="408">Cerro Gordo County, IA</option>
    <option value="501">Cherokee County, IA</option>
    <option value="47">Chickasaw County, IA</option>
    <option value="29">City of Ames, IA - Traffic Accident Database</option>
    <option value="933">City of Cascade, IA</option>
    <option value="516">City of Estherville, IA</option>
    <option value="1061">City of Sigourney, IA</option>
    <option value="1043">Clay County,  IA</option>
    <option value="227">Clayton County, IA</option>
    <option value="375">Clinton County, IA</option>
    <option value="909">Dallas County,  IA</option>
    <option value="49">Davis County, IA</option>
    <option value="72">Delaware County, IA</option>
    <option value="376">Dickinson County, IA</option>
    <option value="93">Dubuque County, IA</option>
    <option value="15">Emmet County, IA</option>
    <option value="79">Fayette County, IA</option>
    <option value="82">Floyd County, IA</option>
    <option value="150">Franklin County, IA</option>
    <option value="825">Fremont County,  IA</option>
    <option value="1064">Greene County,  IA</option>
    <option value="3">Grundy County, IA</option>
    <option value="395">Guthrie County, IA</option>
    <option value="140">Hardin County, IA</option>
    <option value="44">Harrison County, IA</option>
    <option value="60">Henry County, IA</option>
    <option value="617">Humboldt County, IA</option>
    <option value="80">Jackson County, IA</option>
    <option value="325">Jasper County, IA</option>
    <option value="1037">Jefferson County,  IA</option>
    <option value="86">Johnson County, IA</option>
    <option value="164">Jones County, IA</option>
    <option value="81">Keokuk County, IA</option>
    <option value="177">Lee County, IA</option>
    <option value="54">Louisa County, IA</option>
    <option value="594">Lyon County, IA</option>
    <option value="406">Madison County, IA</option>
    <option value="25">Mahaska County, IA</option>
    <option value="70">Marion County, IA</option>
    <option value="1026">Marshall County,  IA</option>
    <option value="410">Mason City, IA</option>
    <option value="153">Mills County, IA</option>
    <option value="929">Mitchell County,  IA</option>
    <option value="21">Montgomery County, IA</option>
    <option value="12">Muscatine Area Geographic Information Consortium (MAGIC)</option>
    <option value="331">O'Brien County, IA</option>
    <option value="611">Osceola County, IA</option>
    <option value="220">Page County, IA</option>
    <option value="218">Palo Alto County, IA</option>
    <option value="1012">Plymouth County,  IA</option>
    <option value="144">Pocahontas County, IA</option>
    <option value="135">Poweshiek County, IA</option>
    <option value="508">Ringgold County, IA</option>
    <option value="75">Sac County, IA</option>
    <option value="1024">Scott County / City of Davenport, Iowa</option>
    <option value="11">Shelby County, IA</option>
    <option value="10">Sioux City, IA</option>
    <option value="984">Sioux County,  IA</option>
    <option value="165">Story County, IA / City of Ames</option>
    <option value="225">Union County, IA</option>
    <option value="595">Wapello County, IA</option>
    <option value="9">Warren County, IA</option>
    <option value="1036">Washington County,  IA</option>
    <option value="723">Webster County, IA</option>
    <option value="73">Winnebago County, IA</option>
    <option value="110">Winneshiek County, IA</option>
    <option value="10">Woodbury County, IA / Sioux City</option>
    <option value="588">Worth County, IA</option>
    <option value="399">Wright County, IA</option>
</optgroup>
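A list like this can be turned into a lookup from the displayed name to its value (the AppID). The following is a sketch under assumptions: `ParseOptions` and `DemoParseOptions` are hypothetical names, the markup is wrapped in a `<select>` so the parser keeps the option elements, and only two of the options above are used for the demo.

```vb
' Hypothetical helper: maps each option's visible text to its value attribute.
' Requires a reference to "Microsoft HTML Object Library".
Function ParseOptions(selectHtml As String) As Object
    Dim html As New MSHTML.HTMLDocument
    Dim dict As Object
    Dim opt As Object

    Set dict = CreateObject("Scripting.Dictionary")
    html.body.innerHTML = selectHtml
    For Each opt In html.getElementsByTagName("option")
        dict(Trim$(opt.innerText)) = CStr(opt.Value)
    Next opt
    Set ParseOptions = dict
End Function

Sub DemoParseOptions()
    Dim opts As Object
    Dim label As Variant

    Set opts = ParseOptions("<select><optgroup label=""Iowa"">" & _
        "<option value=""1034"">Adair County, IA</option>" & _
        "<option value=""78"">Allamakee County, IA</option>" & _
        "</optgroup></select>")

    For Each label In opts.Keys
        Debug.Print label, opts(label)
    Next label
End Sub
```

Each value can then be dropped into the AppID part of the Application.aspx URL to request that county's page.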
