Web Scraping - VBA Search Parameters Not Working Properly

Question

I am working on a web scraping project that scrapes ticketing information from a travel website.

The search parameters that my VBA code defines and then enters into the website's search form are not working. The code is provided below. For background, I read the from/to destinations from my Excel workbook (e.g. Beijing(北京)) and build the travel date in the format the website expects (MM-DD-YYYY). When the code runs, however, the site does not seem to recognize the parameters and redirects me to a page saying "site under maintenance". Oddly, when I enter the same parameters manually, the site recognizes them and returns ticketing information.

Am I missing something? Do I have to set other values besides "DepartureCity", "ArrivalCity", and "DepartDate"?

I also noticed that when I loop through multiple cities, the site repeats the previous search (e.g. a search for Shanghai -> Beijing returns Tianjin -> Beijing, which I had searched for earlier). Is there a way to clear the search history/cache automatically via VBA?

' save from and to destinations under a defined string
sFrom = Range("C3").Value
sTo = Range("C4").Value

' "i" to track the # of days out as defined by the user
For i = 0 To cntDays
    dtRange = Date + i

    ' build the travel date string as MM-DD-YYYY (zero-padded month and day)
    sDate = Format(dtRange, "mm-dd-yyyy")

    ' instantiate the oIE object
    Set oIE = CreateObject("InternetExplorer.Application")

    ' open Ctrip travel portal
    sURL = "http://english.ctrip.com/trains/#ctm_ref=nb_tn_top"
    With oIE
        .navigate sURL
        .Visible = True

        Do Until (.readyState = 4 And Not .Busy)
           DoEvents
        Loop

        ' search for particular entry
        .document.getElementsByName("DepartureCity")(0).Value = sFrom
        .document.getElementsByName("ArrivalCity")(0).Value = sTo
        .document.getElementsByName("DepartDate")(0).Value = sDate

        MsgBox sFrom
        MsgBox sTo
        MsgBox sDate

        ' find the Search button and click it
        Set ElementCol = .document.getElementsByTagName("button")
        For Each btnInput In ElementCol
            If btnInput.innerText = "Search" Then
                btnInput.Click
                Exit For
            End If
        Next btnInput

        ' ensure page has been fully loaded
        Do Until (.readyState = 4 And Not .Busy)
            DoEvents
        Loop

        ' ... (remaining code not shown)
    End With
Next i

Answer

Looking at this a little closer, the site uses a GET request to perform the search.
As such, there is no need to load the page, populate the fields, and click the button.
You can set the values directly in the URL and bypass the initial page.
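Loading a results URL directly could look roughly like the sketch below. It reuses the `InternetExplorer.Application` object and the ready-state wait loop from the question; the function name `LoadPage` is hypothetical, and the sketch assumes the caller passes a complete, already encoded search URL.

```vba
' Hypothetical helper: opens sURL in Internet Explorer, waits until the page
' has finished loading, and returns the browser object to the caller.
Function LoadPage(ByVal sURL As String) As Object
    Dim oIE As Object

    Set oIE = CreateObject("InternetExplorer.Application")
    With oIE
        .Visible = True
        .navigate sURL

        ' same wait loop as in the question: 4 = READYSTATE_COMPLETE
        Do Until .readyState = 4 And Not .Busy
            DoEvents
        Loop
    End With

    Set LoadPage = oIE
End Function
```

A caller would typically keep the returned object, read from `.document`, and call `.Quit` on it when finished so that browser windows do not accumulate across loop iterations.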

For instance, to search for trains going from Shanghai to Beijing on 12-9-2015, load the following URL:

http://english.ctrip.com/trains/List/Index?DepartureCity=shanghai%28%E4%B8%8A%E6%B5%B7%29&ArrivalCity=beijing%28%E5%8C%97%E4%BA%AC%29&DepartDate=12-9-2015&DepartureStation=%E4%B8%8A%E6%B5%B7&ArrivalStation=%E5%8C%97%E4%BA%AC

Broken down, it looks like this:

http://english.ctrip.com/trains/List/Index?
DepartureCity=shanghai%28%E4%B8%8A%E6%B5%B7%29
ArrivalCity=beijing%28%E5%8C%97%E4%BA%AC%29
DepartDate=12-9-2015
DepartureStation=%E4%B8%8A%E6%B5%B7
ArrivalStation=%E5%8C%97%E4%BA%AC
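As a rough illustration, the five parameters above can be joined into the full URL with plain string concatenation. The function name `BuildSearchUrl` and its argument names are hypothetical, and the sketch assumes every value passed in is already percent-encoded.

```vba
' Hypothetical helper: joins the five query parameters from the answer into
' one search URL. All values are assumed to be percent-encoded already.
Function BuildSearchUrl(ByVal sDepartureCity As String, _
                        ByVal sArrivalCity As String, _
                        ByVal sDepartDate As String, _
                        ByVal sDepartureStation As String, _
                        ByVal sArrivalStation As String) As String
    Const BASE_URL As String = "http://english.ctrip.com/trains/List/Index"

    BuildSearchUrl = BASE_URL & _
        "?DepartureCity=" & sDepartureCity & _
        "&ArrivalCity=" & sArrivalCity & _
        "&DepartDate=" & sDepartDate & _
        "&DepartureStation=" & sDepartureStation & _
        "&ArrivalStation=" & sArrivalStation
End Function

Sub DemoBuildSearchUrl()
    ' Prints the Shanghai -> Beijing example URL to the Immediate window
    Debug.Print BuildSearchUrl("shanghai%28%E4%B8%8A%E6%B5%B7%29", _
                               "beijing%28%E5%8C%97%E4%BA%AC%29", _
                               "12-9-2015", _
                               "%E4%B8%8A%E6%B5%B7", _
                               "%E5%8C%97%E4%BA%AC")
End Sub
```

Keeping the URL assembly in one function means the loop over cities and dates only has to supply new values on each pass.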

From my own testing, each of the fields above is required; omit any of them and you get the "maintenance" screen.

This means you need to know the station codes as well.

In addition, you must supply the special characters in the names (the parentheses and the Chinese characters), percent-encoded:

shanghai%28%E4%B8%8A%E6%B5%B7%29
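One way to produce such values from the workbook text is a small UTF-8 percent-encoder. The sketch below is an assumption, not part of the original answer: the function name `UrlEncodeUtf8` is hypothetical, and it handles only characters in the Basic Multilingual Plane (it does not combine surrogate pairs), which covers the city names shown here.

```vba
' Hypothetical helper: percent-encodes a string as UTF-8.
' Letters, digits, "-", ".", "_" and "~" are kept; everything else is encoded.
' Limitation: surrogate pairs (characters above U+FFFF) are not handled.
Function UrlEncodeUtf8(ByVal s As String) As String
    Dim i As Long, cp As Long, sOut As String

    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it to an unsigned code point
        cp = AscW(Mid$(s, i, 1)) And &HFFFF&
        Select Case cp
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                sOut = sOut & ChrW$(cp)
            Case Is < &H80      ' one byte, zero-padded to two hex digits
                sOut = sOut & "%" & Right$("0" & Hex$(cp), 2)
            Case Is < &H800     ' two-byte UTF-8 sequence
                sOut = sOut & "%" & Hex$(&HC0 Or (cp \ &H40)) & _
                              "%" & Hex$(&H80 Or (cp And &H3F))
            Case Else           ' three-byte UTF-8 sequence
                sOut = sOut & "%" & Hex$(&HE0 Or (cp \ &H1000)) & _
                              "%" & Hex$(&H80 Or ((cp \ &H40) And &H3F)) & _
                              "%" & Hex$(&H80 Or (cp And &H3F))
        End Select
    Next i

    UrlEncodeUtf8 = sOut
End Function

Sub DemoUrlEncodeUtf8()
    ' ChrW$ builds the Chinese characters so the code does not depend on
    ' the VBA editor's code page
    Dim sCity As String
    sCity = "shanghai(" & ChrW$(&H4E0A) & ChrW$(&H6D77) & ")"
    Debug.Print UrlEncodeUtf8(sCity)   ' shanghai%28%E4%B8%8A%E6%B5%B7%29
End Sub
```

Values read from worksheet cells are already Unicode strings in VBA, so a cell containing the city name can be passed to the function directly.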
