从HTML TD和Tr提取值 [英] Extract values from HTML TD and Tr

查看:221
本文介绍了从HTML TD和Tr提取值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一些HTML源代码,我从一个网站获得了选项引号。 (请见下文)

I have some HTML source that i get from a website for option quotes. (please see below)

根据执行价格,在tr中提取各种文本值并存储在集合中的最佳方式是什么(在这种情况下可用于4700中间td 4700.00

What is the best way to extract the various text values in tr and store in a collection based on the strike price (4700 in this case available in the mid td 4700.00)

有些人推荐使用正则表达式,而其他建议使用html解析器。我在VBA这样做,最好的方法是什么?

Some people recommend regex while other suggest to use a html parser. I'm doing this in VBA so whats the best way?

<!--<td><a href="javascript:popup1('','','1')">Quote</a></td>

<td><a href="javascript:popup1('','','','','CE')"><img src="/images/print3.gif"></a>



</td>-->





                    <td><a href="javascript:chartPopup('NIFTY', 'OPTIDX', '25JAN2012', '4700.00','CE','S&P CNX NIFTY');"><img src="/live_market/resources/images/grficon.gif" /></a></td>

                        <td class="ylwbg"> 2,935,500</td>

                        <td class="ylwbg"> 27,550</td>

                        <td class="ylwbg"> 12,458</td>


                        <td class="ylwbg"> 23.79</td>

                        <!-- End-->

                        <td class="ylwbg">





                            <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=CE&expiry=25JAN2012" target="_blank"> 139.25</a>



                        </td>

                        <!--*Net Change*-->



                        <td class="ylwbg" Style="color:Red;"> -7.35</td>



                        <td class="ylwbg"> 200</td>

                        <td class="ylwbg"> 139.15</td>

                        <td class="ylwbg"> 142.45</td>

                        <td class="ylwbg"> 200</td>

                        <td class="grybg"><a href="/live_market/dynaContent/live_watch/option_chain/optionDates.jsp?symbol=NIFTY&instrument=OPTIDX&strike=4700.00"><b>4700.00</b></a></td>

                        <td class="nobg"> 1,300</td>

                        <td class="nobg"> 76.00</td>

                        <td class="nobg"> 79.00</td>

                        <td class="nobg"> 1,350</td>



                        <!--*Net Change*-->



                            <td class="nobg" Style="color:Red;"> -1.55</td>





                        <td class="nobg">



                            <!-- <a href="javascript:popup1('NIFTY','OPTIDX','25JAN2012','4700.00','PE')"> 76.00</a> -->



                            <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=PE&expiry=25JAN2012" target="_blank"> 76.00</a>







                        </td>



                        <td class="nobg"> 26.33</td>



                        <td class="nobg"> 32,772</td>

                        <td class="nobg"> 103,700</td>



                        <td class="nobg"> 5,123,300</td>



                        <td><a href="javascript:chartPopup('NIFTY', 'OPTIDX', '25JAN2012', '4700.00','PE','S&P CNX NIFTY');"><img src="/live_market/resources/images/grficon.gif" /></a></td>



<!--<td><a href="javascript:popup1('','','1')">Quote</a></td>

<td><a href="javascript:popup1('','','','','PE')"><img src="/images/print3.gif"></a></td>-->



                    </tr>


推荐答案

在一些虚幻之后,我得出了一个正则表达式/ VBA解决方案使用

After some fiddling I have derived a regex/VBA solution using


  1. XMLHTTP访问该网站(更改 strSite 以适应)

  2. 一个正则表达式获取所需的数字

  3. 一个具有20个记录的变量数组,然后将数字转储到活动表单

  1. XMLHTTP to access the site (change strSite to suit)
  2. a Regexp to get the required numbers
  3. a variant array with 20 records to hold, then dump the numbers to the active sheet


查看源HTML以查找正则表达式模式

调用选项有一个共同的起始和终止字符串,用于界定10个值,但有三个不同的字符串

The Call options have a common starting and finishing string that delimit the 10 values, but there are three different strings


  1. 每个记录匹配的字符串1-4,7-10匹配< td class =ylwbg> X < / td>

  2. String 6在 <$ c $之前的> 之前有一个样式(和其他文本) c> X

  3. 字符串5包含更长的< a href文本 X < / a> li>
  1. Strings 1-4,7-10 for each record match <td class="ylwbg">X</td>
  2. String 6 has a Style (and other text) preceding the > before the X
  3. String 5 contains a much longer <a href textX</a>

正则表达式
.Pattern =(< tdclass =ylwbg)样式+?){0,1}>(。+?)(<\ / td>)
提取所有需要的字符串,但需要进一步的工作字符串5

A regex of .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)" extracts all the needed strings, but further work is needed later on string 5

选项以< td class =nobg所以这些都不是由正则表达式提取的,而这个正则表达式可以得到点1-3。

The Put options start with <td class="nobg" so these are happily not extracted by a regex that gets points 1-3


实际代码

    Sub GetTxt()
    Dim objXmlHTTP As Object
    Dim objRegex As Object
    Dim objRegMC As Object
    Dim objRegM As Object
    Dim strResponse As String
    Dim strSite As String
    Dim lngCnt As Long
    Dim strTemp As String
    Dim X(1 To 20, 1 To 10)
    X(1, 1) = "OI"
    X(1, 2) = "Chng in vol"
    X(1, 3) = "Volume"
    X(1, 4) = "IV"
    X(1, 5) = "LTP"
    X(1, 6) = "Net Chg"
    X(1, 7) = "Bid Qty"
    X(1, 8) = "Bid Price"
    X(1, 9) = "Ask Price"
    X(1, 10) = "Ask Qnty"

    Set objXmlHTTP = CreateObject("MSXML2.XMLHTTP")
    strSite = "http://nseindia.com/live_market/dynaContent/live_watch/option_chain/optionDates.jsp?symbol=NIFTY&instrument=OPTIDX&strike=4700.00"

    On Error GoTo ErrHandler
    With objXmlHTTP
        .Open "GET", strSite, False
        .Send
        If .Status = 200 Then strResponse = .ResponseText
    End With
    On Error GoTo 0

    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        '*cleaning regex* to remove all spaces
        .Pattern = "[\xA0\s]+"
        .Global = True
        strResponse = .Replace(strResponse, vbNullString)
        .Pattern = "(<tdclass=""ylwbg"")(Style.+?){0,1}>(.+?)(<\/td>)"
        If .Test(strResponse) Then
            lngCnt = 20
            Set objRegMC = .Execute(strResponse)
            For Each objRegM In objRegMC
                lngCnt = lngCnt + 1
                If Right$(objRegM.submatches(2), 2) <> "a>" Then
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = objRegM.submatches(2)
                Else
                'Get submatches of the form <a href="/live_market/dynaContent/live_watch/get_quote/GetQuoteFO.jsp?underlying=NIFTY&instrument=OPTIDX&strike=4700.00&type=CE&expiry=23FEB2012" target="_blank"> 206.40</a>
                    strTemp = Val(Right(objRegM.submatches(2), Len(objRegM.submatches(2)) - InStrRev(objRegM.submatches(2), """") - 1))
                    X(Int((lngCnt - 1) / 10), IIf(lngCnt Mod 10 > 0, lngCnt Mod 10, 10)) = strTemp
                End If
            Next
        Else
            MsgBox "Parsing unsuccessful", vbCritical
        End If
    End With
    Set objRegex = Nothing
    Set objXmlHTTP = Nothing
    [a1].Resize(UBound(X, 1), UBound(X, 2)) = X
    Exit Sub
ErrHandler:
    MsgBox "Site not accessible"
    If Not objXmlHTTP Is Nothing Then Set objXmlHTTP = Nothing
End Sub

这篇关于从HTML TD和Tr提取值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆