How to get at this data

Views: 125 | Posted: 2018/6/29 14:43:36 | Tags: html excel-vba web-scraping

Problem description

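I am looking to scrape the three items that are highlighted and bordered from the html sample below. I've also highlighted a few markers that look useful.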

How would you do it?

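OK, this was not a great question, and I am honestly surprised it did not get more downvotes. Oh well, here are some breadcrumbs for someone else.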

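Three of the four pieces of information I want are the inner text of span elements with a known id (e.g. yfs_l10_gm150220c00036500 holds $0.83), so the helper below looks like a good, direct shot: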

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    ' GetSpanTextForId
    '
    ' Returns the inner text from the span element with the passed-in id.
    '
    ' param doc:    the source HTMLDocument
    ' param spanId: the id of the span element to read
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    Function GetSpanTextForId(ByRef doc As HTMLDocument, ByVal spanId As String) As String
        ' Error handling
        On Error GoTo ErrHandler
        Dim sRoutine As String
        sRoutine = cModule & ".GetSpanTextForId"

        CheckArgNotNothing doc, "doc"
        CheckArgNotBadString spanId, "spanId"

        ' Procedure
        Dim oSpan As HTMLSpanElement
        Set oSpan = doc.getElementById(spanId)
        Check Not oSpan Is Nothing, "could not find span with id: " & Bracket(spanId)

        GetSpanTextForId = oSpan.innerText

        Exit Function

    ErrHandler:
        Select Case DspErrMsg(sRoutine)
            Case Is = vbAbort: Stop: Resume    ' Debug mode - trace
            Case Is = vbRetry: Resume          ' Try again
            Case Is = vbIgnore:                ' End routine
        End Select
    End Function
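For readers working in a browser rather than in VBA, the same lookup can be sketched in JavaScript. This is only an illustration: `getSpanTextForId` and `stubDoc` are hypothetical names that do not appear in the original post, and the stub stands in for a real `document` so the snippet can run outside a browser.

```javascript
// Hypothetical JavaScript counterpart of the VBA helper above.
// doc can be any object with a getElementById method, such as the browser's document.
function getSpanTextForId(doc, spanId) {
  if (!doc) {
    throw new Error("doc is required");
  }
  if (typeof spanId !== "string" || spanId.length === 0) {
    throw new Error("spanId must be a non-empty string");
  }
  var span = doc.getElementById(spanId);
  if (!span) {
    throw new Error("could not find span with id: [" + spanId + "]");
  }
  return span.textContent;
}

// Stub document (an assumption for this demo) that knows a single span.
var stubDoc = {
  getElementById: function (id) {
    return id === "yfs_l10_gm150220c00036500" ? { textContent: "0.83" } : null;
  }
};

console.log(getSpanTextForId(stubDoc, "yfs_l10_gm150220c00036500"));
```

In a real page the call would be `getSpanTextForId(document, someId)`.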

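Open Interest is part of a table that is the second child of an element with an id. The methods below return the cell immediately after the cell containing the text I am looking for (i.e. "Open Interest:").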

    '''''''''''''''''''''''''''''''''''''''''
    ' GetOpenInterest
    '
    ' Latest open interest.
    '
    ' param doc: the source HTMLDocument
    '''''''''''''''''''''''''''''''''''''''''
    Function GetOpenInterest(ByRef doc As HTMLDocument) As Integer
        Dim tbl As IHTMLTable
        Set tbl = GetSummaryDataTable(doc, 1)

        Dim k As Integer
        k = mWebScrapeHelpers.GetCellNumberForTextStartingWith(tbl, "Open Interest:")
        GetOpenInterest = CInt(mWebScrapeHelpers.GetCellTextFromCellNumber(tbl, k + 1))
    End Function

    Function GetCellNumberForTextStartingWith(ByRef tbl As IHTMLTable, ByRef s As String) As Integer
        ' Error handling
        On Error GoTo ErrHandler
        Dim sRoutine As String
        sRoutine = cModule & ".GetCellNumberForTextStartingWith"

        CheckArgNotNothing tbl, "tbl"

        ' Procedure
        Dim tblCell As HTMLTableCell
        Dim k As Integer    ' zero-based index of the current cell
        For Each tblCell In tbl.Cells
            ' Note: "*" & s matches any text that ends with s;
            ' use s & "*" for a strict starts-with test.
            If tblCell.innerText Like ("*" & s) Then
                GetCellNumberForTextStartingWith = k
                Exit Function
            End If
            k = k + 1
        Next

        ' If we get here we did not find it
        GetCellNumberForTextStartingWith = -1

        Exit Function

    ErrHandler:
        Select Case DspErrMsg(sRoutine)
            Case Is = vbAbort: Stop: Resume    ' Debug mode - trace
            Case Is = vbRetry: Resume          ' Try again
            Case Is = vbIgnore:                ' End routine
        End Select
    End Function

    Function GetCellTextFromCellNumber(ByRef tbl As IHTMLTable, ByRef nbr As Integer) As String
        ' Error handling
        On Error GoTo ErrHandler
        Dim sRoutine As String
        sRoutine = cModule & ".GetCellTextFromCellNumber"

        CheckArgNotNothing tbl, "tbl"
        Check tbl.Cells.Length > 0, "table is empty"
        ' Cells are zero-based, so nbr must be smaller than the cell count
        Check tbl.Cells.Length > nbr, "table only has " & tbl.Cells.Length & " cells; can't get cell number " & nbr

        ' Procedure
        GetCellTextFromCellNumber = tbl.Cells(nbr).innerText

        Exit Function

    ErrHandler:
        Select Case DspErrMsg(sRoutine)
            Case Is = vbAbort: Stop: Resume    ' Debug mode - trace
            Case Is = vbRetry: Resume          ' Try again
            Case Is = vbIgnore:                ' End routine
        End Select
    End Function
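The "find the label cell, then read the next cell" idea does not depend on VBA. Below is a minimal JavaScript sketch of it, assuming the cell texts have already been collected into a plain array in document order. The names `cellTextAfterLabel` and `parseCount` are hypothetical and not part of the original post.

```javascript
// Hypothetical helpers illustrating the label-then-next-cell lookup.

// Returns the trimmed text of the cell that follows the first cell whose
// text starts with label, or null when no such pair of cells exists.
function cellTextAfterLabel(cells, label) {
  for (var i = 0; i < cells.length - 1; i++) {
    if (cells[i].trim().indexOf(label) === 0) {
      return cells[i + 1].trim();
    }
  }
  return null;
}

// Converts a count such as "11,579" to a number; returns NaN for anything
// that is not made up solely of digits and thousands separators.
function parseCount(text) {
  var digits = String(text).replace(/,/g, "");
  if (!/^\d+$/.test(digits)) {
    return NaN;
  }
  return parseInt(digits, 10);
}

// Cell texts as they would appear in the summary tables of the sample page.
var cells = ["Bid:", "0.76", "Ask:", "0.90", "Open Interest:", "11,579"];
var openInterest = parseCount(cellTextAfterLabel(cells, "Open Interest:"));
console.log(openInterest);
```

Returning null for a missing label, rather than an index of -1, keeps a failed lookup from silently reading the first cell of the table.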

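These methods work fine, and plenty of other approaches would work too, including the regex parsing approach suggested as an answer. The excellent link from RedShift was more about analyzing the html and coming up with a strategy.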

Cheers

Solution

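I would probably use an XML parser to get the text content first (or this: xmlString.replace(/<[^>]+>/g, "") to replace all tags with empty strings), and then use the following regular expressions to extract the needed information:

    /-OPR\s+(\d+\.\d+)/
    /Bid:\s+(\d+\.\d+)/
    /Ask:\s+(\d+\.\d+)/
    /Open Interest:\s+(\d+,\d+)/

This process can be done with nodejs (…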
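As a rough illustration of how the two steps fit together outside a browser, here is a small sketch that runs under Node.js. It uses the tag-stripping replace and the four regular expressions from the answer. The function names `stripTags` and `extractQuote`, the result keys, and the trimmed-down `sampleHtml` are assumptions made for this example.

```javascript
// Sketch only: stripTags, extractQuote and sampleHtml are hypothetical.

// Step 1: replace every tag with an empty string, leaving the text content.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, "");
}

// Step 2: apply one regular expression per field to the stripped text.
function extractQuote(html) {
  var text = stripTags(html);
  var patterns = {
    last: /-OPR\s+(\d+\.\d+)/,
    bid: /Bid:\s+(\d+\.\d+)/,
    ask: /Ask:\s+(\d+\.\d+)/,
    openInterest: /Open Interest:\s+(\d+,\d+)/
  };
  var result = {};
  Object.keys(patterns).forEach(function (key) {
    var match = text.match(patterns[key]);
    // null marks a field whose pattern was not found
    result[key] = match ? match[1] : null;
  });
  return result;
}

// Reduced version of the sample markup; whitespace between the tags matters,
// because each pattern expects at least one whitespace character after the label.
var sampleHtml = [
  '<h2>GM Feb 2015 36.500 call (GM150220C00036500)</h2>',
  '<span class="rtq_exch"><span class="rtq_dash">-</span>OPR</span>',
  '<span class="time_rtq_ticker"><span>0.83</span></span>',
  '<table>',
  '<tr><th>Bid:</th> <td><span>0.76</span></td></tr>',
  '<tr><th>Ask:</th> <td><span>0.90</span></td></tr>',
  '<tr><th>Open Interest:</th> <td>11,579</td></tr>',
  '</table>'
].join("\n");

var quote = extractQuote(sampleHtml);
console.log(quote);
```

Two limits of this sketch: the open interest pattern requires a thousands separator, so a value below 1,000 would come back as null, and a label that is immediately followed by its value with no whitespace in the stripped text would also be missed.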

Live demo:

It waits 1 second, then removes the tags, then looks for all the patterns and creates a table.
    var wait = true; // Set to false to execute instantly.

    var elem = document.getElementById("parsingStuff");
    var str = elem.textContent;
    var keywords = ["-OPR", "Bid:", "Ask:", "Open Interest:"];
    var output = {};
    var timeout = 0;

    if (wait) timeout = 1000;
    setTimeout(function () {
      // Remove tags
      elem.innerHTML = elem.textContent;
    }, timeout);

    if (wait) timeout = 2000;
    setTimeout(function () {
      // Look for the patterns
      for (var i = 0; i < keywords.length; i++) {
        output[keywords[i]] = str.match(RegExp(keywords[i] + "\\s+(\\d+[\\.,]\\d+)"))[1];
      }

      // Create a basic table of the found data
      elem.innerHTML = "";
      var table = document.createElement("table");
      for (var k in output) {
        var tr = document.createElement("tr");
        var th = document.createElement("th");
        var td = document.createElement("td");
        th.style.border = "1px solid grey";
        td.style.border = "1px solid grey";
        th.textContent = k;
        td.textContent = output[k];
        tr.appendChild(th);
        tr.appendChild(td);
        table.appendChild(tr);
      }
      elem.appendChild(table);
    }, timeout);

    <div id="parsingStuff">
      <div class="yfi_rt_quote_summary" id="yfi_rt_quote_summary">
        <div class="hd">
          <div class="title">
            <h2>GM Feb 2015 36.500 call (GM150220C00036500)</h2>
            <span class="rtq_exch"><span class="rtq_dash">-</span>OPR</span>
            <span class="wl_sign"></span>
          </div>
        </div>
        <div class="yfi_rt_quote_summary_rt_top sigfig_promo_1">
          <div>
            <span class="time_rtq_ticker">
              <span id="yfs_110_gm150220c00036500">0.83</span>
            </span>
          </div>
        </div>undefined
      </div>undefined
      <div class="yui-u first yfi-start-content">
        <div class="yfi_quote_summary">
          <div id="yfi_quote_summary_data" class="rtq_table">
            <table id="table1">
              <tr>
                <th scope="row" width="48%">Bid:</th>
                <td class="yfnc_tabledata1"><span id="yfs_b00_gm150220c00036500">0.76</span></td>
              </tr>
              <tr>
                <th scope="row" width="48%">Ask:</th>
                <td class="yfnc_tabledata1"><span id="yfs_a00_gm150220c00036500">0.90</span></td>
              </tr>
            </table>
            <table id="table2">
              <tr>
                <th scope="row" width="48%">Open Interest:</th>
                <td class="yfnc_tabledata1">11,579</td>
              </tr>
            </table>
          </div>
        </div>
      </div>
    </div>
