Web Scraping HTML Table Using Python

Views: 178 Published: 2016/8/5 19:09:17 Tags: python for-loop web-scraping beautifulsoup html-table


Problem description

I think I'm really close, so any help would be appreciated. I'm trying to scrape the Index and Value data from the table titled "Stock Market Activity" on the NASDAQ homepage:

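from urllib.request import urlopen

from bs4 import BeautifulSoup


def get_index_prices(NASDAQ_URL):
    # Download the raw page source (no JavaScript is executed here)
    html = urlopen(NASDAQ_URL).read()
    soup = BeautifulSoup(html, "lxml")
    # Take the first table with class "genTable thin" and loop over its body rows
    for row in soup('table', {'class': 'genTable thin'})[0].tbody('tr'):
        tds = row('td')
        print("Index: %s, Value: %s" % (tds[0].text, tds[1].text))


get_index_prices('http://www.nasdaq.com/')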

The error reads:

list index out of range

Solution

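This table is rendered by JavaScript. In the page source, before any JavaScript runs, the `genTable thin` class sits on a `div` rather than a `table`, and no `tr`/`td` rows exist yet, so `soup('table', {'class': 'genTable thin'})` returns an empty list and `[0]` fails. The source looks like this: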

<div id="HomeIndexTable" class="genTable thin">
    <table id="indexTable" class="floatL marginB5px">
        <thead>
        <tr>
            <th>Index</th>
            <th>Value</th>
            <th>Change Net / %</th>
        </tr>
        </thead>
        <script type="text/javascript">
            //<![CDATA[

                nasdaqHomeIndexChart.storeIndexInfo("NASDAQ","5053.75","-20.52","0.40","1,938,573,902","5085.22","5053.75");
                nasdaqHomeIndexChart.storeIndexInfo("DJIA","17663.54","-92.26","0.52","","17799.96","17662.87");
                nasdaqHomeIndexChart.storeIndexInfo("S&P 500","2079.36","-10.05","0.48","","2094.32","2079.34");
                nasdaqHomeIndexChart.storeIndexInfo("NASDAQ-100","4648.83","-21.93","0.47","","4681.23","4648.83");
                nasdaqHomeIndexChart.storeIndexInfo("NASDAQ-100 PMI","4675.49","4.73","0.10","","4681.98","4675.49");
                nasdaqHomeIndexChart.storeIndexInfo("NASDAQ-100 AHI","4647.33","-1.50","0.03","","4670.76","4647.26");
                nasdaqHomeIndexChart.storeIndexInfo("Russell 1000","1153.55","-4.85","0.42","","1161.51","1153.54");
                nasdaqHomeIndexChart.storeIndexInfo("Russell 2000","1161.86","-3.76","0.32","","1167.65","1159.66");
                nasdaqHomeIndexChart.storeIndexInfo("FTSE All-World ex-US*","271.15","-0.23","0.08","","272.33","271.13");
                nasdaqHomeIndexChart.storeIndexInfo("FTSE RAFI 1000*","9045.08","-34.52","0.38","","9109.74","9044.91");
            //]]>
            nasdaqHomeIndexChart.displayIndexes();
        </script>
    </table>
</div>
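
Note that the values are present in the source above as string arguments to the `storeIndexInfo` calls, so one option is to read them straight from the script text without running any JavaScript. The following is only a sketch and not part of the original answer: it assumes the first argument is the index name and the second is its value (inferred from the sample matching the Index and Value headers), and the names `parse_index_info`, `CALL_RE`, and `ARG_RE` are made up for illustration.

```python
import re

# Matches one storeIndexInfo(...) call and captures its raw argument list.
CALL_RE = re.compile(r'storeIndexInfo\((.*?)\);')
# Matches one double-quoted argument (the sample values contain no escaped quotes).
ARG_RE = re.compile(r'"([^"]*)"')


def parse_index_info(page_source):
    """Return (index, value) pairs found in storeIndexInfo calls."""
    pairs = []
    for call in CALL_RE.finditer(page_source):
        args = ARG_RE.findall(call.group(1))
        # Assumption: args[0] is the index name and args[1] is its value.
        if len(args) >= 2:
            pairs.append((args[0], args[1]))
    return pairs


# Two lines copied from the page source shown above.
sample = '''
nasdaqHomeIndexChart.storeIndexInfo("NASDAQ","5053.75","-20.52","0.40","1,938,573,902","5085.22","5053.75");
nasdaqHomeIndexChart.storeIndexInfo("DJIA","17663.54","-92.26","0.52","","17799.96","17662.87");
'''

for index, value in parse_index_info(sample):
    print("Index: %s, Value: %s" % (index, value))
# Index: NASDAQ, Value: 5053.75
# Index: DJIA, Value: 17663.54
```

In practice `page_source` would be the decoded result of `urlopen(NASDAQ_URL).read()`. This approach depends on the script text keeping exactly this shape, so it is likely to break if the page changes.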

You can use Selenium for scraping, since Selenium drives a real browser and can execute JavaScript.
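
A minimal Selenium sketch of this suggestion follows. It rests on several assumptions that are not in the original answer: Chrome and a matching driver are available, the rendered rows end up as `tr`/`td` elements inside the `#indexTable` table shown above, and the helper name `rows_to_pairs` is hypothetical. The page layout may have changed since this was posted, so treat the selectors as placeholders.

```python
def rows_to_pairs(rows):
    """Turn rows of cell texts into (index, value) pairs, skipping short rows."""
    return [(cells[0].strip(), cells[1].strip()) for cells in rows if len(cells) >= 2]


def get_index_prices(url, timeout=10):
    # Imported here so rows_to_pairs stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        # Wait until the page's script has added at least one data cell.
        # Assumption: rendered cells are <td> elements inside #indexTable.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "#indexTable td"))
        )
        # Collect the text of every <td> per row; header rows use <th>,
        # so they yield empty lists and are dropped by rows_to_pairs.
        rows = [
            [td.text for td in tr.find_elements(By.TAG_NAME, "td")]
            for tr in driver.find_elements(By.CSS_SELECTOR, "#indexTable tr")
        ]
    finally:
        driver.quit()  # always close the browser, even if the wait times out
    return rows_to_pairs(rows)
```

Calling `get_index_prices('http://www.nasdaq.com/')` would then return a list of `(index, value)` tuples that can be printed with the same format string as in the question. The explicit wait matters because `driver.get` can return before the script has filled in the table.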
