Python BeautifulSoup: getting a column from a table - IndexError: list index out of range

Views: 1485 Posted: 2016/8/5 19:01:09 Tags: python html-parsing beautifulsoup findall


Question

Python newbie here. I am using Python 2.7 with BeautifulSoup 4.

I am trying to parse a webpage with BeautifulSoup and extract a column from a table. The webpage has tables nested inside tables; table 4 is the one I want, and it has no header row or th tags. I want to collect the data column by column.

from bs4 import BeautifulSoup
import urllib2

url = 'http://finance.yahoo.com/q/op?s=aapl+Options'
htmltext = urllib2.urlopen(url).read()
soup = BeautifulSoup(htmltext)

#Table 8 has the data needed; it is nested under other tables though
# specific reference works as below:
print soup.findAll('table')[8].findAll('tr')[2].findAll('td')[2].contents

# The loop below raises an error:
for row in soup.findAll('table')[8].findAll('tr'):
    column2 = row.findAll('td')[2].contents
    print column2

# "Index error: list index out of range" is what I get on second line of for loop.

I saw this as a working solution in another example, but it didn't work for me. I also tried iterating over the tr elements:

mytr = soup.findAll('table')[8].findAll('tr')

for row in mytr:
    print row.find('td') #works but gives only first td as expected
    print row.findAll('td')[2]

This raises an error saying that the list index is out of range.

So:

First findAll('table') - works

Second findAll('tr') - works

Third findAll('td') - works only if every [] index is a literal number, not a loop variable

For example:

print soup.findAll('table')[8].findAll('tr')[2].findAll('td')[2].contents

The above works because every index is a literal, but it fails when I go through loop variables. I need it inside a loop to get the full column.

Recommended answer

I took a look: the first row in the table is actually a header, so the first tr contains th cells rather than td cells, and findAll('td') returns an empty list for that row. This should work:

>>> mytr = soup.findAll('table')[9].findAll('tr')
>>> for i,row in enumerate(mytr):
...     if i:
...         print i,row.findAll('td')[2]
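
The failure can be reproduced without the live page. Below is a minimal sketch in Python 3 syntax (the question uses Python 2.7), run against a small made-up table rather than the Yahoo page. The sample markup, the `third_column` helper name, and the choice of the built-in `html.parser` are assumptions for illustration. Instead of relying on the header being the first row, it skips any row that has fewer than three td cells.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: a header row with
# th cells followed by data rows with td cells.
SAMPLE_HTML = """
<table>
  <tr><th>Strike</th><th>Symbol</th><th>Last</th></tr>
  <tr><td>90.00</td><td>AAPL-A</td><td>15.10</td></tr>
  <tr><td>95.00</td><td>AAPL-B</td><td>10.25</td></tr>
</table>
"""


def third_column(markup):
    """Return the text of the third td in every row that has one."""
    soup = BeautifulSoup(markup, "html.parser")
    values = []
    for row in soup.find("table").find_all("tr"):
        cells = row.find_all("td")
        # The header row has only th cells, so cells is empty there and
        # cells[2] would raise IndexError.
        if len(cells) > 2:
            values.append(cells[2].get_text(strip=True))
    return values


if __name__ == "__main__":
    print(third_column(SAMPLE_HTML))
```

Checking the length of each row's cell list also covers spacer or footer rows with fewer cells, which skipping only the first row would not.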

As in most cases of HTML parsing, consider a more elegant solution such as lxml and XPath, for example:

>>> from lxml import html
>>> print html.parse(url).xpath('//table[@class="yfnc_datamodoutline1"]//td[2]')
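
One detail worth noting when moving to XPath: positions are 1-based, so `td[2]` selects the second cell of each row, while `findAll('td')[2]` selects the third. Below is a minimal sketch in Python 3 syntax, assuming lxml is installed and using a made-up document. The sample markup and the `column_text` helper are hypothetical; only the class name `yfnc_datamodoutline1` comes from the answer.

```python
from lxml import html

# Hypothetical document standing in for the real page.
SAMPLE_HTML = """<html><body>
<table class="yfnc_datamodoutline1">
  <tr><th>Strike</th><th>Symbol</th><th>Last</th></tr>
  <tr><td>90.00</td><td>AAPL-A</td><td>15.10</td></tr>
  <tr><td>95.00</td><td>AAPL-B</td><td>10.25</td></tr>
</table>
</body></html>"""


def column_text(markup, position):
    """Return the text of the td at a 1-based position in each row."""
    tree = html.fromstring(markup)
    # Rows without a td at that position (such as the th-only header row)
    # simply contribute no match, so no IndexError can occur.
    cells = tree.xpath(
        '//table[@class="yfnc_datamodoutline1"]//td[%d]' % position
    )
    return [cell.text_content().strip() for cell in cells]


if __name__ == "__main__":
    print(column_text(SAMPLE_HTML, 2))  # second column, as in the answer
    print(column_text(SAMPLE_HTML, 3))  # matches findAll('td')[2]
```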
