How do you get all the rows from a particular table using BeautifulSoup?

Question

I am learning Python and BeautifulSoup to scrape data from the web and read an HTML table. I can read the page into Open Office, and it says that the table is Table #11.

It seems like BeautifulSoup is the preferred choice, but can anyone tell me how to grab a particular table and all the rows? I have looked at the module documentation, but can't get my head around it. Many of the examples that I have found online appear to do more than I need.

Answer

This should be straightforward if you have a chunk of HTML to parse with BeautifulSoup. The general idea is to navigate to your table using the findChildren method (an alias of find_all in Beautiful Soup 4), and then get the text value inside each cell with its string property.

>>> from bs4 import BeautifulSoup
>>> 
>>> html = """
... <html>
... <body>
...     <table>
...         <tr><th>column 1</th><th>column 2</th></tr>
...         <tr><td>value 1</td><td>value 2</td></tr>
...     </table>
... </body>
... </html>
... """
>>>
>>> # Name the parser explicitly so the result does not depend on what is installed
>>> soup = BeautifulSoup(html, 'html.parser')
>>> tables = soup.findChildren('table')
>>>
>>> # This will get the first (and only) table. Your page may have more.
>>> my_table = tables[0]
>>>
>>> # Every row is a <tr>; the header row holds <th> cells instead of <td> cells
>>> rows = my_table.findChildren('tr')
>>>
>>> for row in rows:
...     # You can find children with multiple tags by passing a list of strings
...     cells = row.findChildren(['th', 'td'])
...     for cell in cells:
...         value = cell.string
...         print("The value in this cell is %s" % value)
... 
The value in this cell is column 1
The value in this cell is column 2
The value in this cell is value 1
The value in this cell is value 2
>>> 
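
On a real page with several tables, such as the one Open Office reports as Table #11, you can collect every table in document order and index into the list. This is a minimal sketch, assuming Beautiful Soup 4 and assuming Open Office numbers tables from 1 in the same document order (so Table #11 would be index 10). The twelve-table page_html string is hypothetical filler standing in for the downloaded page. Note that find_all also returns nested tables, which would shift the count.

```python
from bs4 import BeautifulSoup

# Hypothetical page with 12 one-cell tables; substitute the HTML you actually downloaded.
page_html = "".join(
    f"<table><tr><td>table {n}</td></tr></table>" for n in range(1, 13)
)

soup = BeautifulSoup(page_html, "html.parser")

# find_all returns every <table> in document order, nested ones included.
tables = soup.find_all("table")

# Assumption: Open Office counts tables from 1, so "Table #11" is index 10.
table_11 = tables[10]

# All rows of the selected table.
rows = table_11.find_all("tr")

print(len(tables), rows[0].get_text())  # prints: 12 table 11
```

If the table has a distinctive attribute, matching on it is less fragile than counting, for example soup.find("table", id="results") with a hypothetical id.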

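The string property only works when a cell contains a single piece of text. If a cell has mixed content, such as value <b>2</b>, cell.string is None. This minimal sketch, assuming Beautiful Soup 4, uses get_text instead and collects each row into a Python list, treating header and data cells alike. The table_to_rows helper and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><th>column 1</th><th>column 2</th></tr>
  <tr><td>value 1</td><td>value <b>2</b></td></tr>
</table>
"""

def table_to_rows(table):
    """Hypothetical helper: return the table as a list of rows of cell texts."""
    rows = []
    for tr in table.find_all("tr"):
        # Header (<th>) and data (<td>) cells are both treated as ordinary cells.
        cells = tr.find_all(["th", "td"])
        # get_text joins every nested string with a space and strips each piece.
        rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# .string is None here because the cell has two children: "value " and <b>2</b>.
print(table.find_all("td")[1].string)  # prints: None
print(table_to_rows(table))  # prints: [['column 1', 'column 2'], ['value 1', 'value 2']]
```

Because find_all searches recursively, rows of any table nested inside this one would be included as well.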