Using Python and BeautifulSoup to Parse a Table


Question


I am trying to access content in certain td tags with Python and BeautifulSoup. I can either get the first td tag meeting the criteria (with find), or all of them (with findAll).

Now, I could just use findAll, get them all, and pick out the content I want, but that seems inefficient (even if I put limits on the search). Is there any way to go directly to a particular td tag that meets my criteria, say, the third or the 10th?

Here's my code so far:

from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

br = Browser()
url = "http://finance.yahoo.com/q/ks?s=goog+Key+Statistics"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)

# Every header cell of the key-statistics table
td = soup.findAll("td", {'class': 'yfnc_tablehead1'})

# Print the first child (the label text) of each matching cell
for cell in td:
    print(cell.contents[0])
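One way to reach the n-th match without collecting every cell is the limit argument that findAll already takes (its documented signature is findAll(name, attrs, recursive, text, limit, **kwargs)): the search stops as soon as limit matches have been found, so you can ask for n matches and take the last one. Below is a minimal sketch, assuming the modern bs4 package (where the method is spelled find_all) rather than the BeautifulSoup 3 import used in the question. The sample HTML and the nth_match helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Yahoo page in the question (the real markup may differ).
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Market Cap</td><td class="yfnc_tabledata1">1</td></tr>
  <tr><td class="yfnc_tablehead1">Enterprise Value</td><td class="yfnc_tabledata1">2</td></tr>
  <tr><td class="yfnc_tablehead1">Trailing P/E</td><td class="yfnc_tabledata1">3</td></tr>
</table>
"""


def nth_match(soup, n, *args, **kwargs):
    """Hypothetical helper: return the n-th (1-based) tag matching the
    find_all() arguments, or None if there are fewer than n matches.

    limit=n makes find_all() stop walking the tree once it has n matches,
    so tags after the n-th match are never tested.
    """
    matches = soup.find_all(*args, limit=n, **kwargs)
    return matches[n - 1] if len(matches) == n else None


soup = BeautifulSoup(HTML, "html.parser")
third = nth_match(soup, 3, "td", {"class": "yfnc_tablehead1"})
print(third.get_text())  # Trailing P/E
print(nth_match(soup, 10, "td", {"class": "yfnc_tablehead1"}))  # None
```

The whole page is still parsed, so the saving is in the search, not in parsing. For n = 1 this is essentially what find already does.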

Solution

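find and findAll are very flexible. As the BeautifulSoup.findAll documentation (http://www.crummy.com/software/BeautifulSoup/documentation.html#The%20basic%20find%20method%3a%20findAll%28name,%20attrs,%20recursive,%20text,%20limit,%20%2a%2akwargs%29) says: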

5. You can pass in a callable object which takes a Tag object as its only argument, and returns a boolean. Every Tag object that findAll encounters will be passed into this object, and if the call returns True then the tag is considered to match.
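A callable filter lets the match test be any Python expression, not just a tag name plus attribute values. Below is a minimal sketch, again assuming the bs4 package and hypothetical sample markup that reuses the question's class names. The predicate is_pe_header (hypothetical) keeps only header cells whose text mentions "P/E", and find_next_sibling then reads the data cell in the same row.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question's class names; the real page may differ.
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Market Cap</td><td class="yfnc_tabledata1">1</td></tr>
  <tr><td class="yfnc_tablehead1">Enterprise Value</td><td class="yfnc_tabledata1">2</td></tr>
  <tr><td class="yfnc_tablehead1">Trailing P/E</td><td class="yfnc_tabledata1">3</td></tr>
</table>
"""


def is_pe_header(tag):
    """Hypothetical predicate: True for header cells whose text mentions P/E."""
    return (
        tag.name == "td"
        and "yfnc_tablehead1" in tag.get("class", [])  # class is a list in bs4
        and "P/E" in tag.get_text()
    )


soup = BeautifulSoup(HTML, "html.parser")
# find_all() passes each Tag it walks to is_pe_header and keeps those returning True.
for header in soup.find_all(is_pe_header):
    value = header.find_next_sibling("td")  # the data cell in the same row
    print(header.get_text(), "->", value.get_text())  # Trailing P/E -> 3
```

Because the predicate is an ordinary function, it can be combined with limit, or passed to find, in the same way as a name-and-attributes query.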
