Beautifulsoup functionality not working properly in a specific scenario


Problem Description

I am trying to read in the following URL using urllib2: http://frcwest.com/ and then search the data for the meta redirect.

It reads in the following data:

   <!--?xml version="1.0" encoding="UTF-8"?--><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml"><head><title></title><meta content="0;url= Home.html" http-equiv="refresh"/></head><body></body></html>
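(For reference, and not part of the original question: the snippet above is simple enough that even Python's standard-library `html.parser` can pick out the `<meta>` tag, comment and doctype included. A minimal sketch, with the markup inlined as a string rather than fetched with urllib2:)

```python
# Stdlib-only sketch: collect the attributes of every <meta> tag in the
# snippet above. The markup is inlined here; the original code fetched
# it from http://frcwest.com/.
from html.parser import HTMLParser

SNIPPET = (
    '<!--?xml version="1.0" encoding="UTF-8"?-->'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title>'
    '<meta content="0;url= Home.html" http-equiv="refresh"/>'
    '</head><body></body></html>'
)

class MetaCollector(HTMLParser):
    """Collect the attribute dict of every <meta> tag encountered."""
    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta/> also arrives here, because the default
        # handle_startendtag() delegates to handle_starttag().
        if tag == 'meta':
            self.metas.append(dict(attrs))

collector = MetaCollector()
collector.feed(SNIPPET)
print(collector.metas)  # [{'content': '0;url= Home.html', 'http-equiv': 'refresh'}]
```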

Reading it into Beautifulsoup works fine. However, for some reason none of the functionality works in this specific scenario, and I don't understand why. Beautifulsoup has worked great for me in all other scenarios. However, when simply trying:

    soup.findAll('meta')

it yields no results.

My end goal is to run:

    soup.find("meta",attrs={"http-equiv":"refresh"})

However, if:

    soup.findAll('meta')

isn't even working, then I'm stuck. Any insight into this mystery would be appreciated. Thanks!

Recommended Answer

It's the comment and doctype that throw off the parser here, and subsequently BeautifulSoup.

Even the html tag seems 'gone':

>>> soup.find('html') is None
True
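One way around this (my own stdlib sketch, not from the answer itself): since the leading pseudo-comment and doctype are what confuse the parser, they can be stripped off before the markup is parsed at all, so that `<html>` is the first thing any parser sees:

```python
import re

# Same markup shape as the page in question, inlined for the sketch.
raw = (
    '<!--?xml version="1.0" encoding="UTF-8"?-->'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    '<html><head><meta content="0;url= Home.html" http-equiv="refresh"/>'
    '</head><body></body></html>'
)

# Strip any run of leading comments and doctype declarations.
cleaned = re.sub(r'^(?:\s*(?:<!--.*?-->|<!DOCTYPE[^>]*>))+\s*', '', raw,
                 flags=re.IGNORECASE | re.DOTALL)

print(cleaned.startswith('<html'))  # True
```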

Yet it is still there in the .contents iterable. You can find things again with:

for elem in soup:
    if getattr(elem, 'name', None) == u'html':
        soup = elem
        break

soup.find_all('meta')

Demo:

>>> for elem in soup:
...     if getattr(elem, 'name', None) == u'html':
...         soup = elem
...         break
... 
>>> soup.find_all('meta')
[<meta content="0;url= Home.html" http-equiv="refresh"/>]
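With the `<meta>` tag in hand, the remaining step toward the asker's goal is pulling the delay and target URL out of the content attribute (note the stray space in "url= Home.html"). A small hedged helper; `parse_refresh` is my own name for it, not a BeautifulSoup API:

```python
import re

def parse_refresh(content):
    """Split a meta-refresh value like '0;url= Home.html' into
    (delay_seconds, url); url is None when only a delay is given."""
    match = re.match(r'\s*(\d+)\s*(?:;\s*url\s*=\s*(.*?)\s*)?$', content,
                     re.IGNORECASE)
    if match is None:
        return None
    delay = int(match.group(1))
    url = match.group(2) or None
    return delay, url

print(parse_refresh('0;url= Home.html'))  # (0, 'Home.html')
print(parse_refresh('5'))                 # (5, None)
```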

