Nested tags in BeautifulSoup - Python

Views: 869  Published: 2016/8/5 19:08:27  Tags: python beautifulsoup

Question

I've looked at many examples on websites and on Stack Overflow, but I couldn't find a universal solution to my question. I'm dealing with a really messy website and I'd like to scrape some data. The markup looks like so:

...
<body>
...
    <table>
        <tbody>
            <tr>
            ...
            </tr>
            <tr>
                <td>
                ...
                </td>
                <td>
                    <table>
                        <tr>
                        ...
                        </tr>
                        <tr>
                            <td>
                                <a href="...">Some link</a>
                                <a href="...">Some link</a>
                                <a href="...">Some link</a>
                            </td>
                        </tr>
                    </table>
                </td>
            </tr>
        </tbody>
    </table>
</body>

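The problem I'm having is that none of the elements have any attributes I can select on to narrow things down. Inside each "..." there may be similar markup, such as multiple <a>s, <table>s and whatnot.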

I know that table tr table tr td a is unique to the links I need, but how would BeautifulSoup grab those? I'm not sure how to grab nested tags without writing a bunch of individual lines of code.

Any help?

Recommended answer

You can pass a CSS selector to the select() method:

soup.select('table tr table tr td a')
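As a quick check, here is a minimal sketch (Python 3, bs4 with the built-in "html.parser") that runs that selector against markup shaped like the question's. The href values, the header cells and the extra "decoy" link in the outer table are hypothetical stand-ins for the "..." parts, added only to show that a link sitting directly in the outer table is not matched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the question's structure. The hrefs, header
# cells and the decoy link are illustrative stand-ins for the "..." parts.
html = """
<body>
  <table>
    <tbody>
      <tr><td>header</td></tr>
      <tr>
        <td><a href="/decoy">Only inside the outer table</a></td>
        <td>
          <table>
            <tr><td>inner header</td></tr>
            <tr>
              <td>
                <a href="/link1">Some link</a>
                <a href="/link2">Some link</a>
                <a href="/link3">Some link</a>
              </td>
            </tr>
          </table>
        </td>
      </tr>
    </tbody>
  </table>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Spaces in the selector are descendant combinators: each step may be any
# number of levels deeper, so the intermediate <tbody> and <td> are skipped.
# The decoy <a> has no table-inside-a-<tr> above it, so it is not matched.
links = soup.select("table tr table tr td a")
hrefs = [a["href"] for a in links]
print(hrefs)  # ['/link1', '/link2', '/link3']
```

Because select() returns the matches in document order, one line replaces the chain of nested find()/find_all() calls the question was worried about.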

In [32]: bs4.BeautifulSoup(urllib.urlopen('http://google.com/?hl=en').read()).select('#footer a')
Out[32]:
[<a href="/intl/en/ads/">Advertising Programs</a>,
 <a href="/services/">Business Solutions</a>,
 <a href="https://plus.google.com/116899029375914044550" rel="publisher">+Google</a>,
 <a href="/intl/en/about.html">About Google</a>,
 <a href="http://www.google.com/setprefdomain?prefdom=RU&amp;prev=http://www.google.ru/&amp;sig=0_3F2sRGWVktTCOFLA955Vr-AWlHo%3D">Google.ru</a>,
 <a href="/intl/en/policies/">Privacy &amp; Terms</a>]
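The session above is Python 2 (urllib.urlopen no longer exists in Python 3). A rough Python 3 equivalent might look like the sketch below; select_links and fetch_html are hypothetical helper names, and passing "html.parser" explicitly is a choice made here so BeautifulSoup does not have to guess a parser.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def select_links(html, selector):
    """Hypothetical helper: (text, href) pairs for every <a> matched by a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.select(selector)]


def fetch_html(url):
    """Hypothetical helper: download a page with the Python 3 standard library."""
    with urlopen(url) as response:
        return response.read()  # bytes; BeautifulSoup detects the encoding
```

Usage would be select_links(fetch_html("http://google.com/?hl=en"), "#footer a"). The result depends on the live page, so it may not match the 2016 output above.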
