BeautifulSoup in Python - getting the n-th tag of a type

Question

I have some html code that contains many <table>s in it.

I'm trying to get the information in the second table. Is there a way to do this without using soup.findAll('table') ?

When I do use soup.findAll('table'), I get an error:

ValueError: too many values to unpack

Is there a way to get the n-th tag that does not require going through all the tables? Or should I see if I can add titles to the tables? (like <table title="things">)

There are also headers (<h4>title</h4>) above each table, if that helps.
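Both ideas mentioned here (a title attribute on the table, or the <h4> heading above it) can be used to reach a table without counting positions. The following is only a sketch: the HTML snippet and the heading text "things" are made up for illustration, and it uses find_all-era names from the bs4 package with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical page: each table sits under an <h4> heading.
html = """
<h4>first</h4><table><tr><td>a</td></tr></table>
<h4>things</h4><table title="things"><tr><td>b</td></tr></table>
<h4>last</h4><table><tr><td>c</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: match on an attribute, if the markup can be changed to carry one.
by_title = soup.find("table", attrs={"title": "things"})

# Option 2: find the heading by its text, then take the next <table> after it.
heading = soup.find("h4", string="things")
by_heading = heading.find_next("table")

print(by_title.td.get_text())
print(by_heading is by_title)
```

Either lookup returns None (or, for option 2, fails on the None heading) when nothing matches, so real code would want a check before using the result.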

Thanks.

Edit

Here's what I was thinking when I asked the question:

I was unpacking the result into two names when it held many more items. I thought this would just give me the first two things from the list, but of course it kept raising the error mentioned above. I was unaware the return value was a list; I thought it was some special object, and I was basing my code on my friend's.

I was thinking this error meant there were too many tables on the page and that it couldn't handle all of them, so I was asking for a way to do it without the method I was using. I probably should have stopped assuming things.

Now I know it returns a list and I can use this in a for loop or get a value from it with soup.findAll('table')[someNumber]. I learned what unpacking was and how to use it, as well. Thanks everyone who helped.
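As a small sketch of the two uses just described (looping and indexing), assuming the bs4 package and a made-up three-table page; find_all is the current spelling of findAll, and some_number is just the placeholder index from the text:

```python
from bs4 import BeautifulSoup

# Hypothetical three-table page used only for illustration.
html = (
    "<table><tr><td>a</td></tr></table>"
    "<table><tr><td>b</td></tr></table>"
    "<table><tr><td>c</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table")   # find_all is the current spelling of findAll
print(isinstance(tables, list))   # the result behaves like a list

# Loop over every table...
cell_texts = [table.get_text() for table in tables]

# ...or pick one by position (0-based, so 1 is the second table).
some_number = 1
second_text = tables[some_number].get_text()
print(cell_texts, second_text)
```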

Hopefully that clears things up. Now that I know what I'm doing, my question makes less sense than it did when I asked it, so I thought I'd just put a note here on what I was thinking.

Edit 2:

This question is now pretty old, but I still see that I was never really clear about what I was doing.

If it helps anyone, I was attempting to unpack the findAll(...) results without knowing how many there would be.

useless_table, table_i_want, another_useless_table = soup.findAll("table")

Since the page didn't always contain the number of tables I had guessed, and every value in the list has to be unpacked into a name, I was receiving the ValueError:

ValueError: too many values to unpack
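The error can be reproduced without BeautifulSoup at all, since it comes from Python's unpacking rules rather than from the parser. In this sketch a plain list of strings stands in for the findAll result (that substitution is an assumption made to keep the example self-contained):

```python
# Stand-in for the list returned by soup.findAll("table"); plain strings here.
tables = ["table0", "table1", "table2", "table3"]

# Plain unpacking needs exactly as many names as there are items.
try:
    first, second = tables          # four items, only two names
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Starred unpacking collects the leftovers, so the count no longer has to match.
_, wanted, *rest = tables

# Indexing avoids unpacking entirely.
also_wanted = tables[1]
print(wanted, also_wanted, rest)
```

Starred unpacking still raises ValueError when there are fewer than two items, and indexing raises IndexError in that case, so a length check is needed if the page may contain fewer tables than expected.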

So, I was looking for a way to grab the second (or whichever index) table from the returned list without running into errors about how many tables there were.

Recommended answer

The call soup.findAll('table') returns a list, so to get the second table just index it (indices start at 0):

secondtable = soup.findAll('table')[1]
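If the page may contain fewer tables than expected, or if searching the whole document is a concern, the indexing above can be wrapped in a small helper. This is a sketch, not part of the original answer: nth_tag is a hypothetical name, the HTML is made up, and it relies on the limit argument of find_all, which stops the search once that many matches have been found.

```python
from bs4 import BeautifulSoup


def nth_tag(soup, name, n):
    """Return the n-th (0-based, non-negative) tag called `name`, or None."""
    # limit=n + 1 stops the search as soon as enough matches are collected.
    found = soup.find_all(name, limit=n + 1)
    return found[n] if len(found) > n else None


# Hypothetical three-table page used only for illustration.
html = (
    "<table><tr><td>a</td></tr></table>"
    "<table><tr><td>b</td></tr></table>"
    "<table><tr><td>c</td></tr></table>"
)
soup = BeautifulSoup(html, "html.parser")

second_table = nth_tag(soup, "table", 1)
missing_table = nth_tag(soup, "table", 5)
print(second_table.get_text(), missing_table)
```

Returning None instead of raising IndexError is a design choice here; callers that prefer an exception can index the list directly as in the answer.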
