Parsing tables with img tags in Python with BeautifulSoup

Views: 427 · Published: 2020/9/20 8:38:51 · Tags: python, html-parsing, beautifulsoup

Question

I am using BeautifulSoup to parse an HTML page. I need to work on the first table in the page. That table contains a few rows; each row contains some 'td' tags, and one of those 'td' tags contains an 'img' tag. I want to get all the information in that table, but when I print the table I don't get any data related to the 'img' tag.
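One likely explanation, assuming the table was printed through its text (for example `table.get_text()` or `table.text`): an `img` element has no text content, so text extraction skips it entirely, and its data lives only in its attributes. A minimal sketch using a cut-down, inline copy of the markup from the question:

```python
from bs4 import BeautifulSoup

# Cut-down, inline version of the question's markup (for illustration only)
html = """
<table id="abc">
  <tr>
    <td><img id="img_id" title="img_title" src="img_src" alt="img_alt" /></td>
    <td><span>some_other_information</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Text extraction only collects text nodes, so the <img> contributes nothing
text = table.get_text(strip=True)
print(text)

# The image's data is stored in its attributes, exposed as a dict
img = table.find("img")
print(img.attrs)
```

Printing `str(table)` or `table.prettify()` instead would show the `img` tag's markup, but to use the values you read them from the tag's attributes.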

I am using soup.findAll("table") to get all the tables and then choose the first table for processing. The HTML looks something like this:

<table id="abc">
  <tr class="listitem-even">
    <td class="listitem-even">
      <table border = "0"> <tr> <td class="gridcell">
               <img id="img_id" title="img_title" src="img_src" alt="img_alt" /> </td> </tr>
      </table>
    </td>
    <td class="listitem-even">
      <span>some_other_information</span>
    </td>
  </tr>
</table>
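On selecting "the first table": `soup.find("table")` returns the first `<table>` in document order, the same element as `soup.find_all("table")[0]`, and because this table carries `id="abc"` it can also be selected by id. Note that the nested inner table is a `<table>` too, so `find_all("table")` returns both. A small sketch, using an inline copy of the markup above that is assumed to be well-formed:

```python
from bs4 import BeautifulSoup

# Inline copy of the question's markup, assumed well-formed
html = """
<table id="abc">
  <tr class="listitem-even">
    <td class="listitem-even">
      <table border="0"><tr><td class="gridcell">
        <img id="img_id" title="img_title" src="img_src" alt="img_alt" />
      </td></tr></table>
    </td>
    <td class="listitem-even"><span>some_other_information</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

all_tables = soup.find_all("table")   # the outer table AND the nested one
first = soup.find("table")            # first <table> in document order
by_id = soup.find("table", id="abc")  # the outer table, selected by its id

print(len(all_tables))
print(first is all_tables[0], first is by_id)
```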

How can I get all the data in the table, including the 'img' tag? Thanks,

Recommended answer

You have a nested table, so you need to check where you are in the tree before parsing the tr/td/img tags.
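An alternative sketch, not the answer's method: instead of jumping straight to the nested table, walk only the outer table's own rows with `recursive=False`, so nested `<tr>`/`<td>` elements are not picked up as extra rows, and for each cell collect both its text and the attributes of any images inside it. This assumes `html.parser`, which does not insert an implicit `<tbody>`; with a parser that does (such as `html5lib`), the rows would sit one level deeper. The `rows`/`row_data` layout is just an illustrative choice.

```python
from bs4 import BeautifulSoup

html = """
<table id="abc">
  <tr class="listitem-even">
    <td class="listitem-even">
      <table border="0"><tr><td class="gridcell">
        <img id="img_id" title="img_title" src="img_src" alt="img_alt" />
      </td></tr></table>
    </td>
    <td class="listitem-even"><span>some_other_information</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("table", id="abc")

rows = []
# recursive=False: only the outer table's direct <tr> children, not nested ones
for tr in outer.find_all("tr", recursive=False):
    row_data = []
    for td in tr.find_all("td", recursive=False):
        row_data.append({
            # Visible text of the cell (empty for the image-only cell)
            "text": td.get_text(strip=True),
            # Attributes of every <img> inside the cell, nested tables included
            "images": [dict(img.attrs) for img in td.find_all("img")],
        })
    rows.append(row_data)

print(rows)
```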

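from bs4 import BeautifulSoup

# Read the raw HTML; the context manager closes the file automatically
with open('test.html', 'rb') as f:
    html = f.read()

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

tables = soup.find_all('table')

for table in tables:
    # Only process tables nested inside another table (the ones holding the images)
    if table.find_parent("table") is not None:
        for tr in table.find_all('tr'):
            # Search the cells of the current row only, not of the whole table
            for td in tr.find_all('td'):
                for img in td.find_all('img'):
                    print(img['id'])
                    print(img['src'])
                    print(img['title'])
                    print(img['alt'])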

Based on your example, it prints:

img_id
img_src
img_title
img_alt
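The answer indexes attributes directly (`img['id']`), which raises `KeyError` if an image lacks one of them. A hedged variant, assuming some images on the real page may omit attributes: `Tag.get()` returns `None` or a given default instead of raising. The helper name `describe_images` is hypothetical, and it keeps the answer's "nested inside another table" filter.

```python
from bs4 import BeautifulSoup

def describe_images(html):
    """Hypothetical helper: (id, src, title, alt) for every <img> in nested tables."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for table in soup.find_all("table"):
        # Same filter as the answer: skip tables that are not nested in another table
        if table.find_parent("table") is None:
            continue
        for img in table.find_all("img"):
            # .get() returns None (or the given default) instead of raising KeyError
            results.append((img.get("id"), img.get("src"),
                            img.get("title", ""), img.get("alt", "")))
    return results

html = """
<table id="abc"><tr><td>
  <table border="0"><tr><td class="gridcell">
    <img id="img_id" title="img_title" src="img_src" alt="img_alt" />
    <img src="no_title.png" />
  </td></tr></table>
</td></tr></table>
"""

for row in describe_images(html):
    print(row)
```

If you do not know the attribute names in advance, `img.attrs` gives a dict of whatever attributes are present.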
