How to get all the li tags within a div tag

Problem description

I am scraping a website to get company and product details. The page has a div tag that contains li tags, and I want to get all the li tags within that div tag. I am using Python 3.5.1 and BeautifulSoup.

My code:

from bs4 import BeautifulSoup
import urllib.request
import re
r = urllib.request.urlopen('http://i.cantonfair.org.cn/en/ExpExhibitorList.aspx?k=glassware')
soup = BeautifulSoup(r, "html.parser")

links = soup.find_all("a", href=re.compile(r"expexhibitorlist\.aspx\?categoryno=[0-9]+"))
linksfromcategories = ([link["href"] for link in links])

string = "http://i.cantonfair.org.cn/en/"
linksfromcategories = [string + x for x in linksfromcategories]

for link in linksfromcategories:
    response = urllib.request.urlopen(link)
    soup2 = BeautifulSoup(response, "html.parser")
    links2 = soup2.find_all("a", href=re.compile(r"\ExpExhibitorList\.aspx\?categoryno=[0-9]+"))
    linksfromsubcategories = ([link["href"] for link in links2])
    linksfromsubcategories = [string + x for x in linksfromsubcategories]
    for link in linksfromsubcategories:
        response = urllib.request.urlopen(link)
        soup3 = BeautifulSoup(response, "html.parser")
        links3 = soup3.find_all("a", href=re.compile(r"\ExpExhibitorList\.aspx\?categoryno=[0-9]+"))
        linksfromsubcategories2 = ([link["href"] for link in links3])
        linksfromsubcategories2 = [string + x for x in linksfromsubcategories2]
        for link in linksfromsubcategories2:
            response2 = urllib.request.urlopen(link)
            soup4 = BeautifulSoup(response2, "html.parser")
            companylink = soup4.find_all("a", href=re.compile(r"\expCompany\.aspx\?corpid=[0-9]+"))
            companylink = ([link["href"] for link in companylink])
            companylink = [string + x for x in companylink]
            for link in companylink:
                response3 = urllib.request.urlopen(link)
                soup5 = BeautifulSoup(response3, "html.parser")
                companydetail = soup5.find_all("div", id="contact")
                for element in companydetail:
                    companyname = element.a[0].get_text()
                    print (companyname)
                    companyaddress = element.a[1].get_text()
                    print (companyaddress)

And I am getting this error:

Traceback (most recent call last):
  File "D:\python\phase3.py", line 54, in <module>
    lis = companydetail.find_all('li')
AttributeError: 'ResultSet' object has no attribute 'find_all'

Solution

companydetail is a ResultSet. That is to say, it is an iterable object that contains many elements (like a list or a set). The error occurs because you try to call .find_all() on the ResultSet itself. Instead, iterate over the ResultSet and call find_all() on each element it contains:

for d in companydetail:
    lis = d.find_all('li')
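
Note that this reassigns lis on every pass of the loop, so do any processing you need inside the loop body.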

Or, to get a single flat list of all the li tags in companydetail, use a list comprehension:

lis = [li for d in companydetail for li in d.find_all('li')]
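
For context, here is a minimal sketch of how this fix could slot into the last loop of the question's code. The div with id="contact" and the expCompany.aspx?corpid=... URL pattern come from the question; the assumptions that this div actually contains the li tags you want, and that its first two a tags hold the company name and address, are mine, and the corpid in the usage comment is a hypothetical placeholder.

from bs4 import BeautifulSoup
import urllib.request


def print_contact_details(url):
    """Fetch one company page and print the contents of the div with id="contact"."""
    response = urllib.request.urlopen(url)
    soup = BeautifulSoup(response, "html.parser")

    # find_all() returns a ResultSet; iterate over it and call find_all() on each Tag inside it.
    for contact in soup.find_all("div", id="contact"):
        anchors = contact.find_all("a")  # assumption: the first two <a> tags hold name and address
        if len(anchors) >= 2:
            print(anchors[0].get_text(strip=True))  # company name
            print(anchors[1].get_text(strip=True))  # company address
        for li in contact.find_all("li"):  # the li tags the question asks for
            print(li.get_text(strip=True))


# Hypothetical usage with one of the expCompany.aspx?corpid=... links collected earlier:
# print_contact_details("http://i.cantonfair.org.cn/en/expCompany.aspx?corpid=12345")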
