How to get all the li tags within a div tag
Problem description
I am scraping a website to get company and product details. The page has a div tag that contains li tags, and I want to get all the li tags within that div. I am using Python 3.5.1 and BeautifulSoup.
My code:
from bs4 import BeautifulSoup
import urllib.request
import re
r = urllib.request.urlopen('http://i.cantonfair.org.cn/en/ExpExhibitorList.aspx?k=glassware')
soup = BeautifulSoup(r, "html.parser")
links = soup.find_all("a", href=re.compile(r"expexhibitorlist\.aspx\?categoryno=[0-9]+"))
linksfromcategories = ([link["href"] for link in links])
string = "http://i.cantonfair.org.cn/en/"
linksfromcategories = [string + x for x in linksfromcategories]
for link in linksfromcategories:
    response = urllib.request.urlopen(link)
    soup2 = BeautifulSoup(response, "html.parser")
    links2 = soup2.find_all("a", href=re.compile(r"\ExpExhibitorList\.aspx\?categoryno=[0-9]+"))
    linksfromsubcategories = ([link["href"] for link in links2])
    linksfromsubcategories = [string + x for x in linksfromsubcategories]
    for link in linksfromsubcategories:
        response = urllib.request.urlopen(link)
        soup3 = BeautifulSoup(response, "html.parser")
        links3 = soup3.find_all("a", href=re.compile(r"\ExpExhibitorList\.aspx\?categoryno=[0-9]+"))
        linksfromsubcategories2 = ([link["href"] for link in links3])
        linksfromsubcategories2 = [string + x for x in linksfromsubcategories2]
        for link in linksfromsubcategories2:
            response2 = urllib.request.urlopen(link)
            soup4 = BeautifulSoup(response2, "html.parser")
            companylink = soup4.find_all("a", href=re.compile(r"\expCompany\.aspx\?corpid=[0-9]+"))
            companylink = ([link["href"] for link in companylink])
            companylink = [string + x for x in companylink]
            for link in companylink:
                response3 = urllib.request.urlopen(link)
                soup5 = BeautifulSoup(response3, "html.parser")
                companydetail = soup5.find_all("div", id="contact")
                for element in companydetail:
                    companyname = element.a[0].get_text()
                    print(companyname)
                    companyaddress = element.a[1].get_text()
                    print(companyaddress)
And I am getting this error:
Traceback (most recent call last):
  File "D:\python\phase3.py", line 54, in <module>
    lis = companydetail.find_all('li')
AttributeError: 'ResultSet' object has no attribute 'find_all'
companydetail is a ResultSet. That is to say, it is an iterable object that contains many elements (like a list or a set), not a single element. The error occurs because you are calling .find_all() on the ResultSet object itself. Instead, iterate through the ResultSet and call find_all() on each element in it:
for d in companydetail:
    lis = d.find_all('li')
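As a rough illustration, the sketch below reproduces the error and the loop-based fix on a small inline HTML snippet instead of the live site. The markup and the names `html`, `raised`, and `per_div` are made up for this example and are not taken from the original page or code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a company page (an assumption,
# not the real structure of the site being scraped).
html = """
<div id="contact"><ul><li>Acme Glass Co.</li><li>1 Example Road</li></ul></div>
<div id="contact"><ul><li>Second Co.</li></ul></div>
"""

soup = BeautifulSoup(html, "html.parser")
companydetail = soup.find_all("div", id="contact")  # a ResultSet, not a Tag

# Calling find_all() on the ResultSet itself raises AttributeError.
try:
    companydetail.find_all("li")
    raised = False
except AttributeError:
    raised = True

# Calling find_all() on each Tag inside the ResultSet works.
per_div = []
for d in companydetail:
    lis = d.find_all("li")
    per_div.append([li.get_text() for li in lis])

print(raised)
print(per_div)
```

Note that inside the loop `lis` is overwritten on every iteration, so the results have to be used or stored (here in `per_div`) before the next div is processed.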
Or, to get a single list of all the li elements in companydetail, use a list comprehension:
lis = [li for d in companydetail for li in d.find_all('li')]
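The order of the two `for` clauses in such a comprehension is easy to get wrong: the outer loop comes first, exactly as it would in nested `for` statements. The following sketch, again using invented sample markup and illustrative names (`html`, `nested`, `texts`), checks that the comprehension and the equivalent nested loops produce the same flat list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the "other" div should be skipped.
html = """
<div id="contact"><ul><li>Acme Glass Co.</li><li>1 Example Road</li></ul></div>
<div id="other"><ul><li>ignored</li></ul></div>
<div id="contact"><ul><li>Second Co.</li></ul></div>
"""

soup = BeautifulSoup(html, "html.parser")
companydetail = soup.find_all("div", id="contact")

# Flattened comprehension: outer loop (divs) first, inner loop (li tags) second.
lis = [li for d in companydetail for li in d.find_all("li")]

# The same logic written as explicit nested loops.
nested = []
for d in companydetail:
    for li in d.find_all("li"):
        nested.append(li)

# Extract the text of each li, stripping surrounding whitespace.
texts = [li.get_text(strip=True) for li in lis]
print(texts)
```

A flat list is convenient when only the li contents matter; keeping one list per div, as in the plain loop, is the better fit when each div represents a separate company.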