How to web scrape data from this line: there is no div or class element I can find. How do I extract data from that line?
Problem description
<p>1. "The purpose of our lives is to be happy." - <strong>Dalai Lama</strong></p>
There are many quotes in tags like the one above, with no div or class on them, and I can't find an element to locate them by.
Solution
import requests
from bs4 import BeautifulSoup
from pprint import pp


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    # Take every <p> that follows a .figure_block sibling inside the promo area.
    x = [p.get_text(strip=True, separator=" ") for p in soup.select(
        'span[data-parade-type="promoarea"] .figure_block ~ p')]
    # Keep only the numbered quotes; i[:1] avoids an IndexError on empty strings.
    goal = [i for i in x if i[:1].isdigit()]
    pp(goal)


main('https://parade.com/937586/parade/life-quotes/')
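The selector does the real work here, so it may help to try it offline. The sketch below runs the same selector and digit filter against a small inline HTML string. The markup in SAMPLE_HTML is hypothetical, made up only to imitate the structure the selector expects, and the function name numbered_paragraphs is an illustrative choice. It uses the bundled html.parser instead of lxml so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure the selector expects.
# The real page's HTML may differ.
SAMPLE_HTML = """
<span data-parade-type="promoarea">
  <p>Intro paragraph placed before the image block.</p>
  <div class="figure_block">image goes here</div>
  <p>1. "The purpose of our lives is to be happy." - <strong>Dalai Lama</strong></p>
  <p>Related: some unnumbered paragraph</p>
  <p>2. "A second sample quote." - <strong>Sample Author</strong></p>
</span>
"""

# "A ~ p" is the general sibling combinator: every <p> that comes after A
# and shares A's parent. Paragraphs before .figure_block are not matched.
SELECTOR = 'span[data-parade-type="promoarea"] .figure_block ~ p'


def numbered_paragraphs(html):
    """Return the text of matched paragraphs that start with a digit."""
    soup = BeautifulSoup(html, "html.parser")
    texts = [p.get_text(strip=True, separator=" ") for p in soup.select(SELECTOR)]
    # t[:1] is an empty string for an empty paragraph, so isdigit() is False.
    return [t for t in texts if t[:1].isdigit()]


if __name__ == "__main__":
    for line in numbered_paragraphs(SAMPLE_HTML):
        print(line)
```

The quotes have no class or id of their own, so the selector anchors on a nearby element that does (.figure_block) and walks to its following siblings. The digit check then drops any unnumbered paragraphs that happen to sit in the same container.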
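Note: if the output looks garbled, which is more likely to show up on a Windows machine, do not forget to tell BeautifulSoup the encoding with from_encoding=. That argument only takes effect when BeautifulSoup receives bytes (r.content); with r.text the response is already decoded and from_encoding= is ignored. Set it to the encoding the page actually uses.
Ref: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#encodings
Otherwise, to print one quote per line instead of a list, replace pp(goal) inside main with:
print("\n".join(goal))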
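If the number, the quote, and the author are needed as separate fields, each extracted line can be split with a regular expression. This is a minimal sketch and not part of the original answer. It assumes every line has the shape number, quoted text, dash, author, as in the example from the question. The names QUOTE_LINE and parse_quote are hypothetical, and the pattern accepts straight or curly quotes and a hyphen or a longer dash, since the page's exact punctuation is not known here.

```python
import re

# Assumed line shape: <number>. "<quote>" - <author>
# \u201c and \u201d are curly double quotes; \u2013 and \u2014 are en and em dashes.
QUOTE_LINE = re.compile(
    r'^(?P<number>\d+)\.\s*["\u201c](?P<quote>.+?)["\u201d]'
    r'\s*[-\u2013\u2014]\s*(?P<author>.+)$'
)


def parse_quote(line):
    """Split one scraped line into (number, quote, author), or None if it does not fit."""
    match = QUOTE_LINE.match(line.strip())
    if match is None:
        return None
    return int(match["number"]), match["quote"], match["author"].strip()


if __name__ == "__main__":
    sample = '1. "The purpose of our lives is to be happy." - Dalai Lama'
    print(parse_quote(sample))
```

Lines that do not fit the assumed shape return None, so they can be logged or skipped rather than raising an error.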