How to determine these elements of HTML?
Problem description
In this answer, @Andrej Kesely uses the following code to remove unnecessary elements (ads, large blank spaces, ...) from the HTML of this URL.
import requests
from bs4 import BeautifulSoup
url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
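# Remove every <script> tag, every element with class "hcdcrt", and the two ad slots selected by id
for script in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'):
    script.extract()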
print(soup.h2.text)
print(''.join(map(str, soup.select_one('.hom').contents)))
It seems to me that those unnecessary elements are selected by script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2.
Could you please elaborate on how to inspect the HTML structure (by pressing F12) to pin them down?
Recommended answer
@bigbounty's comment solved my problem. I am posting it here to remove my question from the unanswered list.
One way is to right-click in Chrome and visualize the HTML DOM using livedom.validator.nu or any other online service.
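If an online viewer is not convenient, a similar outline can be printed locally. The following is only a minimal sketch using Python's standard html.parser module, not what livedom.validator.nu does internally. The names OutlineParser, outline and SAMPLE are made up for illustration, and SAMPLE is a hypothetical snippet, not the real Collins page. Each opening tag is printed on its own indented line in CSS-selector notation (tag, then #id, then .class), which is the same notation that soup.select() expects.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OutlineParser(HTMLParser):
    """Collect one indented line per opening tag, written like a CSS selector."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        label = tag
        if attrs.get("id"):
            label += "#" + attrs["id"]
        # A class attribute may hold several space-separated class names.
        for cls in (attrs.get("class") or "").split():
            label += "." + cls
        self.lines.append("  " * self.depth + label)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def outline(html):
    """Return the tag outline of an HTML string as a list of lines."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.lines


# Hypothetical sample markup (an assumption, NOT the real page source).
SAMPLE = """
<div class="hom">
  <h2>aimer</h2>
  <div id="ad_contentslot_1" class="hcdcrt"></div>
  <script>var x = 1;</script>
  <span class="sense">to love</span>
</div>
"""

print("\n".join(outline(SAMPLE)))
```

In such an outline, a line like div#ad_contentslot_1.hcdcrt shows directly that the element can be selected either by its id (#ad_contentslot_1) or by its class (.hcdcrt). To try it on a real page, pass the downloaded HTML text to outline() instead of SAMPLE; the F12 developer tools show the same id and class attributes in the Elements panel.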