How to determine these elements of HTML?


Question

In this answer, @Andrej Kesely uses the following code to remove unnecessary elements (ads, large blank spaces, ...) from the HTML of this URL.

import requests
from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimer'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

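# Remove every element matched by the selector list: all <script> tags,
# elements with class "hcdcrt", and the elements with the two ad-slot ids.
for tag in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'):
    tag.extract()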

print(soup.h2.text)
print(''.join(map(str, soup.select_one('.hom').contents)))

It seems to me that those unnecessary elements are the ones matched by the selector script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2.
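To make the selector string easier to read, here is a minimal sketch of what each part matches. The HTML snippet is hypothetical and only imitates the kind of markup involved; it is not copied from the real page. A bare name such as script matches a tag, .hcdcrt matches a class attribute, #ad_contentslot_1 matches an id attribute, and commas combine several selectors into one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only, not the real page source.
html = """
<div id="main">
  <script>var tracker = 1;</script>
  <h2>aimer</h2>
  <div class="hcdcrt">sponsored box</div>
  <div id="ad_contentslot_1">ad slot</div>
  <div class="hom">to love</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Show what each individual selector matches before removing anything.
for selector in ("script", ".hcdcrt", "#ad_contentslot_1"):
    matches = soup.select(selector)
    print(selector, "->", [(m.name, m.attrs) for m in matches])

# A comma-separated list matches the union of all its parts.
# "#ad_contentslot_2" matches nothing in this snippet, which is harmless.
for tag in soup.select("script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2"):
    tag.extract()

# Only the wanted content is left.
remaining = soup.get_text(" ", strip=True)
print(remaining)
```

So the class and id names in the selector are simply the values of the class="..." and id="..." attributes that appear on the unwanted elements in the page source.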

Could you please elaborate on how to look at the HTML structure (by pressing F12) to pin them down?

Recommended answer

@bigbounty's comment solved my problem. I am posting it here to remove my question from the unanswered list.

One way is to right-click the page in Chrome and visualize the HTML DOM using livedom.validator.nu or any other online service.
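As a complement to a visual DOM viewer, the same kind of tree can be printed from Python. The sketch below is an assumption about one workable approach, not part of the original answer: the helper name outline and the sample HTML are hypothetical. It labels each element as tag#id.class, so the output can be compared directly with the names used in a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only, not the real page source.
html = """
<div id="main">
  <h2>aimer</h2>
  <div class="hcdcrt">sponsored box</div>
  <div id="ad_contentslot_1"><span class="ad label">ad</span></div>
  <div class="hom">to love</div>
</div>
"""


def outline(node, depth=0):
    """Return an indented list of 'tag#id.class' labels for all descendants."""
    lines = []
    # find_all(True, recursive=False) yields only the direct child tags.
    for child in node.find_all(True, recursive=False):
        label = child.name
        if child.get("id"):
            label += "#" + child["id"]
        for cls in child.get("class", []):
            label += "." + cls
        lines.append("  " * depth + label)
        lines.extend(outline(child, depth + 1))
    return lines


soup = BeautifulSoup(html, "html.parser")
lines = outline(soup)
print("\n".join(lines))
```

Scanning such an outline for names that look like ads (for example ids containing "ad") gives candidate selectors, which can then be confirmed by hovering over the same elements in the browser's F12 Elements panel.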
