How to combine several similar commands using "for match in soup.find_all"?


Question

The code below contains several nearly identical for match in soup.find_all(...) loops. Is it possible to merge them so the code is cleaner?

import requests
from bs4 import BeautifulSoup

url = 'https://www.collinsdictionary.com/dictionary/french-english/aimanter'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers = headers).content, 'html.parser')

entry_name = soup.h2.text

for script in soup.select('script, .hcdcrt, #ad_contentslot_1, #ad_contentslot_2'):
    script.extract()

for match in soup.find_all('div', {'class' : 'copyright'}):  
    match.extract()
    
for match in soup.find_all('div', {'class' : 'example-info'}):  
    match.extract()

for match in soup.find_all('div', {'class' : 'share-overlay'}):  
    match.extract()
    
for match in soup.find_all('div', {'class' : 'popup-overlay'}):  
    match.extract()    
    

content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
content2 = ''.join(map(str, soup.select_one('.cB.cB-e.dcCorpEx').contents))

format = open('aimer.html', 'w+', encoding = 'utf8')
format.write(entry_name + '\n' + str(content1) + str(content2) + '\n</>\n' )
format.close()

Recommended answer

You can fold the separate .find_all() loops into the first .select() call: .select() takes a CSS selector, and a comma-separated selector list matches any of its members.

For example:

import requests
from bs4 import BeautifulSoup


url = 'https://www.collinsdictionary.com/dictionary/french-english/aimanter'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers = headers).content, 'html.parser')

entry_name = soup.h2.text

for tag in soup.select('''
        script,
        .hcdcrt,
        #ad_contentslot_1,
        #ad_contentslot_2,
        div.copyright,
        div.example-info,
        div.share-overlay,
        div.popup-overlay'''):
    tag.extract()

content1 = ''.join(map(str, soup.select_one('.cB.cB-def.dictionary.biling').contents))
content2 = ''.join(map(str, soup.select_one('.cB.cB-e.dcCorpEx').contents))

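# Use a context manager so the file is closed even if write() fails,
# and avoid shadowing the built-in format().
with open('aimer.html', 'w', encoding = 'utf8') as out_file:
    out_file.write(entry_name + '\n' + str(content1) + str(content2) + '\n</>\n' )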
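
To try the idea without a network request, here is a minimal sketch that runs on an inline HTML string. SAMPLE_HTML, UNWANTED_DIV_CLASSES and both helper functions are hypothetical names made up for illustration, and the sample markup is only a stand-in for the real page. It shows the grouped selector from the answer, plus an alternative that stays with find_all by passing a list of class names to class_.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup differs.
SAMPLE_HTML = """
<div id="main">
  <h2>aimanter</h2>
  <script>var x = 1;</script>
  <div class="copyright">(c) publisher</div>
  <div class="example-info">source</div>
  <div class="share-overlay">share</div>
  <div class="popup-overlay">popup</div>
  <div class="def">to magnetize</div>
</div>
"""

# Class names taken from the question's four find_all loops.
UNWANTED_DIV_CLASSES = ["copyright", "example-info", "share-overlay", "popup-overlay"]


def strip_with_select(html):
    """Remove unwanted tags using one grouped CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # Builds "script, div.copyright, div.example-info, ..."
    selector = ", ".join(["script"] + [f"div.{name}" for name in UNWANTED_DIV_CLASSES])
    for tag in soup.select(selector):
        tag.extract()
    return soup


def strip_with_find_all(html):
    """Remove the same tags while staying with find_all."""
    soup = BeautifulSoup(html, "html.parser")
    # A list passed to class_ matches a tag that carries any of the listed classes.
    for tag in soup.find_all("div", class_=UNWANTED_DIV_CLASSES):
        tag.extract()
    for tag in soup.find_all("script"):
        tag.extract()
    return soup


if __name__ == "__main__":
    for strip in (strip_with_select, strip_with_find_all):
        cleaned = strip(SAMPLE_HTML)
        print(strip.__name__, "->", cleaned.get_text(" ", strip=True))
```

Keeping the class names in a list means a new unwanted block only needs one more list entry, whichever of the two approaches is used.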
