How can I parse a specific string using bs4?


Problem description

This is the HTML structure:

<a href="/main/list.nhn?mode=LS2D&amp;mid=shm&amp;sid1=100&amp;sid2=264" class="snb_s11 nclicks(lmn_pol.mnl,,1)">BlueHouse <span class="blind">selected</span></a>

Below is my source code, which is meant to get only BlueHouse:

 middle_category = soup.find('a',{'class':'snb_s11 nclicks(lmn_pol.mnl,,1)'})

When I run this code to get only BlueHouse, the result also includes "selected".

Below is my full code:

    def crwaling_data_bluehouse(self):
        # setting web driver to get object
        chrome_driver = webdriver.Chrome('D:/바탕 화면/인턴/python/crawling_software/crwaler/news_crwaling/chromedriver.exe')
        url = 'https://news.naver.com/main/list.nhn?mode=LS2D&mid=shm&sid1=100&sid2=264'
        chrome_driver.get(url)
        html = chrome_driver.page_source
        soup = BeautifulSoup(html, 'html.parser')
        
        #get main category
        main_category = soup.find('a',{'class':'nclicks(LNB.pol)'}).find('span',{'class':'tx'}).get_text()
        self.set_main_category(main_category)
        
        #get middle category
        middle_category = soup.find('a',{'class':'snb_s11 nclicks(lmn_pol.mnl,,1)'}).get_text()
        middle_category = middle_category.find_next(text = True)
        self.set_middle_category(middle_category)
        
        #get title
        title = soup.find('ul',{'class':'type06_headline'}).find('a')['href']
        self.set_title(title)

Recommended answer

You can use find_next(), which returns only the first match:

from bs4 import BeautifulSoup

txt = """<a href="/main/list.nhn?mode=LS2D&amp;mid=shm&amp;sid1=100&amp;sid2=264" class="snb_s11 nclicks(lmn_pol.mnl,,1)">BlueHouse <span class="blind">selected</span></a>"""
soup = BeautifulSoup(txt, 'html.parser')

middle_category = soup.find('a', {'class': 'snb_s11 nclicks(lmn_pol.mnl,,1)'})
print(middle_category.find_next(text=True))

Output:

BlueHouse 
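To see why the two calls differ, here is a minimal sketch that compares them on a shortened copy of the markup. The simplified href and class values are assumptions made for brevity, and string=True is used as the newer spelling of text=True (available in Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (href and class simplified for the example).
html = '<a href="#" class="snb_s11">BlueHouse <span class="blind">selected</span></a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", class_="snb_s11")

# get_text() joins the text of every descendant, including the nested <span>.
full_text = link.get_text()

# find_next(string=True) returns only the first text node that follows the opening <a> tag.
first_text = link.find_next(string=True)

# The text node keeps the space that precedes the <span>, so strip it.
label = first_text.strip()

print(repr(full_text))
print(repr(label))
```

The first value contains both words, while the second contains only the text that sits directly before the span.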

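Edit: don't call get_text(). Instead of

middle_category = soup.find('a',{'class':'snb_s11 nclicks(lmn_pol.mnl,,1)'}).get_text()

use

middle_category = soup.find('a',{'class':'snb_s11 nclicks(lmn_pol.mnl,,1)'}).find_next(text=True)

get_text() returns a plain string, which has no find_next() method, so the separate middle_category.find_next(text = True) line in the full code fails and can be dropped once the call is chained as shown.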
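As an alternative to find_next(), the same label can be read with the tag's stripped_strings generator, which only yields text found inside the tag. The sketch below is one possible way to wrap that in a helper; first_label is a hypothetical name, not part of bs4 or of the original code, and the None check is an assumption about how a missing element should be handled.

```python
from bs4 import BeautifulSoup

def first_label(soup, name, attrs):
    """Return the first non-empty text inside the matched tag, or None.

    Hypothetical helper for illustration only.
    """
    tag = soup.find(name, attrs)
    if tag is None:
        # soup.find() returns None when nothing matches.
        return None
    # stripped_strings yields each text node inside the tag with whitespace removed.
    return next(tag.stripped_strings, None)

txt = """<a href="/main/list.nhn?mode=LS2D&amp;mid=shm&amp;sid1=100&amp;sid2=264" class="snb_s11 nclicks(lmn_pol.mnl,,1)">BlueHouse <span class="blind">selected</span></a>"""
soup = BeautifulSoup(txt, "html.parser")

found = first_label(soup, "a", {"class": "snb_s11 nclicks(lmn_pol.mnl,,1)"})
missing = first_label(soup, "a", {"class": "no_such_class"})

print(found)
print(missing)
```

Unlike find_next(), this does not continue past the closing tag when the element has no text of its own, and it avoids an AttributeError when the element is absent from the page.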
