Search the frequency of words in the sub pages of a webpage using Python


Question

I seek help as I am stuck on how to crawl each and every link (pages or sub pages) in a webpage and find the frequency of any word. I used Beautiful Soup for scraping, but I don't think I am doing it right. For example: I need to go to the ServiceNow official page > Solutions > View all Solutions, and find the frequency of "Intelligent" in all the links/sub pages under "View all Solutions". Any help would be very much appreciated. Thank you :)

My code

import requests
from bs4 import BeautifulSoup

url = "https://www.servicenow.com/solutions-by-category.html"
serviceNow_r = requests.get(url)
sNow_soup = BeautifulSoup(serviceNow_r.text, 'html.parser')

print(sNow_soup.find_all('href',{'class':'cta-list component'}))


for name in sNow_soup.find_all('href',{'class':'cta-list component'}):
    print(name.text)

Answer

This is what you need to access the href attribute for every link in the page.

import requests
from bs4 import BeautifulSoup

url = "https://www.servicenow.com/solutions-by-category.html"
serviceNow_r = requests.get(url)
sNow_soup = BeautifulSoup(serviceNow_r.text, 'html.parser')

for anchor in sNow_soup.find_all('a', href=True):
    print(anchor['href'])
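The loop above only lists the links; to answer the original question you still need to fetch each linked page and count the occurrences of the word. Here is a minimal sketch building on that answer. It assumes the word count should be a case-insensitive substring count over each page's visible text; the helper names (`count_word`, `word_frequency`) are illustrative, not from the original answer, and relative hrefs are resolved with `urljoin`.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def count_word(html, word):
    """Count case-insensitive occurrences of `word` in the visible text of `html`."""
    text = BeautifulSoup(html, 'html.parser').get_text()
    return text.lower().count(word.lower())

def word_frequency(start_url, word):
    """Fetch every link found on `start_url` and total the occurrences of `word`."""
    soup = BeautifulSoup(requests.get(start_url).text, 'html.parser')
    total = 0
    seen = set()
    for anchor in soup.find_all('a', href=True):
        link = urljoin(start_url, anchor['href'])  # resolve relative hrefs
        if not link.startswith('http') or link in seen:
            continue  # skip mailto:/javascript: links and duplicates
        seen.add(link)
        try:
            page = requests.get(link, timeout=10)
        except requests.RequestException:
            continue  # skip unreachable links
        total += count_word(page.text, word)
    return total

if __name__ == "__main__":
    url = "https://www.servicenow.com/solutions-by-category.html"
    print(word_frequency(url, "intelligent"))
```

Note this follows every link on the start page, including navigation links outside "View all Solutions"; to restrict the crawl, filter `anchor['href']` by URL prefix before fetching.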
