Search the frequency of words in the sub pages of a webpage using Python
Problem description
I am stuck on how to crawl every link (pages or sub pages) in a webpage and find the frequency of a given word. I used Beautiful Soup for scraping, but I don't think I am doing it right. For example: I need to go to the ServiceNow official page > Solutions > View all Solutions, and find the frequency of "Intelligent" in all the links/sub pages under View all Solutions. Any help would be very much appreciated. Thank you :)
My code
import requests
from bs4 import BeautifulSoup
url = "https://www.servicenow.com/solutions-by-category.html"
serviceNow_r = requests.get(url)
sNow_soup = BeautifulSoup(serviceNow_r.text, 'html.parser')
print(sNow_soup.find_all('href',{'class':'cta-list component'}))
for name in sNow_soup.find_all('href',{'class':'cta-list component'}):
    print(name.text)
Recommended answer
href is an attribute of <a> tags, not a tag name, so find_all('href', ...) matches nothing. This is what you need to access the href attribute of every link in the page:
import requests
from bs4 import BeautifulSoup
url = "https://www.servicenow.com/solutions-by-category.html"
serviceNow_r = requests.get(url)
sNow_soup = BeautifulSoup(serviceNow_r.text, 'html.parser')
# href=True keeps only the <a> tags that actually have an href attribute
for anchor in sNow_soup.find_all('a', href=True):
    print(anchor['href'])
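The answer stops at listing the links, so here is a rough sketch of how the word count from the question could be built on top of it. The function names (count_word, fetch_soup, word_frequency_in_sub_pages) are made up for illustration and are not part of requests or Beautiful Soup. It assumes that following every link found on the start page is acceptable (it does not restrict itself to the "View all Solutions" section), that whole-word, case-insensitive matching is what is wanted, and that the text is present in the raw HTML rather than rendered by JavaScript.

```python
import re
from urllib.parse import urljoin, urldefrag


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def fetch_soup(url):
    """Download `url` and parse it with Beautiful Soup."""
    # Third-party imports are kept local so count_word works without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")


def word_frequency_in_sub_pages(start_url, word):
    """Return {link: count} for every http(s) link found on `start_url`."""
    start_soup = fetch_soup(start_url)
    # Resolve relative hrefs against the start URL; drop #fragments and duplicates.
    links = {
        urldefrag(urljoin(start_url, anchor["href"])).url
        for anchor in start_soup.find_all("a", href=True)
    }
    counts = {}
    for link in sorted(links):
        if not link.startswith("http"):
            continue  # skip mailto:, javascript:, tel: and similar
        try:
            page_text = fetch_soup(link).get_text(" ")
            counts[link] = count_word(page_text, word)
        except Exception as error:
            # Assumption: a page that fails to load is skipped, not fatal.
            print(f"skipped {link}: {error}")
    return counts


# Example usage (performs real HTTP requests, one per link):
# url = "https://www.servicenow.com/solutions-by-category.html"
# counts = word_frequency_in_sub_pages(url, "Intelligent")
# print(sum(counts.values()))
```

This only goes one level deep. To narrow it to a particular section, the set of links could be filtered (for example by URL prefix) before the loop.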