Problems with data retrieving using Python web scraping
Question
I wrote some simple code to scrape data from a web page. I specify the tag and class for each element, but my program does not scrape any data. There is also an email address on the page that I want to scrape, but I don't know which id or class to use for it. Could you please guide me on how to fix this? Thanks!
Here is my code:
import requests
from bs4 import BeautifulSoup
import csv


def get_page(url):
    response = requests.get(url)
    if not response.ok:
        print('server responded:', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'html.parser')  # 1. html , 2. parser
        return soup


def get_detail_data(soup):
    try:
        title = soup.find('hi', class_="page-header", id=False).text
    except:
        title = 'empty'
    print(title)
    try:
        email = soup.find('', class_="", id=False).text
    except:
        email = 'empty'
    print(email)


def main():
    url = "https://www.igrc.org/clergydetail/2747164"
    #get_page(url)
    get_detail_data(get_page(url))


if __name__ == '__main__':
    main()
Recommended answer
Note that the email value is not present as plain text in the page markup. It is written by JavaScript inside a script tag.
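Since the address lives inside a script element rather than in an ordinary tag, one possible approach is to collect the text of every script element and search it for an email-like pattern. The following is only a minimal sketch under stated assumptions: `SAMPLE_HTML` is made-up markup (the real page's script content is not shown here), `ScriptCollector` and `emails_in_scripts` are hypothetical helper names, and the regular expression only finds addresses that appear literally in the script. If the page builds the address by concatenation or encoding, a different decoding step would be needed. It uses the standard library so it runs without a network call; with BeautifulSoup, `soup.find_all('script')` would give the same script elements.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<html><body>
<h1 class="page-header">Sample Name</h1>
<script type="text/javascript">
  var contact = "someone@example.org";
</script>
</body></html>
"""

# Loose pattern for addresses that appear literally in the text.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self.in_script:
            self.scripts[-1] += data


def emails_in_scripts(html):
    """Return unique email-like strings found inside <script> elements."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    found = []
    for script in parser.scripts:
        for match in EMAIL_RE.findall(script):
            if match not in found:
                found.append(match)
    return found


if __name__ == "__main__":
    print(emails_in_scripts(SAMPLE_HTML))
```

In the question's code, `response.text` could be passed to `emails_in_scripts` in place of `SAMPLE_HTML`; whether that finds the address depends on how the page's script actually stores it.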