HTML in browser doesn't correspond to scraped data in Python
Question
For a project I have to scrape data from several websites, and I'm having a problem with one of them.
When I look at the page source in the browser, the things I want are in a table, so it seems easy to scrape. But when I run my script, that part of the source doesn't show up in the output.
Here is my code. I tried different things. At first there weren't any headers, then I added some, but it made no difference.
# import libraries
import requests
from bs4 import BeautifulSoup

# specify the url
quote_page = 'http://www.airpl.org/Pollens/pollinariums-sentinelles'

# query the website; headers must be passed with the request itself,
# setting an attribute on the response afterwards has no effect
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(quote_page, headers=headers)
print(response.text)

# parse the html using Beautiful Soup and store it in the variable `soup`
soup = BeautifulSoup(response.text, 'html.parser')

# write the parsed html to a file as UTF-8 text
with open('allergene.txt', 'w', encoding='utf-8') as f:
    f.write(str(soup))
What I'm looking for on the website is the content that follows "Herbacée", whose HTML looks like this:
<p class="level1">
<img src="/static/img/state-0.png" alt="pas d'émission" class="state">
Herbacee
</p>
Do you have any idea what's wrong?
Thanks for your help, and happy new year :)
Recommended answer
This page uses JavaScript to render the table. requests only downloads the initial HTML and does not execute scripts, so the table never appears in response.text. The page that actually contains the table is:
http://www.alertepollens.org/gardens/garden/1/state/
You can find this URL in the Network tab of Chrome DevTools.
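As a rough sketch of how that URL could then be scraped directly: the code below assumes the page at that address contains the same <p class="level1"> markup shown in the question, which has not been verified here. The names Level1Parser, parse_level1 and fetch are made up for illustration. It uses only the standard library so it runs without extra packages; with Beautiful Soup the equivalent selection would be soup.find_all('p', class_='level1').

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# URL given in the answer: the page that actually contains the table.
TABLE_URL = 'http://www.alertepollens.org/gardens/garden/1/state/'


class Level1Parser(HTMLParser):
    """Collect (label, state) pairs from every <p class="level1"> element.

    Assumption: each such paragraph holds an <img class="state"> whose
    alt text describes the emission state, as in the question's snippet.
    """

    def __init__(self):
        super().__init__()
        self.rows = []            # finished (label, state) pairs
        self._in_level1 = False   # True while inside a matching <p>
        self._text = []           # text fragments of the current <p>
        self._state = None        # alt text of the current state image

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'p' and 'level1' in classes:
            self._in_level1 = True
            self._text = []
            self._state = None
        elif tag == 'img' and self._in_level1 and 'state' in classes:
            self._state = attrs.get('alt')

    def handle_data(self, data):
        if self._in_level1:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'p' and self._in_level1:
            # Collapse newlines and runs of spaces into single spaces.
            label = ' '.join(''.join(self._text).split())
            self.rows.append((label, self._state))
            self._in_level1 = False


def parse_level1(html):
    """Return a list of (label, state) tuples found in the html string."""
    parser = Level1Parser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch(url=TABLE_URL):
    """Download the page, sending a browser-like User-Agent header."""
    request = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')
```

Calling parse_level1(fetch()) would then give one tuple per plant category, provided the live page still matches the markup from the question. If it does not, printing the first few hundred characters of fetch() is a quick way to see what the server really returns.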