HTML in browser doesn't correspond to scraped data in Python

Views: 99 · Published: 2020/9/20 8:04:09 · Tags: python html web-scraping beautifulsoup

Question

For a project I have to scrape data from several websites, and I'm having a problem with one of them.

When I look at the page source in the browser, the data I want is in a table, so it seems easy to scrape. But when I run my script, that part of the source doesn't show up in the output.

Here is my code. I tried different things. At first I didn't send any headers; then I added some, but it made no difference.

# import libraries
import requests
from bs4 import BeautifulSoup

# specify the url
quote_page = 'http://www.airpl.org/Pollens/pollinariums-sentinelles'

# query the website, sending a browser-like User-Agent with the request itself
# (setting attributes on the response afterwards has no effect on the request)
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(quote_page, headers=headers)
print(response.text)

# parse the html using Beautiful Soup and store it in the variable `soup`
soup = BeautifulSoup(response.text, 'html.parser')

# save the parsed HTML so it can be compared with what the browser shows
with open('allergene.txt', 'w', encoding='utf-8') as f:
    f.write(str(soup))

What I'm looking for on the website is the content that follows "Herbacée", whose HTML looks like this:

<p class="level1">

      <img src="/static/img/state-0.png" alt="pas d'émission" class="state">

    Herbacee
  </p>
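
One way to narrow this down is to check whether the fetched HTML contains any `<p class="level1">` element at all. The sketch below is only an illustration under stated assumptions: `Level1Collector` and `level1_labels` are hypothetical names, and it uses the standard-library `html.parser` instead of Beautiful Soup so that it runs without extra installs (with Beautiful Soup, `soup.find_all('p', class_='level1')` would play the same role). It is run here on the snippet above rather than on the live page.

```python
from html.parser import HTMLParser


class Level1Collector(HTMLParser):
    """Collect the text of every <p class="level1"> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.inside = False   # True while inside a matching <p>
        self.labels = []      # stripped text of each matching <p>
        self._parts = []      # text fragments of the current <p>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "level1" in classes and not self.inside:
            self.inside = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.inside = False
            self.labels.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self.inside:
            self._parts.append(data)


def level1_labels(html):
    """Return the text of all <p class="level1"> elements found in `html`."""
    parser = Level1Collector()
    parser.feed(html)
    parser.close()
    return parser.labels


# The snippet as it appears in the browser's view of the page.
sample = """
<p class="level1">

      <img src="/static/img/state-0.png" alt="pas d'émission" class="state">

    Herbacee
  </p>
"""

print(level1_labels(sample))                         # labels found in the snippet
print(level1_labels("<html><body></body></html>"))   # page without the markup
```

Calling `level1_labels(response.text)` on the script's own response and getting an empty list would indicate that this markup is not present in the HTML returned to the script, as opposed to being lost during parsing.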

Do you have any idea what's wrong?

Thanks for your help, and happy new year :)
