Raw HTML vs. DOM scraping in Python using mechanize and Beautiful Soup

Views: 123 | Published: 2017/6/25 3:34:47 | Tags: python, dom, screen-scraping, mechanize

Question

I am attempting to write a program that, as an example, will scrape the top price off of this web page:

http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults

First, I am easily able to retrieve the HTML by doing the following:

from bs4 import BeautifulSoup
import mechanize

webpage = 'http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults'

# Fetch the page with mechanize and read the raw response body
br = mechanize.Browser()
data = br.open(webpage).get_data()

# Parse the raw HTML and print it
soup = BeautifulSoup(data, 'html.parser')
print(soup)

However, the raw HTML does not contain the price. The browser does...its thing (clarification here might help me also)...and retrieves the price from elsewhere while it constructs the DOM tree.
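
One piece of the browser's "thing" can be illustrated without any network access. Everything after the `#` in a URL is a fragment, and under the URI and HTTP specifications a client keeps the fragment to itself and does not send it to the server. The following is a minimal sketch using Python 3's standard `urllib.parse`; the names `parts` and `requested` are illustrative only, and it assumes the HTTP client in use follows the specification.

```python
from urllib.parse import urlsplit

webpage = 'http://www.kayak.com/#/flights/JFK-PAR/2012-06-01/2012-07-01/1adults'

# Split the URL into scheme, host, path, query and fragment
parts = urlsplit(webpage)

# The URL a spec-following HTTP client asks the server for: fragment dropped
requested = parts._replace(fragment='').geturl()

print(requested)       # what goes over the wire
print(parts.fragment)  # what stays on the client side
```

Under that assumption the server only ever sees a request for the front page. In a browser, the page's JavaScript presumably reads the fragment and fetches the flight results separately, which would explain why the price is absent from the raw HTML.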

I was led to believe that mechanize would act just like my browser and return the DOM tree, which I also believe is what I see when I look at, for example, Chrome's Developer Tools view of the page. (If I'm incorrect about this, how do I go about getting at wherever that price information is stored?) Is there something that I need to tell mechanize to do in order to see the DOM tree?
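
The gap between raw HTML and the DOM shown in Developer Tools can be sketched with a toy page. The markup below is hypothetical (it is not Kayak's), and the standard-library `html.parser` stands in for Beautiful Soup so the example runs without extra packages. The point is that a static parser only reads the markup; it never executes the script that would fill in the price.

```python
from html.parser import HTMLParser

# Hypothetical page: the price element is empty in the markup and would be
# filled in by the script only in a browser that runs JavaScript.
RAW_HTML = """
<html><body>
  <div id="price"></div>
  <script>document.getElementById('price').textContent = '$999';</script>
</body></html>
"""


class PriceFinder(HTMLParser):
    """Collects the text found inside the element with id="price"."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.price_text = ''

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get('id') == 'price':
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == 'div':
            self.in_price = False

    def handle_data(self, data):
        # Script source is passed through as plain text, never executed
        if self.in_price:
            self.price_text += data


finder = PriceFinder()
finder.feed(RAW_HTML)
print(repr(finder.price_text))
```

The parser finds the `price` element but no text inside it, because that text would only exist after a JavaScript engine had run the script and updated the DOM.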

Once I can get the DOM tree into python, everything else I need to do should be a snap. Thanks!
