Fetch text from a web page with AngularJS tags such as ng-view


Question

I'm trying to fetch all the visible text from a website, and I'm using Scrapy (Python) for this. However, Scrapy seems to work only with regular HTML tags such as div, body, and head, and not with AngularJS tags such as ng-view. If there are elements inside an ng-view tag, they do not appear when I right-click the page and choose "view source"; all I see is <ng-view> </ng-view>. How can I use Python to scrape the elements within these ng-view tags? Thanks in advance.

Recommended answer

To answer your question:

how can I use python to scrape the elements within these ng-view tags

You can't, at least not with Scrapy alone.

The content you want to scrape is rendered on the client side (in the browser). What Scrapy gets you is just the static content sent by the server. Your browser then interprets the HTML and runs the JS code, and the JS code in turn fetches more content from the server and inserts it into the page.
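To make the difference concrete, here is a minimal sketch using only Python's standard library. The two HTML strings are made-up samples (assumptions, not taken from any real site): one stands for what "view source" or a plain HTTP fetch returns, the other for the DOM after the browser has run the JS. The helper name text_inside is hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # > 0 while the parser is inside the target tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits inside the target tag.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def text_inside(html, tag):
    """Return the text inside `tag`, joined with single spaces."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


# Made-up sample: the static source, before any JS has run.
STATIC_SOURCE = "<html><body><ng-view> </ng-view></body></html>"
# Made-up sample: the same page after the browser has run the JS.
RENDERED_DOM = (
    "<html><body><ng-view><h1>Hello</h1><p>World</p></ng-view></body></html>"
)

print(repr(text_inside(STATIC_SOURCE, "ng-view")))  # ''
print(repr(text_inside(RENDERED_DOM, "ng-view")))   # 'Hello World'
```

The static source has nothing inside ng-view to extract, no matter which parser is used; the text only exists once the JS has run.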

Can it be done?

Yes!

One way is to use some sort of headless browser, such as PhantomJS (http://phantomjs.org/), to fetch all the content. Once you have the rendered content, you can save it and scrape it as you wish. Keep in mind that this kind of web scraping is not as easy and straightforward as scraping regular HTML. It is the same reason crawlers such as Google's long struggled with pages that render their content via JS.
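As a rough sketch of the headless-browser route in Python: the code below swaps in Selenium driving headless Chrome instead of PhantomJS, which is my assumption and not something the answer prescribes. It also assumes the selenium package and a Chrome installation are available. The function names, the "ng-view *" wait selector, and the 10-second timeout are illustrative choices; a page that uses ng-view as an attribute (for example on a div) would need a different selector.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, wait_css="ng-view *", timeout=10):
    """Load `url` in headless Chrome and return the DOM after JS has run."""
    # Selenium is a third-party package. It is imported inside the function
    # so that visible_text() below stays usable without it.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one element has been placed inside ng-view.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class VisibleTextParser(HTMLParser):
    """Approximate visible text: all text except script/style/title content."""

    SKIPPED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the non-blank text chunks of `html`, one per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Calling visible_text(fetch_rendered_html(url)) with the URL of the page you want would then give the text of the rendered page. The text helper is only an approximation: it does not know about elements hidden with CSS. The rendered HTML can also be handed to Scrapy selectors or any other parser instead.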
