Fetch text from the web inside AngularJS tags such as ng-view

Question

I'm trying to fetch all the visible text from a website, and I'm using python-scrapy for this. However, what I observe is that Scrapy only works with HTML tags such as div, body, head, etc., and not with AngularJS tags such as ng-view. If there is any element inside the ng-view tags and I right-click on the page and choose View Source, the content inside the tag doesn't appear; it just shows <ng-view> </ng-view>. So how can I use Python to scrape the elements inside these ng-view tags? Thanks in advance.

Recommended Answer

To answer your question:

how can I use python to scrape the elements within these ng-view tags

You can't, at least not with Scrapy alone.

The content you want to scrape is rendered on the client side (in the browser). What Scrapy gets you is just the static content from the server. Your browser then interprets the HTML and runs the JS code, and that JS code fetches more content from the server (for example, the template and data for the current ng-view route) and inserts it into the page.
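To see concretely what Scrapy receives, here is a minimal sketch using only Python's standard-library `html.parser`. The HTML string is a made-up, hypothetical stand-in for the server response of an AngularJS app that uses ng-view, and `NgViewTextCollector` / `split_text` are illustrative names, not part of Scrapy. The parser separates text found inside `<ng-view>` from text found elsewhere; on the static HTML the ng-view part comes back empty, because nothing has been inserted there yet.

```python
from html.parser import HTMLParser

# Hypothetical server response: roughly what "View Source" (and Scrapy) sees.
# <ng-view> is empty because AngularJS fills it in later, in the browser.
STATIC_HTML = """
<html>
  <head><title>My App</title></head>
  <body ng-app="myApp">
    <h1>Site header</h1>
    <ng-view></ng-view>
    <script src="app.js"></script>
  </body>
</html>
"""


class NgViewTextCollector(HTMLParser):
    """Splits text nodes into those inside <ng-view> and all the others."""

    def __init__(self):
        super().__init__()
        self.in_ng_view = 0  # nesting depth of <ng-view> elements
        self.in_hidden = 0   # nesting depth of <script>/<style> (not visible text)
        self.ng_view_text = []
        self.other_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "ng-view":
            self.in_ng_view += 1
        elif tag in ("script", "style"):
            self.in_hidden += 1

    def handle_endtag(self, tag):
        if tag == "ng-view" and self.in_ng_view:
            self.in_ng_view -= 1
        elif tag in ("script", "style") and self.in_hidden:
            self.in_hidden -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.in_hidden:
            return  # skip whitespace and script/style contents
        if self.in_ng_view:
            self.ng_view_text.append(text)
        else:
            self.other_text.append(text)


def split_text(html):
    """Return (text inside <ng-view>, text elsewhere) for an HTML string."""
    parser = NgViewTextCollector()
    parser.feed(html)
    parser.close()
    return parser.ng_view_text, parser.other_text


inside, outside = split_text(STATIC_HTML)
print("inside <ng-view>:", inside)  # inside <ng-view>: []
print("elsewhere:", outside)        # elsewhere: ['My App', 'Site header']
```

Any HTML-only tool, Scrapy's selectors included, sees the same empty element, since the text simply is not in the document the server sends.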

Can it be done?

Yes!

One way is to use some sort of headless browser like http://phantomjs.org/ to load the page and let it execute the JS, then fetch all the rendered content. Once you have the content, you can save it and scrape it as you wish. The catch is that this kind of web scraping is not as easy and straightforward as scraping regular HTML. There is a reason why Google, at the time this was written, still didn't scrape web pages that render their content via JS.
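PhantomJS development has since been suspended, so the sketch below swaps in headless Chrome driven by Selenium; the idea is the same: a real browser engine runs the JS, and you then read the resulting DOM. It assumes Selenium 4 and a local Chrome install. `render_with_headless_chrome`, `ng_view_text`, and the `ng-view *` wait condition are illustrative assumptions, not a prescribed API, and the sketch only handles the `<ng-view>` element form, not `<div ng-view>`.

```python
from html.parser import HTMLParser


def render_with_headless_chrome(url, timeout=10):
    """Return the page HTML after JavaScript has run.

    Assumption: Selenium 4 + local Chrome (swapped in for PhantomJS).
    Imports are local so the parsing part below works without Selenium.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumed wait condition: at least one element has appeared inside
        # <ng-view>, i.e. AngularJS has inserted the route's template.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "ng-view *"))
        )
        return driver.page_source
    finally:
        driver.quit()


class NgViewText(HTMLParser):
    """Collects the non-empty text nodes inside <ng-view> ... </ng-view>."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # >0 while inside an <ng-view> element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "ng-view":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "ng-view" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text.append(data.strip())


def ng_view_text(html):
    """Return the list of text strings found inside <ng-view>."""
    parser = NgViewText()
    parser.feed(html)
    parser.close()
    return parser.text
```

Usage, with a hypothetical URL: `ng_view_text(render_with_headless_chrome("https://example.com/#/products"))`. The explicit wait is there because `driver.get` can return before AngularJS has fetched and inserted the view, in which case `page_source` would still show an empty `<ng-view>`. The rendered HTML could equally be saved to disk and fed to Scrapy's selectors instead of this small parser.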
