Making Angular crawlable - Beginning of Project

Question

When developing a site in AngularJS, do you have to worry about web crawlers before you start working on the site, or can you put it off until the site is finished?

I have read that HTML snapshots are a good solution, for instance. If you choose to do this, can you implement it after coding the site, or do you have to build the site around this kind of functionality?

Solution

I think it is a good idea to decide on the strategy at the beginning of the project and implement it close to the end.

We ran into this problem at the company I work for.

In all cases you will need to answer GET requests to endpoints like

...?_escaped_fragment_=/home

when a crawler such as Google or Bing crawls the page

...#!/home

See the official Google documentation for details.

The question is how you will generate the content of the resource
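To make the URL mapping concrete, here is a minimal sketch of the two directions involved: turning a hashbang URL into the URL a crawler would request, and recovering the client-side route from such a request. The function names are hypothetical, and the set of escaped characters (control characters, space, `#`, `%`, `&`, `+`) is an assumption that should be checked against the Google documentation.

```javascript
// Hypothetical helper: map a hashbang URL to the URL a crawler would request.
function toEscapedFragmentUrl(pageUrl) {
  const hashIndex = pageUrl.indexOf('#!');
  if (hashIndex === -1) return pageUrl; // no hashbang, nothing to map

  const base = pageUrl.slice(0, hashIndex);
  const fragment = pageUrl.slice(hashIndex + 2);

  // Assumed escape set: control characters, space, '#', '%', '&' and '+'.
  const escaped = fragment.replace(/[\x00-\x20#%&+]/g, (c) =>
    '%' + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, '0'));

  // Append to an existing query string if there is one.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escaped;
}

// Hypothetical helper: recover the client-side route from a crawler request.
// Returns null when the request does not carry _escaped_fragment_.
function routeFromCrawlerRequest(requestUrl) {
  // The base is only needed so that relative request URLs can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  return url.searchParams.get('_escaped_fragment_');
}
```

For example, `toEscapedFragmentUrl('http://example.com/#!/home')` yields `http://example.com/?_escaped_fragment_=/home`, and `routeFromCrawlerRequest('/?_escaped_fragment_=/home')` yields `/home`.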

...?_escaped_fragment_=:path

There are different strategies:

Generate dynamic snapshots with PhantomJS every time a crawler asks for the resource

This consists of spawning a PhantomJS process at runtime, capturing the generated HTML page from its output, and sending it back to the crawler.

I think this is the most generic and transparent solution if your website has a lot of dynamic crawlable content.
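A rough sketch of this runtime approach in NodeJS could look like the following. It is only an illustration under stated assumptions: `snapshot.js` is a hypothetical PhantomJS script (not shown) that opens the URL passed as its first argument and prints the rendered HTML to stdout, `http://localhost:8000/` is an assumed address for the Angular app, and the timeout and buffer limits are arbitrary.

```javascript
const { execFile } = require('child_process');

// Assumptions: a PhantomJS script that prints rendered HTML to stdout,
// and the address where the Angular app is served.
const PHANTOM_SCRIPT = 'snapshot.js';
const APP_URL = 'http://localhost:8000/';

// Build the command line used to render one client-side route.
function snapshotCommand(route) {
  return { file: 'phantomjs', args: [PHANTOM_SCRIPT, APP_URL + '#!' + route] };
}

// Spawn PhantomJS and resolve with the HTML it writes to stdout.
function renderSnapshot(route) {
  const { file, args } = snapshotCommand(route);
  return new Promise((resolve, reject) => {
    // execFile does not go through a shell, so the route is passed as a plain argument.
    execFile(file, args, { timeout: 10000, maxBuffer: 10 * 1024 * 1024 },
      (err, stdout) => (err ? reject(err) : resolve(stdout)));
  });
}

// Handler usable inside http.createServer. Returns false when the request
// is not a crawler request, so the caller can serve the normal app instead.
function handleCrawlerRequest(req, res) {
  const route = new URL(req.url, APP_URL).searchParams.get('_escaped_fragment_');
  if (route === null) return false;

  renderSnapshot(route).then(
    (html) => {
      res.setHeader('Content-Type', 'text/html; charset=utf-8');
      res.end(html);
    },
    () => {
      res.statusCode = 500;
      res.end('Snapshot generation failed');
    });
  return true;
}
```

Because a browser process is started for every crawler request, a real setup would probably want some caching or a limit on concurrent renders; that is left out here.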

Generate static snapshots with PhantomJS at build time or when hitting the save button of the CMS of the website

This works well if your crawlable content never changes, or changes only from time to time.
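As an illustration of the build-time variant, the sketch below writes one HTML file per route. The list of routes, the output directory, and the `render` function are all supplied by the caller; `renderWithPhantom` again assumes a hypothetical `snapshot.js` PhantomJS script that prints the rendered page to stdout, and an app served at an assumed local address.

```javascript
const fs = require('fs');
const path = require('path');
const { execFileSync } = require('child_process');

// Map a route such as '/home' to '<outDir>/home.html' ('/' becomes index.html).
// Empty, '.' and '..' segments are dropped so files stay inside outDir.
function snapshotPath(outDir, route) {
  const segments = route.split('/')
    .filter((s) => s !== '' && s !== '.' && s !== '..');
  const name = segments.length === 0 ? 'index' : segments.join(path.sep);
  return path.join(outDir, name + '.html');
}

// Render every route with the given function and write the result to disk.
// Returns the list of files written.
function buildSnapshots(routes, render, outDir) {
  return routes.map((route) => {
    const file = snapshotPath(outDir, route);
    fs.mkdirSync(path.dirname(file), { recursive: true });
    fs.writeFileSync(file, render(route), 'utf8');
    return file;
  });
}

// Assumed renderer: runs the hypothetical snapshot.js script synchronously,
// which is acceptable at build time.
function renderWithPhantom(route) {
  return execFileSync('phantomjs',
    ['snapshot.js', 'http://localhost:8000/#!' + route],
    { encoding: 'utf8' });
}
```

A build step or a CMS save hook could then call something like `buildSnapshots(['/home', '/about'], renderWithPhantom, 'snapshots')`, and the server would answer `_escaped_fragment_` requests by sending the matching file.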

Generate static « equivalent » content files at dev time or when hitting the save button of the CMS of the website

This is a very cheap solution as it does not involve PhantomJS. This is good if the content is simple and if you can easily write it or generate it from a database.

It is difficult to handle if the content is complicated to retrieve, as you will need to duplicate your code (once on the client side to render the Angular views, and once on the server side to generate the whole-page « equivalent » content for crawlers).

I mentioned the PhantomJS solution, but any headless browser (or a regular one, if you can afford a display) will do the job. You can even imagine rendering your views server-side without any browser, just by running your JS in a NodeJS server, for instance.
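For simple content, the « equivalent » page can be little more than a template filled from a record. The sketch below assumes a hypothetical record shape with `title` and `body` fields, as might be loaded from a database; it is not tied to any particular CMS.

```javascript
// Escape text so that it can be placed safely inside HTML.
function escapeHtml(text) {
  const entities = {
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  };
  return String(text).replace(/[&<>"']/g, (c) => entities[c]);
}

// Build a plain HTML page for crawlers from a record.
// The { title, body } shape is an assumption made for this example.
function renderEquivalentPage(record) {
  return [
    '<!DOCTYPE html>',
    '<html><head><meta charset="utf-8">',
    '<title>' + escapeHtml(record.title) + '</title></head>',
    '<body><h1>' + escapeHtml(record.title) + '</h1>',
    '<p>' + escapeHtml(record.body) + '</p></body></html>',
  ].join('\n');
}
```

This also shows where the duplication comes from: the markup here has to be kept in step with what the Angular views render on the client.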


Also decide from the beginning whether you will use HTML5-style URLs, hash URLs, or hashbang URLs. This can be difficult to change once the content is indexed by search engines. I advise the hashbang style, even if it can be seen as « ugly ».
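In AngularJS the URL style is chosen through `$locationProvider`. A minimal sketch of the hashbang configuration follows; the module name `app` is an assumption, and the `typeof angular` guard is only there so the snippet can also be loaded outside a browser.

```javascript
// Configure hashbang URLs (for example ...#!/home) instead of HTML5-style URLs.
function configureHashbangUrls($locationProvider) {
  $locationProvider.html5Mode(false); // keep hash-based routing
  $locationProvider.hashPrefix('!');  // '#!' instead of '#'
}
configureHashbangUrls.$inject = ['$locationProvider'];

// Register the config block when AngularJS is loaded.
// 'app' is an assumed name for an already defined module.
if (typeof angular !== 'undefined') {
  angular.module('app').config(configureHashbangUrls);
}
```

Switching to HTML5-style URLs later would mean calling `html5Mode(true)` and changing every indexed URL, which is why the choice is easier to make early.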
