Making Angular crawlable - Beginning of Project


Question

When developing a site in AngularJS, do you have to worry about web crawlers before you start working on the site, or can you put it off until the site is finished?

I have read, for instance, that HTML snapshots are a good solution. If you choose to use them, can you implement them after the site has been coded, or do you have to design the site around this kind of functionality?

Recommended answer

I think it is good to decide on the strategy at the beginning of the project and to implement it close to the end of the project.

We ran into this problem at the company I work for.

In all cases you will need to answer GET requests to endpoints like

...?_escaped_fragment_=/home

when a crawler such as Google or Bing crawls the page

...#!/home

See the official Google documentation for details.
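
To make the mapping concrete, here is a minimal sketch of the conversion between a hashbang URL and the URL the crawler actually requests under Google's AJAX crawling scheme. The function names are made up for this example, and the escaping is simplified to the characters that most often matter (whitespace, `#`, `%`, `&`, `+`), so treat it as an illustration rather than a complete implementation of the specification.

```javascript
// Sketch only: function names are hypothetical and the escaping is simplified.

// Escape the characters that would otherwise break the query string.
function escapeFragment(fragment) {
  return fragment.replace(/[\s#%&+]/g, (ch) => encodeURIComponent(ch));
}

// "Pretty" URL (what users see) -> "ugly" URL (what the crawler requests).
function toEscapedFragmentUrl(prettyUrl) {
  const index = prettyUrl.indexOf('#!');
  if (index === -1) return prettyUrl; // no hashbang, nothing to map
  const base = prettyUrl.slice(0, index);
  const fragment = prettyUrl.slice(index + 2);
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escapeFragment(fragment);
}

// "Ugly" URL -> "pretty" URL, assuming _escaped_fragment_ is the last parameter.
function toPrettyUrl(uglyUrl) {
  const match = /[?&]_escaped_fragment_=([^&]*)/.exec(uglyUrl);
  if (!match) return uglyUrl;
  const base = uglyUrl.slice(0, match.index);
  return base + '#!' + decodeURIComponent(match[1]);
}
```

For example, `toEscapedFragmentUrl('http://example.com/#!/home')` gives `http://example.com/?_escaped_fragment_=/home`, and `toPrettyUrl` reverses it.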

The question is how you will fill the content of the resource

...?_escaped_fragment_=:path

There are different strategies:

Dynamic snapshot generation with PhantomJS, each time a crawler requests the resource

This consists of spawning a PhantomJS process at runtime, redirecting the generated HTML page to the process output, and sending it back to the crawler.

I think this is the most broadly applicable and transparent solution if your website has a lot of dynamic crawlable content.
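
A rough sketch of what this could look like in a Node.js server follows. It assumes `phantomjs` is on the PATH, that the Angular app is served at `SITE_ORIGIN`, and that a PhantomJS script called `render.js` exists; all three, as well as the function names, are assumptions made for this example.

```javascript
const { execFile } = require('child_process');

// Assumptions for this sketch (not from the original answer):
const SITE_ORIGIN = 'http://localhost:8000'; // where the Angular app is served
const RENDER_SCRIPT = 'render.js';           // PhantomJS script, outlined below

// render.js runs inside PhantomJS, not Node. It could look like this:
//   var page = require('webpage').create();
//   var url = require('system').args[1];
//   page.open(url, function (status) {
//     // A real script would wait until Angular has finished rendering.
//     console.log(page.content);
//     phantom.exit(status === 'success' ? 0 : 1);
//   });

// Returns the hashbang URL to render, or null for ordinary (non-crawler) requests.
function snapshotTarget(requestUrl) {
  const parsed = new URL(requestUrl, SITE_ORIGIN);
  const fragment = parsed.searchParams.get('_escaped_fragment_');
  if (fragment === null) return null;
  parsed.searchParams.delete('_escaped_fragment_');
  parsed.hash = '#!' + fragment;
  return parsed.href;
}

// Spawns PhantomJS and resolves with the HTML it prints to stdout.
// `run` defaults to execFile; it is a parameter so that it can be stubbed.
function renderSnapshot(targetUrl, run = execFile) {
  const options = { timeout: 10000, maxBuffer: 10 * 1024 * 1024 };
  return new Promise((resolve, reject) => {
    run('phantomjs', [RENDER_SCRIPT, targetUrl], options,
      (error, stdout) => (error ? reject(error) : resolve(stdout)));
  });
}

// Request handler: crawler requests get a fresh snapshot.
function handleRequest(req, res, run = execFile) {
  const target = snapshotTarget(req.url);
  if (target === null) {
    res.statusCode = 404; // a real server would serve the Angular app here
    return res.end();
  }
  return renderSnapshot(target, run)
    .then((html) => { res.setHeader('Content-Type', 'text/html'); res.end(html); })
    .catch(() => { res.statusCode = 500; res.end(); });
}
```

The handler can be plugged into a plain server with `require('http').createServer((req, res) => handleRequest(req, res)).listen(3000)`. Spawning a process per request is slow, so in practice you would probably put a cache in front of it.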

Static snapshot generation with PhantomJS, at build time or when the save button of the site's CMS is clicked

This works well if your crawlable content never changes, or changes only from time to time.
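
As a sketch of the static variant, the snippet below renders a fixed list of routes once and stores the HTML on disk, so that crawler requests only need a file read. The file naming scheme and the function names are assumptions for this example; `render` stands for any function that returns a Promise of HTML, for instance a wrapper around PhantomJS.

```javascript
const fs = require('fs');
const path = require('path');

// Maps a route such as '/products/42' to a file name (naming scheme is an assumption).
// Anything outside [A-Za-z0-9_-] becomes '_', which also blocks '../' tricks.
// Distinct routes can collide (e.g. '/a/b' and '/a_b'); fine for a sketch.
function snapshotFileName(route) {
  const name = route.replace(/^\/+|\/+$/g, '').replace(/[^A-Za-z0-9_-]+/g, '_');
  return (name || 'index') + '.html';
}

// Renders every route once and writes the result to outputDir.
// Call this from the build, or from the CMS "save" hook.
async function buildSnapshots(routes, outputDir, render) {
  fs.mkdirSync(outputDir, { recursive: true });
  const written = [];
  for (const route of routes) {
    const html = await render(route);
    const file = path.join(outputDir, snapshotFileName(route));
    fs.writeFileSync(file, html, 'utf8');
    written.push(file);
  }
  return written;
}

// At request time the server only has to read the stored file.
// Returns null when no snapshot exists for the requested fragment.
function readSnapshot(outputDir, escapedFragment) {
  const file = path.join(outputDir, snapshotFileName(escapedFragment));
  return fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : null;
}
```

With this layout, a request for `...?_escaped_fragment_=/home` would be answered with the contents of `home.html`.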

Static "equivalent" content files, generated at development time or when the save button of the site's CMS is clicked

This is a very cheap solution since it does not involve PhantomJS. It works well if the content is simple and you can easily write it by hand or generate it from a database.

It is difficult to handle if the content is complicated to retrieve, because you will need to duplicate your code: one client-side version to render the Angular views, and one server-side version to generate the whole-page "equivalent" content for crawlers.
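
For simple content, the server-side "equivalent" page can be a few lines of templating. The sketch below assumes a hypothetical record shape `{ title, body }` coming from your database; the function names are also made up for the example.

```javascript
// Escapes text so that database content cannot break the generated markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds the crawler-facing page for one record.
// The `article` shape ({ title, body }) is an assumption for this sketch.
function renderEquivalentPage(article) {
  return [
    '<!DOCTYPE html>',
    '<html><head><title>' + escapeHtml(article.title) + '</title></head>',
    '<body><h1>' + escapeHtml(article.title) + '</h1>',
    '<p>' + escapeHtml(article.body) + '</p></body></html>',
  ].join('\n');
}
```

This is exactly where the duplication shows up: the same record is rendered once here and once in the Angular view, and the two have to be kept in sync by hand.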

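I have mentioned PhantomJS solutions, but any headless browser (or a regular browser, if you can afford a display) would do the job. You can even imagine rendering the views server-side without any browser at all, simply by running the JS in a NodeJS server.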

Also decide at the beginning whether you will use HTML5-style URLs, hash URLs, or hashbang URLs. This can be difficult to change once the content has been indexed by search engines. I advise the hashbang style, even if it can be seen as "ugly".
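
In AngularJS the URL style is chosen in a config block through `$locationProvider`. A minimal sketch of the two main options follows; the function names and the module name `app` are assumptions for this example.

```javascript
// Hashbang style: URLs look like http://example.com/#!/home
function useHashbangUrls($locationProvider) {
  $locationProvider.html5Mode(false);
  $locationProvider.hashPrefix('!');
}
useHashbangUrls.$inject = ['$locationProvider'];

// HTML5 style: URLs look like http://example.com/home
// (this typically also needs URL rewriting on the server).
function useHtml5Urls($locationProvider) {
  $locationProvider.html5Mode(true);
}
useHtml5Urls.$inject = ['$locationProvider'];

// In the application (the module name 'app' is hypothetical):
//   angular.module('app').config(useHashbangUrls);
```

Keeping the choice in one config function at least makes it a single place to change in the code, even though the indexed URLs themselves are the hard part to migrate.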
