robots.txt: how to disallow subfolders of a dynamic folder


Question

I have URLs like these:

/products/:product_id/deals/new
/products/:product_id/deals/index

I'd like to disallow the "deals" folder in my robots.txt file.

I'd like to disallow this folder for the Google, Yahoo and Bing bots. Does anyone know if these bots support wildcard characters, and so would support the following rule?

Disallow: /products/*/deals
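For context, here is a sketch of what the whole file could look like. Since the same rule should apply to all three engines, a single blanket group is enough (the assumption here is that nothing else in the file needs per-bot rules):

```
# robots.txt (sketch) - Googlebot, Slurp (Yahoo) and bingbot
# all read this file and honor * as a wildcard in the path.
User-agent: *
Disallow: /products/*/deals
```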

Also... do you have any really good tutorial on robots.txt rules? I didn't manage to find a "really" good one, so I could use one...

And one last question: is robots.txt the best way to handle this, or should I use the "noindex" meta tag instead?

Thanks all! :)

Answer

Yes, all the major search engines support the basic `*` wildcard, and your rule will work to disallow your deals directory.
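To see how this kind of rule matches, here is a minimal sketch of Google-style path matching, where `*` matches any run of characters and a rule matches any URL path it prefixes. The function name `wildcard_matches` is made up for illustration; real crawlers implement more (e.g. the `$` end anchor and longest-match precedence between Allow and Disallow rules):

```python
import re

def wildcard_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path.

    '*' in the pattern matches any sequence of characters; the pattern
    only needs to match a prefix of the path, as in robots.txt.
    """
    # Escape regex metacharacters, then turn the escaped '*' back
    # into '.*' so it acts as a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path) is not None

# The rule from the question:
rule = "/products/*/deals"
print(wildcard_matches(rule, "/products/42/deals/new"))    # True
print(wildcard_matches(rule, "/products/42/deals/index"))  # True
print(wildcard_matches(rule, "/about"))                    # False
```

Both example URLs from the question fall under the rule, while unrelated paths do not.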

The best place to learn about robots.txt is really the Google Developer page. It provides plenty of examples of what works and what doesn't. For instance, many people don't know that robots.txt files are protocol-specific, so if you want to block pages served over https, you'll need to make sure you have a robots.txt at https://yoursite.com/robots.txt

You can also test a new robots.txt file before applying it through Google Webmaster Tools. Basically you can verify with the search engine whether or not it'll actually work before you deploy it.

With regards to blocking something with robots.txt or just adding a noindex to the pages, I'm more inclined to use the noindex in most scenarios unless I know I don't want the search engines crawling that section of my site at all.

There are some trade-offs. When you block the search engines altogether, you save some of your "crawl budget", so the search engines crawl your other pages rather than "wasting" their time on pages you don't want them to visit. However, those URLs can still appear in the search results.

If you absolutely don't want any search-referral traffic to those pages, it's better to use the noindex directive. Additionally, if you link to the deals pages often, a noindex not only removes them from the search results, but also lets any link value / PageRank flow through those pages and be counted accordingly. If you block them from being crawled, they become something of a black hole.
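For reference, the noindex directive is usually delivered as a robots meta tag in the page's `<head>` (for non-HTML resources, the equivalent `X-Robots-Tag: noindex` HTTP header serves the same purpose):

```html
<!-- Placed in the <head> of each deals page -->
<meta name="robots" content="noindex">
```

Note that a crawler can only see this tag if the page is not also blocked in robots.txt; a disallowed page is never fetched, so its meta tags are never read.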
