Allowing non-English (non-ASCII) characters in the URL for SEO?


Problem description

I have lots of UTF-8 content that I want inserted into the URL for SEO purposes. For example, post tags that I want to include in the URI (site.com/tags/id/TAG-NAME). However, only ASCII characters are allowed by the standards.

Characters that are allowed in a URI but do not have a reserved purpose are called unreserved. These include uppercase and lowercase letters, decimal digits, hyphen, period, underscore, and tilde.

The solution seems to be:

  • Convert the character string into a sequence of bytes using the UTF-8 encoding
  • Convert each byte that is not an ASCII letter or digit to %HH, where HH is the hexadecimal value of the byte
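
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a library routine: the function name `percent_encode` is made up for this example, and it assumes the unreserved punctuation quoted earlier (hyphen, period, underscore, tilde) is left untouched along with ASCII letters and digits.

```python
from urllib.parse import quote

# Unreserved punctuation that is kept as-is (assumption: follows the
# "unreserved" list quoted above, in addition to ASCII letters and digits).
UNRESERVED_PUNCTUATION = "-._~"


def percent_encode(text: str) -> str:
    """Percent-encode text: UTF-8 bytes first, then %HH for each unsafe byte."""
    pieces = []
    for byte in text.encode("utf-8"):          # step 1: string -> UTF-8 bytes
        char = chr(byte)
        if byte < 128 and (char.isalnum() or char in UNRESERVED_PUNCTUATION):
            pieces.append(char)                # safe ASCII byte, keep it
        else:
            pieces.append(f"%{byte:02X}")      # step 2: byte -> %HH
    return "".join(pieces)


if __name__ == "__main__":
    tag = "Գլխավոր_Էջ"
    print(percent_encode(tag))
    # The standard library produces the same result for this input.
    print(percent_encode(tag) == quote(tag, safe=UNRESERVED_PUNCTUATION))
```

In practice `urllib.parse.quote` already does this; the hand-written loop is only there to make the byte-level steps visible.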

However, that converts the legible (and SEO-valuable) words into mumbo-jumbo. So I'm wondering whether Google is still smart enough to handle searches in URLs that contain encoded data, or whether I should attempt to convert those non-English characters into their semi-ASCII counterparts (which might help with Latin-based languages)?
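
For reference, the "semi-ASCII counterpart" idea could look roughly like the sketch below. It is one possible approach, assuming Unicode NFKD decomposition is an acceptable way to strip accents; the function name `ascii_slug` is hypothetical. Note that it only helps with Latin-based text: characters with no ASCII base letter are simply dropped.

```python
import re
import unicodedata


def ascii_slug(text: str) -> str:
    """Reduce text to a lowercase ASCII slug by stripping accents."""
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII code points removes the combining marks
    # (and any character that has no ASCII base letter at all).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")
    return slug.lower()


if __name__ == "__main__":
    print(ascii_slug("Crème Brûlée"))  # creme-brulee
    print(repr(ascii_slug("Գլխավոր")))  # '' (nothing survives for Armenian)
```

The empty result for non-Latin scripts is the reason this approach cannot be a general answer for UTF-8 tags.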

Recommended answer

First, search engines really don't care about URLs. URLs help visitors: visitors link to sites, and search engines care about those links. URLs are easy to spam; if search engines cared about them, there would be an incentive to spam them, and no major search engine wants that. The allinurl: operator is merely a Google feature to help advanced users, not something that gets factored into organic rankings. Any benefit you get from using a more natural URL will probably come as a fringe benefit of the PR from an inferior search engine indexing your site, and there is some evidence this can be negative with the advent of negative PR too.

From Google Webmaster Central:

Does that mean I should avoid rewriting dynamic URLs at all?

That's our recommendation, unless your rewrites are limited to removing unnecessary parameters, or you are very diligent in removing all parameters that could cause problems. If you transform your dynamic URL to make it look static you should be aware that we might not be able to interpret the information correctly in all cases. If you want to serve a static equivalent of your site, you might want to consider transforming the underlying content by serving a replacement which is truly static. One example would be to generate files for all the paths and make them accessible somewhere on your site. However, if you're using URL rewriting (rather than making a copy of the content) to produce static-looking URLs from a dynamic site, you could be doing harm rather than good. Feel free to serve us your standard dynamic URL and we will automatically find the parameters which are unnecessary.

I personally don't believe it matters all that much, short of getting a little more click-through and helping users out. As far as Unicode goes, you don't understand how this works: the request goes to the hex-encoded Unicode destination, but the rendering engine must know how to handle this if it wishes to decode it back to something visually appealing. Google will render (that is, decode) encoded Unicode URLs properly.

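Some browsers make this slightly more complex by always encoding the hostname portion, because of phishing attacks that use ideographs which look the same.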
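
Hostnames are not percent-encoded; the ASCII form browsers fall back to is the Punycode ("xn--") form defined by IDNA. As a rough illustration, Python's built-in `idna` codec (which implements the older IDNA 2003 rules) can show both forms. The hostname `bücher.example` and the helper name `to_ascii_hostname` are made-up examples, not anything from the question.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII (Punycode) form of an internationalized hostname."""
    # The built-in "idna" codec converts each label to its xn-- form.
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    ascii_form = to_ascii_hostname("bücher.example")
    print(ascii_form)                          # xn--bcher-kva.example
    # Decoding goes back to the visually friendly form.
    print(ascii_form.encode("ascii").decode("idna"))
```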

I wanted to show you an example of the request encoding. Here is the request to http://hy.wikipedia.org/wiki/Գլխավոր_Էջ issued by wget:

Hypertext Transfer Protocol
    GET /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB HTTP/1.0\r\n
        [Expert Info (Chat/Sequence): GET /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB HTTP/1.0\r\n]
            [Message: GET /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB HTTP/1.0\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Method: GET
        Request URI: /wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB
        Request Version: HTTP/1.0
    User-Agent: Wget/1.11.4\r\n
    Accept: */*\r\n
    Host: hy.wikipedia.org\r\n
    Connection: Keep-Alive\r\n
    \r\n

As you can see, wget, like every browser, will just URL-encode the destination for you and then send the request to the URL-encoded destination. The URL-decoded form exists only as a visual convenience.
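
The decoding step that a browser or search engine performs for display can be reproduced with Python's standard library. This is only a sketch: the constant `REQUEST_PATH` is copied from the capture above, and `display_form` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote

# Request URI exactly as it appears in the wget capture above.
REQUEST_PATH = "/wiki/%D4%B3%D5%AC%D5%AD%D5%A1%D5%BE%D5%B8%D6%80_%D4%B7%D5%BB"


def display_form(encoded_path: str) -> str:
    """Decode %HH escapes as UTF-8 to get the human-readable path."""
    return unquote(encoded_path, encoding="utf-8", errors="strict")


decoded = display_form(REQUEST_PATH)

if __name__ == "__main__":
    print(decoded)
    # Re-encoding the readable form gives back what was sent on the wire.
    print(quote(decoded) == REQUEST_PATH)
```

The round trip shows that the readable and encoded forms carry the same information; only the encoded one travels in the request.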
