UTF-8 encoding in page addresses, issues with search engine crawlers


Question

We maintain a website that uses the letters æ, ø, and å in some of its page addresses. Apart from some early IE issues, this has worked fine until now. For the last couple of weeks, however, search engine crawlers, especially Bing, seem to be encoding these letters over and over.

As a result, we get 404 errors because the crawler tries to access the address /butikk/m%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A3%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%82%C2%A0%C3%83%C6%92%C3%82%C2%A2%C3%83%C2%A2%C3%A2%E2%82%AC%C5%A1%C3%82%C2%AC%C3%83%C2%A2%C3%A2%E2%82%AC%C5%BE%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%85%C2%A1%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%B8bler instead of /butikk/møbler. The percent-encoded form /butikk/m%c3%b8bler would also have led to the right page. Because we are using Play Framework, we also get a site error, as our controllers can be no longer than 250 characters, but that is not the real issue here.
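The growth of the address above can be reproduced in a few lines. This is only a sketch of one plausible mechanism, not a confirmed account of what the crawler does: it assumes that each round reads the UTF-8 bytes of the path as Windows-1252 text and then encodes that text as UTF-8 again. The sequences %C6%92 (ƒ) and %E2%82%AC (€) in the failing address are consistent with that assumption. The helper names `misencode_once` and `repair` are hypothetical.

```python
from urllib.parse import quote


def misencode_once(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str, max_rounds: int = 10) -> str:
    """Undo repeated mis-decoding by reversing one round at a time.

    Stops as soon as a round cannot be reversed, which is what happens
    once the text is back to its correct form.
    """
    for _ in range(max_rounds):
        try:
            candidate = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break
        if candidate == text:  # pure ASCII: nothing left to undo
            break
        text = candidate
    return text


if __name__ == "__main__":
    segment = "møbler"
    print(quote(segment))  # correct percent-encoded form
    for _ in range(3):
        segment = misencode_once(segment)
        print(quote(segment))  # grows with every round
    print(repair(segment))
```

Each round roughly doubles the length of the encoded letter, which matches how quickly the crawled address exceeds a few hundred characters.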

Initially, the site had no sitemap. We added one with UTF-8 encoded addresses, hoping it would point the bots to the right addresses, but so far it has made no difference.
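For reference, a sitemap entry for such an address could be built as sketched below, with the path percent-encoded from its UTF-8 bytes and the result XML-escaped. The host `https://www.example.com` and the helper name `sitemap_loc` are placeholders for illustration, not taken from the site in question.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_loc(base_url: str, path: str) -> str:
    """Return a <loc> element with a percent-encoded, XML-escaped URL."""
    # quote() encodes non-ASCII characters as UTF-8 percent escapes
    # and leaves "/" untouched so the path structure is preserved.
    encoded_path = quote(path, safe="/")
    return f"<loc>{escape(base_url + encoded_path)}</loc>"


if __name__ == "__main__":
    print(sitemap_loc("https://www.example.com", "/butikk/møbler"))
```

Writing the addresses in an already percent-encoded, ASCII-only form avoids depending on how a consumer interprets the sitemap file's character encoding.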

Has anybody had a similar issue and solved it, or does anyone have suggestions for what we can do to make Bing Bot use the right addresses? Any help would be appreciated.

Added info: Looking at Bing Webmaster Tools, I can see that Bing has indexed both the right address and a version with "Ã¸" instead of "ø". So my issue can hopefully be solved by removing the faulty address from the index.

Recommended answer

The best suggestion would be to leave special characters out of your filenames, links, and addresses. I had a similar issue a few years back with links containing ä, ö, and ü, which was resolved by simply removing the special characters and replacing them with plain ASCII characters.
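A minimal sketch of that kind of replacement is shown here. The mapping for æ, ø, and å is an assumption chosen for illustration (some sites prefer "oe" and "aa"), and `to_ascii_segment` is a hypothetical helper name. The explicit table is needed because æ and ø have no Unicode decomposition, while letters such as ä, ö, and ü can be reduced by stripping their combining marks.

```python
import unicodedata

# Assumed transliterations; adjust to the convention the site prefers.
REPLACEMENTS = {
    "æ": "ae", "ø": "o", "å": "a",
    "Æ": "Ae", "Ø": "O", "Å": "A",
}


def to_ascii_segment(segment: str) -> str:
    """Turn one path segment into an ASCII-only equivalent."""
    # Handle letters that NFKD cannot decompose.
    replaced = "".join(REPLACEMENTS.get(ch, ch) for ch in segment)
    # Split remaining accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop anything that is still not ASCII, such as the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_segment("møbler"))
    print(to_ascii_segment("blåbær"))
```

If addresses are renamed this way, the old addresses would typically need to redirect to the new ones so that existing links keep working.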
