UTF-8 encoding in page addresses, issues with search engine crawlers

Problem description

We maintain a website that uses the letters æ, ø, and å in some of its page addresses. Apart from some IE issues early on, this has worked fine until now. Over the last couple of weeks, however, search engine crawlers, especially Bing, seem to be encoding these letters over and over.

So we get 404-errors as the crawler is trying to access the address /butikk/m%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A3%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%82%C2%A0%C3%83%C6%92%C3%82%C2%A2%C3%83%C2%A2%C3%A2%E2%82%AC%C5%A1%C3%82%C2%AC%C3%83%C2%A2%C3%A2%E2%82%AC%C5%BE%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%A0%C3%A2%E2%82%AC%E2%84%A2%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%A2%C3%83%C6%92%C3%86%E2%80%99%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%85%C2%A1%C3%83%C6%92%C3%A2%E2%82%AC%C5%A1%C3%83%E2%80%9A%C3%82%C2%B8bler, instead of /butikk/møbler. Using /butikk/m%c3%b8bler would also have gotten you to the right page. And as we are using Play Framework, we also get a site error as our controllers can be no longer than 250 characters, but that is not the real issue here.
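The shape of the mangled address is consistent with the UTF-8 bytes of "ø" being decoded with a single-byte charset and then re-encoded as UTF-8, several times over. The following is a minimal sketch of that mechanism, assuming the wrong charset is Windows-1252 (the question does not say which component does this). The helper names `misdecode_once` and `repair` are hypothetical.

```python
from urllib.parse import quote


def misdecode_once(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str, max_rounds: int = 10) -> str:
    """Undo repeated mis-decoding by reversing one round at a time.

    Stops as soon as a round cannot be reversed, which is the case
    for a path that was never mangled (or is fully repaired).
    """
    for _ in range(max_rounds):
        try:
            fixed = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break
        if fixed == text:  # pure ASCII: nothing left to undo
            break
        text = fixed
    return text


path = "/butikk/møbler"
once = misdecode_once(path)
twice = misdecode_once(once)

print(quote(path))   # /butikk/m%C3%B8bler
print(quote(once))   # /butikk/m%C3%83%C2%B8bler
print(quote(twice))  # /butikk/m%C3%83%C6%92%C3%82%C2%B8bler
print(repair(twice))  # /butikk/møbler
```

Each round roughly doubles the length of the escaped sequence, which would explain how a one-letter difference grows past a 250-character limit. A function like `repair` could in principle be used on the server to redirect mangled requests to the canonical address, but that only works around the symptom.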

Initially, there was no sitemap on the site. We added one, with UTF-8 encoded addresses, hoping this would lead the bots the right way, but so far nothing.
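For reference, a sitemap entry for such an address is usually written with the path percent-encoded as UTF-8 and the result XML-escaped. A small sketch, where the host `https://www.example.com` and the helper `sitemap_entry` are placeholders rather than anything from the site in question:

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_entry(base: str, path: str) -> str:
    """Build one <url> element with a percent-encoded, XML-escaped address."""
    # quote() encodes non-ASCII characters as UTF-8 percent escapes
    # and leaves "/" untouched.
    url = base + quote(path, safe="/")
    return f"<url><loc>{escape(url)}</loc></url>"


print(sitemap_entry("https://www.example.com", "/butikk/møbler"))
# <url><loc>https://www.example.com/butikk/m%C3%B8bler</loc></url>
```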

Has anybody had a similar issue and solved it, or does anyone have suggestions on what we can do to make Bing Bot use the right addresses? Any help would be appreciated.

Added info: Looking at Bing Webmaster Tools, I can see that Bing has indexed both the right address and a version with "Ã¸" instead of "ø". So my issue can hopefully be solved by removing the faulty address from the index.

Solution

The best suggestion would be to leave special characters out of your filenames/links/addresses. I had a similar issue a few years back with links containing ä, ö, and ü, which was resolved by simply removing the special characters and replacing them with standard (ASCII) characters.
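One way to apply this is to derive an ASCII-only path segment from each name. The sketch below is only an illustration: the mapping in `SPECIAL` (for example "ø" to "o" rather than "oe") and the function name `ascii_slug` are assumptions, and the right transliteration depends on the site and language.

```python
import unicodedata

# Letters that have no Unicode decomposition need an explicit mapping.
SPECIAL = str.maketrans({
    "æ": "ae", "ø": "o", "å": "a",
    "Æ": "Ae", "Ø": "O", "Å": "A",
    "ß": "ss",
})


def ascii_slug(segment: str) -> str:
    """Return an ASCII-only version of a path segment."""
    replaced = segment.translate(SPECIAL)
    # NFKD splits letters such as "ä" into "a" plus a combining mark,
    # and the ASCII encode step then drops the mark.
    decomposed = unicodedata.normalize("NFKD", replaced)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_slug("møbler"))  # mobler
print(ascii_slug("blåbær"))  # blabaer
print(ascii_slug("für"))     # fur
```

If the old addresses are already indexed, they would still need a redirect to the new ones.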
