How to handle diacritics (accents) when rewriting 'pretty URLs'


Question

I rewrite URLs to include the title of user-generated travel blogs.

I do this for both readability of URLs and SEO purposes.

 http://www.example.com/gallery/280-Gorges_du_Todra/

The first integer is the id; the rest is for us humans (and is irrelevant for requesting the resource).
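
To illustrate why the readable tail can be rewritten freely, here is a minimal sketch of a lookup that reads only the leading integer. The function name `extractGalleryId` and the exact `/gallery/` path pattern are assumptions made for this example, not part of the original setup.

```php
<?php
// Hypothetical helper: pull the numeric id out of a path such as
// /gallery/280-Gorges_du_Todra/ and ignore the human-readable tail.
function extractGalleryId(string $path): ?int
{
    // "/gallery/", then digits, then optionally "-anything" and a trailing slash.
    if (preg_match('#^/gallery/(\d+)(?:-[^/]*)?/?$#', $path, $m) === 1) {
        return (int) $m[1];
    }
    return null; // not a gallery URL
}
```

Because only the id is used, `/gallery/280-Gorges_du_Todra/` and `/gallery/280-anything-else/` resolve to the same resource, so changing the transliteration rules later does not break existing links.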

Now people can write titles containing any UTF-8 character, but most of those are not allowed in a URL. My audience is generally English-speaking, but since they travel, they like to include names like

 Aït Ben Haddou

What is the proper way to translate this for display in a URL, using PHP on Linux?

So far I've seen several solutions:

  1. Just strip all non-allowed characters and replace spaces. This has strange results:
    'Aït Ben Haddou' → /gallery/280-At_Ben_Haddou/
    Not really helpful.

  2. Just strip all non-allowed characters, replace spaces, and leave the character code (as stackoverflow.com does), most likely because of the 'regex-hammer' used.
    This gives strange results: 'tést tést' → /questions/0000/t233st-t233st

  3. Translate to the 'nearest equivalent':
    'Aït Ben Haddou' → /gallery/280-Ait_Ben_Haddou/
    But this goes wrong for German; for example, 'ü' should be transliterated as 'ue'.

For me, as a Dutch person, the 3rd result 'looks' the best.
I'm quite sure, however, that (1) many people will have a different opinion and (2) it is just plain wrong in the German example.

Another problem with the 3rd option is: how do I find all possible characters that can be converted to a 7-bit equivalent?

So the question is:

  1. What, in your opinion, is the most desirable result (within technical limits)?

  2. How can it be solved technically (to reach the desired result) with PHP?

Solution

Ultimately, you're going to have to give up on the idea of "correct" for this problem. Translating the string, no matter how you do it, destroys accuracy in the name of compatibility and readability. All three options are equally compatible, but #1 and #2 suffer in terms of readability. So just run with it and go for whatever looks best: option #3.

Yes, the translations are wrong for German, but unless you start requiring your users to specify what language their titles are in (and restricting them to only one), you're not going to solve that problem without far more effort than it's worth. (For example, running each word in the title through dictionaries for each known language and translating that word's diacritics according to the rules of its language would work, but it's excessive.)

Alternatively, if German is a higher concern than other languages, make your translation always use the German version when one exists: ä → ae, ë → e, ï → i, ö → oe, ü → ue.
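
The German-first rule can be written down as a plain lookup table. This is a sketch under assumptions: the names `GERMAN_FIRST_MAP` and `applyGermanFirst` are made up for illustration, the uppercase entries and `ß` go beyond the lowercase list above, and the input is assumed to be UTF-8 with precomposed characters (a decomposed "u" followed by a combining diaeresis would not match).

```php
<?php
// German-first replacements, meant to run before any generic accent stripping.
// Assumption: uppercase variants and 'ß' are added here for completeness.
const GERMAN_FIRST_MAP = [
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'ë' => 'e',  'ï' => 'i',
];

function applyGermanFirst(string $text): string
{
    // strtr() with an array does not re-scan text it has already replaced,
    // so the output of one rule is never fed into another.
    return strtr($text, GERMAN_FIRST_MAP);
}
```

With this table, 'Müller' becomes 'Mueller' and 'Aït Ben Haddou' becomes 'Ait Ben Haddou'. The trade-off is the one described above: a non-German 'ü' is also written as 'ue'.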

Edit:

Oh, and as for the actual method, I'd translate the special cases, if any, via str_replace, then use iconv for the rest:

$text = str_replace(array("ä", "ö", "ü", "ß"), array("ae", "oe", "ue", "ss"), $text);
$text = iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
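
Putting those two calls into a complete URL segment might look like the sketch below. The function name `buildGallerySlug`, the underscore separator (taken from the example URL in the question), and the final cleanup step are assumptions added for illustration. Note also that the output of `//TRANSLIT` is not the same everywhere: it depends on the iconv implementation, and on glibc-based systems it is commonly reported to depend on the current locale as well, so it is worth checking on the target server.

```php
<?php
// Hypothetical helper built around the two calls from the answer.
function buildGallerySlug(int $id, string $title): string
{
    // 1. Special cases first (German-style digraphs), as in the answer.
    $text = str_replace(
        array('ä', 'ö', 'ü', 'ß'),
        array('ae', 'oe', 'ue', 'ss'),
        $title
    );

    // 2. Let iconv approximate whatever else is non-ASCII.
    //    iconv() returns false on failure; fall back to the unconverted text.
    $ascii = @iconv('UTF-8', 'US-ASCII//TRANSLIT', $text);
    if ($ascii === false) {
        $ascii = $text;
    }

    // 3. Keep only ASCII letters and digits; collapse every other run of
    //    characters into a single underscore and trim underscores at the ends.
    $slug = trim(preg_replace('/[^A-Za-z0-9]+/', '_', $ascii), '_');

    return $id . '-' . $slug;
}
```

For example, `buildGallerySlug(280, 'Gorges du Todra')` yields `280-Gorges_du_Todra`. Step 3 is a safety net: anything iconv leaves behind or substitutes with punctuation (such as '?') ends up as an underscore instead of leaking into the URL.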
