Transform title into dashed URL-friendly string

Question

I would like to write a C# method that would transform any title into a URL friendly string, similar to what stackoverflow does:

  • replace spaces with dashes
  • remove parentheses
  • etc.

I'm thinking of removing the reserved characters listed in the RFC 3986 standard (as summarized on Wikipedia), but I don't know whether that would be enough. It would make links workable, but does anyone know what other characters are replaced here at Stack Overflow? I don't want to end up with percent-escapes in my URLs...

Current implementation

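// Remove reserved and other unwanted characters.
string result = Regex.Replace(value.Trim(), @"[!*'""`();:@&+=$,/\\?%#\[\]<>«»{}_]", "");
// Replace each dash or whitespace character, plus any surrounding whitespace, with a dash.
return Regex.Replace(result.Trim(), @"\s*[\-–—\s]\s*", "-");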

My questions

  1. Which characters should I remove?
  2. Should I limit the maximum length of resulting string?
  3. Anyone know which rules are applied on titles here on SO?

A sub-question
Should I move this question to meta even though it's programming related?

Solution

Rather than looking for things to replace, work from what you keep: the list of unreserved chars in RFC 3986 (letters, digits, "-", ".", "_" and "~") is so short that it makes for a nice clear regex.

return Regex.Replace(value, @"[^A-Za-z0-9_\.~]+", "-");

(Note that I didn't include the dash in the list of allowed chars; that's so it gets gobbled up by the "1 or more" operator [+] so that multiple dashes (in the original or generated or a combination) are collapsed, as per Dominic Rodger's excellent point.)
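
To make the collapsing behaviour concrete, here is a minimal sketch that wraps the regex above in a method. The class and method names (`SlugSketch`, `ToSlug`) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from the answer.
public static class SlugSketch
{
    // Replaces every run of characters outside A-Z, a-z, 0-9, '_', '.', '~'
    // with a single dash. Dashes are not in the kept set, so a run such as
    // "  -- " is consumed as one match and becomes one dash.
    public static string ToSlug(string value)
    {
        return Regex.Replace(value, @"[^A-Za-z0-9_\.~]+", "-");
    }
}
```

With this sketch, `SlugSketch.ToSlug("Is the Pope  -- Catholic?")` gives `Is-the-Pope-Catholic-`: the spaces and dashes in the middle collapse into one dash, and the question mark leaves a trailing dash behind.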

You may also want to remove common words ("the", "an", "a", etc.), although doing so can slightly change the meaning of a sentence. You'll probably want to remove any trailing dashes and periods as well.
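
The two clean-up steps could be sketched roughly as follows. The stop-word list, the names `SlugCleanupSketch` and `ToCleanSlug`, and the choice to trim leading as well as trailing dashes and periods are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original answer.
public static class SlugCleanupSketch
{
    // Assumed stop-word list: only the three words the answer names.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "an", "a" };

    public static string ToCleanSlug(string title)
    {
        // Split on spaces and tabs, then drop whole words found in the stop list.
        // A word with punctuation attached (e.g. "(the") is not recognized.
        IEnumerable<string> kept = title
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !StopWords.Contains(word));

        // Same regex as in the answer: runs of other characters become one dash.
        string slug = Regex.Replace(string.Join(" ", kept), @"[^A-Za-z0-9_\.~]+", "-");

        // Strip dashes and periods from both ends of the result.
        return slug.Trim('-', '.');
    }
}
```

For instance, `SlugCleanupSketch.ToCleanSlug("The Pope is a Catholic...")` gives `Pope-is-Catholic`.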

Also strongly recommend you do what SO and others do, and include a unique identifier other than the title, and then only use that unique ID when processing the URL. So http://example.com/articles/1234567/is-the-pop-catholic (note the missing 'e') and http://example.com/articles/1234567/is-the-pope-catholic resolve to the same resource.
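
One way to picture the ID-only lookup is the sketch below. It assumes the `/articles/{id}/{slug}` path layout from the example URLs; the names `ArticleRouteSketch` and `TryGetArticleId` are invented for illustration.

```csharp
using System;

// Hypothetical route helper; the path layout is taken from the example URLs.
public static class ArticleRouteSketch
{
    // Reads the numeric ID from a URL shaped like /articles/1234567/any-slug.
    // The slug segment is never inspected, so a misspelled or outdated title
    // still leads to the same article. Returns false for any other shape.
    public static bool TryGetArticleId(string url, out long id)
    {
        id = 0;
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
        {
            return false;
        }

        string[] parts = uri.AbsolutePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length >= 2
            && parts[0] == "articles"
            && long.TryParse(parts[1], out id);
    }
}
```

Both example URLs above yield the ID 1234567 with this sketch, which is what lets them resolve to the same resource.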
