Transform title into dashed URL-friendly string
I would like to write a C# method that transforms any title into a URL-friendly string, similar to what Stack Overflow does:
- replace spaces with dashes
- remove parentheses
- etc.
I'm thinking of removing the reserved characters defined in RFC 3986 (list taken from Wikipedia), but I don't know whether that would be enough. It would make the links workable, but does anyone know which other characters are replaced here on Stack Overflow? I don't want to end up with percent-encoded sequences (%xx) in my URLs...
Current implementation
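// Strip reserved and other unwanted punctuation (the replacement argument was missing).
string result = Regex.Replace(value.Trim(), @"[!*'""`();:@&+=$,/\\?%#\[\]<>«»{}_]", "");
// Replace each dash or whitespace character, plus any surrounding whitespace, with one dash.
return Regex.Replace(result.Trim(), @"\s*[\-–—\s]\s*", "-");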
My questions
- Which characters should I remove?
- Should I limit the maximum length of resulting string?
- Does anyone know which rules are applied to titles here on SO?
A sub-question
Should I move this question to meta even though it's programming related?
Rather than looking for things to replace, match on what to keep: the list of unreserved characters is so short that it makes for a nice, clear regex.
return Regex.Replace(value, @"[^A-Za-z0-9_\.~]+", "-");
(Note that I didn't include the dash in the list of allowed chars; that's so it gets gobbled up by the "1 or more" operator [+], so that multiple dashes (in the original, generated, or a combination) are collapsed, as per Dominic Rodger's excellent point.)
You may also want to remove common words ("the", "an", "a", etc.), although doing so can slightly change the meaning of a sentence. You'll probably want to remove any trailing dashes and periods as well.
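The regex and the cleanup steps above can be combined into a single helper. This is a minimal sketch rather than a definitive implementation: the names `SlugHelper`, `ToSlug`, and `removeStopWords`, and the stop-word list itself, are assumptions made for illustration. Lowercasing the result and trimming leading (not just trailing) dashes and periods are extra choices that nothing above requires.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugHelper
{
    // Hypothetical stop-word list; extend it or drop it as needed.
    private static readonly string[] StopWords = { "the", "an", "a" };

    public static string ToSlug(string title, bool removeStopWords = false)
    {
        if (string.IsNullOrWhiteSpace(title)) return string.Empty;

        // Assumption: lowercase the slug so URLs look uniform.
        string value = title.Trim().ToLowerInvariant();

        if (removeStopWords)
        {
            // Match whole words only, so "a" does not eat into "article".
            string pattern = @"\b(" + string.Join("|", StopWords) + @")\b";
            value = Regex.Replace(value, pattern, " ");
        }

        // Replace every run of characters outside the allowed set with one
        // dash. The dash itself is not in the set, so dashes collapse too.
        value = Regex.Replace(value, @"[^A-Za-z0-9_.~]+", "-");

        // Strip dashes and periods from both ends of the slug.
        return value.Trim('-', '.');
    }
}
```

For example, `SlugHelper.ToSlug("Is the Pope Catholic?")` gives `is-the-pope-catholic`, and passing `removeStopWords: true` gives `is-pope-catholic`.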
I also strongly recommend that you do what SO and others do: include a unique identifier other than the title, and then use only that unique ID when processing the URL. That way http://example.com/articles/1234567/is-the-pop-catholic (note the missing 'e') and http://example.com/articles/1234567/is-the-pope-catholic resolve to the same resource.
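One way to get that behaviour is to read only the numeric ID out of the path and ignore whatever slug follows it. The sketch below assumes the `/articles/{id}/{slug}` shape of the example URLs; `ArticleUrl` and `TryGetArticleId` are hypothetical names, not part of any framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArticleUrl
{
    // Matches /articles/{id} with an optional /{slug} and optional trailing slash.
    // Only the id is captured; the slug is deliberately ignored.
    private static readonly Regex Route =
        new Regex(@"^/articles/(?<id>\d+)(?:/[^/]*)?/?$", RegexOptions.Compiled);

    public static bool TryGetArticleId(string path, out long id)
    {
        id = 0;
        Match m = Route.Match(path ?? string.Empty);
        // TryParse also rejects ids too large for a long.
        return m.Success && long.TryParse(m.Groups["id"].Value, out id);
    }
}
```

With this, both `/articles/1234567/is-the-pop-catholic` and `/articles/1234567/is-the-pope-catholic` yield the ID 1234567, so a mistyped or outdated slug still reaches the same article.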