What is the Best Way to Clean a URL with a Title in it
Question
What is the best way to clean a URL? I am looking for a URL like this:
what_is_the_best_headache_medication
My current code:
public string CleanURL(string str)
{
    str = str.Replace("!", "");
    str = str.Replace("@", "");
    str = str.Replace("#", "");
    str = str.Replace("$", "");
    str = str.Replace("%", "");
    str = str.Replace("^", "");
    str = str.Replace("&", "");
    str = str.Replace("*", "");
    str = str.Replace("(", "");
    str = str.Replace(")", "");
    str = str.Replace("-", "");
    str = str.Replace("_", "");
    str = str.Replace("+", "");
    str = str.Replace("=", "");
    str = str.Replace("{", "");
    str = str.Replace("[", "");
    str = str.Replace("]", "");
    str = str.Replace("}", "");
    str = str.Replace("|", "");
    str = str.Replace(@"\", "");
    str = str.Replace(":", "");
    str = str.Replace(";", "");
    str = str.Replace(@"\", "");
    str = str.Replace("'", "");
    str = str.Replace("<", "");
    str = str.Replace(">", "");
    str = str.Replace(",", "");
    str = str.Replace(".", "");
    str = str.Replace("`", "");
    str = str.Replace("~", "");
    str = str.Replace("/", "");
    str = str.Replace("?", "");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace("  ", " ");
    str = str.Replace(" ", "_");
    return str;
}
Recommended answer
Generally your best bet is a whitelist regular expression: keep only the characters you explicitly allow instead of removing unwanted characters one by one, because with a blacklist you are bound to miss some.
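To make the whitelist idea concrete, here is a minimal sketch that produces the underscore-separated form the question asks for. The class and method names (`SlugSketch`, `WhitelistSlug`) are made up for illustration, and the allowed set (a-z, 0-9, underscore) is an assumption you would adjust to your own rules.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Hypothetical helper: keeps only a-z, 0-9 and underscores.
    public static string WhitelistSlug(string title)
    {
        if (string.IsNullOrEmpty(title))
            return title;

        // Trim and lowercase in a culture-independent way.
        string value = title.Trim().ToLowerInvariant();

        // Turn every run of whitespace into a single underscore.
        value = Regex.Replace(value, @"\s+", "_");

        // Whitelist step: drop anything that is not explicitly allowed.
        value = Regex.Replace(value, @"[^a-z0-9_]", string.Empty);

        // Removing characters can leave doubled or dangling underscores.
        value = Regex.Replace(value, @"_+", "_").Trim('_');

        return value;
    }
}
```

With this sketch, `SlugSketch.WhitelistSlug("What is the best headache medication?")` yields `what_is_the_best_headache_medication`. Any character not listed in the character class is dropped automatically, so there is no list of forbidden symbols to maintain.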
The answers here are fine so far, but I personally did not want to drop umlauts and accented characters entirely. So the final solution I came up with looks like this:
// requires: using System.Text.RegularExpressions;
public static string CleanUrl(string value)
{
    if (string.IsNullOrEmpty(value))
        return value;
    // turn hyphens into spaces, trim leading and trailing whitespace, lowercase (culture-invariant)
    value = value.Replace("-", " ").Trim().ToLowerInvariant();
    // collapse each run of whitespace into a single hyphen
    value = Regex.Replace(value, @"\s+", "-");
    // replace umlauts and eszett with their ASCII equivalents
    value = value.Replace("ß", "ss");
    value = value.Replace("ä", "ae");
    value = value.Replace("ö", "oe");
    value = value.Replace("ü", "ue");
    // strip diacritic marks (often called accent marks) from the remaining characters
    value = RemoveDiacritics(value);
    // whitelist: remove every character that is still not a-z, 0-9, whitespace or a hyphen
    value = Regex.Replace(value, @"[^a-z0-9\s-]", string.Empty);
    return value;
}
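The order of the steps matters: the umlauts have to be mapped before the diacritics are stripped, otherwise "ü" would already have been reduced to "u" and could no longer become "ue". The sketch below isolates that mapping step. `TransliterationSketch`, `TransliterateGerman` and the lookup table are hypothetical names, and composing the input to Form C first is my own addition, so that input arriving as "u" plus a combining diaeresis is also matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TransliterationSketch
{
    // Hypothetical lookup table; extend it for other languages as needed.
    private static readonly Dictionary<char, string> GermanMap =
        new Dictionary<char, string>
        {
            { 'ß', "ss" }, { 'ä', "ae" }, { 'ö', "oe" }, { 'ü', "ue" }
        };

    public static string TransliterateGerman(string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;

        // Compose first so "u" + combining diaeresis becomes the single char 'ü',
        // then lowercase so the table only needs lowercase keys.
        string composed = value.Normalize(NormalizationForm.FormC).ToLowerInvariant();

        var sb = new StringBuilder(composed.Length);
        foreach (char c in composed)
        {
            // Mapped characters are expanded, everything else is copied unchanged.
            if (GermanMap.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

A single pass over the string with a table replaces the chain of `Replace` calls and makes it easy to add further pairs later.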
The RemoveDiacritics method that CleanUrl calls is based on the Stack Overflow answer by Blair Conrad:
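// requires: using System.Globalization; using System.Text;
public static string RemoveDiacritics(string value)
{
    if (string.IsNullOrEmpty(value))
        return value;
    // decompose each character into its base character plus combining marks
    string normalized = value.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    foreach (char c in normalized)
    {
        // keep everything except the combining (non-spacing) marks
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    // round-trip through code page 850 so that characters it cannot represent
    // are replaced by the encoder's fallback character.
    // Note: on .NET Core / .NET 5+ this code page is only available after calling
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    Encoding nonunicode = Encoding.GetEncoding(850);
    Encoding unicode = Encoding.Unicode;
    byte[] nonunicodeBytes = Encoding.Convert(unicode, nonunicode, unicode.GetBytes(sb.ToString()));
    char[] nonunicodeChars = new char[nonunicode.GetCharCount(nonunicodeBytes, 0, nonunicodeBytes.Length)];
    nonunicode.GetChars(nonunicodeBytes, 0, nonunicodeBytes.Length, nonunicodeChars, 0);
    return new string(nonunicodeChars);
}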
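If the code page round trip is not needed (for example because a whitelist regex runs afterwards anyway), the decomposition step can stand on its own. The following is a sketch under that assumption, not the answer's original method; `DiacriticsSketch` and `StripDiacritics` are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Hypothetical variant without the code page 850 round trip.
    public static string StripDiacritics(string value)
    {
        if (string.IsNullOrEmpty(value))
            return value;

        // Form D splits "é" into "e" followed by a combining acute accent.
        string decomposed = value.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Skip the combining marks, keep the base characters.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        // Recompose whatever is left into the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Characters that have no decomposition, such as "ß", pass through unchanged here, so the whitelist step is still needed to guarantee an ASCII-only result.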
Hope that helps anybody who is struggling to slugify URLs while replacing umlauts and similar characters with their URL-friendly equivalents.