What is the Best Way to Clean a URL with a Title in it

Question

What is the best way to clean a URL? I am looking for a URL like this:

what_is_the_best_headache_medication

My current code:

public string CleanURL(string str)
{
    str = str.Replace("!", "");
    str = str.Replace("@", "");
    str = str.Replace("#", "");
    str = str.Replace("$", "");
    str = str.Replace("%", "");
    str = str.Replace("^", "");
    str = str.Replace("&", "");
    str = str.Replace("*", "");
    str = str.Replace("(", "");
    str = str.Replace(")", "");
    str = str.Replace("-", "");
    str = str.Replace("_", "");
    str = str.Replace("+", "");
    str = str.Replace("=", "");
    str = str.Replace("{", "");
    str = str.Replace("[", "");
    str = str.Replace("]", "");
    str = str.Replace("}", "");
    str = str.Replace("|", "");
    str = str.Replace(@"\", "");
    str = str.Replace(":", "");
    str = str.Replace(";", "");
    str = str.Replace("\"", "");
    str = str.Replace("'", "");
    str = str.Replace("<", "");
    str = str.Replace(">", "");
    str = str.Replace(",", "");
    str = str.Replace(".", "");
    str = str.Replace("`", "");
    str = str.Replace("~", "");
    str = str.Replace("/", "");
    str = str.Replace("?", "");
    str = str.Replace("  ", " ");
    str = str.Replace("   ", " ");
    str = str.Replace("    ", " ");
    str = str.Replace("     ", " ");
    str = str.Replace("      ", " ");
    str = str.Replace("       ", " ");
    str = str.Replace("        ", " ");
    str = str.Replace("         ", " ");
    str = str.Replace("          ", " ");
    str = str.Replace("           ", " ");
    str = str.Replace("            ", " ");
    str = str.Replace("             ", " ");
    str = str.Replace("              ", " ");
    str = str.Replace(" ", "_");
    return str;
}
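
One weakness of the chained space replacements above: String.Replace makes a single left-to-right pass and does not rescan its own output, so a run of four spaces shrinks to two, and none of the longer patterns that follow can match it any more. A minimal sketch (C# top-level statements, C# 9 or later); QuestionCollapse is a hypothetical helper that replays only the whitespace steps of the code above, compared with a single regular expression:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replays only the whitespace steps of the question's CleanURL,
// in the same order (2 spaces, then 3, ..., then 14), followed by " " -> "_".
static string QuestionCollapse(string s)
{
    for (int n = 2; n <= 14; n++)
        s = s.Replace(new string(' ', n), " ");
    return s.Replace(" ", "_");
}

string input = "a    b"; // four spaces

Console.WriteLine(QuestionCollapse(input));            // a__b
Console.WriteLine(Regex.Replace(input, @"\s+", "_"));  // a_b
```

The regex matches the whole run of whitespace in one go, so it does not depend on listing every possible run length.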

Answer

Generally, your best bet is a whitelist regular-expression approach (keep only the characters you allow) instead of removing every unwanted character one by one, because with a blacklist you are almost certainly going to miss some.
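
A minimal sketch of the whitelist idea for the format the question asks for, written as C# top-level statements (C# 9 or later). ToUrlSlug is a hypothetical name; it keeps only a-z, 0-9 and whitespace, then joins the words with underscores. Note that it also drops non-ASCII letters such as umlauts, which is the limitation the rest of this answer deals with:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: whitelist-based cleanup of a title for use in a URL.
static string ToUrlSlug(string title)
{
    if (string.IsNullOrEmpty(title))
        return title;

    // Lower-case with invariant rules so the result does not depend on the current culture.
    string value = title.ToLowerInvariant();

    // Whitelist: remove every character that is NOT a-z, 0-9 or whitespace,
    // instead of listing the unwanted characters one by one.
    value = Regex.Replace(value, @"[^a-z0-9\s]", string.Empty);

    // Trim the ends, then turn each run of whitespace into a single underscore.
    return Regex.Replace(value.Trim(), @"\s+", "_");
}

Console.WriteLine(ToUrlSlug("What is the best headache medication?"));
// prints what_is_the_best_headache_medication
```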

The answers here are fine so far, but I personally did not want to drop umlauts and accented characters entirely. The final solution I came up with looks like this:

using System;
using System.Text.RegularExpressions;

public static string CleanUrl(string value)
{
    if (string.IsNullOrEmpty(value))
        return value;

    // replace hyphens with spaces, trim leading and trailing whitespace, lower-case;
    // ToLowerInvariant avoids culture surprises such as Turkish "I" -> "ı"
    value = value.Replace("-", " ").Trim().ToLowerInvariant();

    // collapse each run of whitespace into a single hyphen
    value = Regex.Replace(value, @"\s+", "-");

    // replace umlauts and eszett with their transliterations
    value = value.Replace("ß", "ss");
    value = value.Replace("ä", "ae");
    value = value.Replace("ö", "oe");
    value = value.Replace("ü", "ue");

    // remove diacritic marks (often called accent marks) from the remaining characters
    value = RemoveDiacritics(value);

    // remove all remaining unwanted characters (whitelist)
    value = Regex.Replace(value, @"[^a-z0-9\s-]", String.Empty);

    return value;
}
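
The order of the steps in CleanUrl matters: the umlaut replacements run before RemoveDiacritics. A small sketch (C# top-level statements) showing what would happen the other way round; StripMarks and Transliterate are hypothetical helpers, where StripMarks is only the FormD plus NonSpacingMark filter, not the full RemoveDiacritics with its code page round trip:

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: decompose, drop combining marks, recompose.
static string StripMarks(string s)
{
    var sb = new StringBuilder();
    foreach (char c in s.Normalize(NormalizationForm.FormD))
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

// Hypothetical helper: the umlaut/eszett replacements from CleanUrl.
static string Transliterate(string s) =>
    s.Replace("\u00DF", "ss")   // ß
     .Replace("\u00E4", "ae")   // ä
     .Replace("\u00F6", "oe")   // ö
     .Replace("\u00FC", "ue");  // ü

string word = "m\u00FCnchen"; // "münchen"

Console.WriteLine(StripMarks(Transliterate(word))); // CleanUrl's order: muenchen
Console.WriteLine(Transliterate(StripMarks(word))); // reversed order: munchen
```

In the reversed order, the diaeresis on "ü" is stripped like any other accent mark, so the German transliteration "ue" never happens.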

The RemoveDiacritics method called in CleanUrl is based on the Stack Overflow answer by Blair Conrad:

using System.Globalization;
using System.Text;

public static string RemoveDiacritics(string value)
{
    if (string.IsNullOrEmpty(value))
        return value;

    // decompose characters (FormD): "é" becomes "e" followed by a combining accent
    string normalized = value.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    // keep everything except the combining (non-spacing) marks
    foreach (char c in normalized)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }

    // round-trip through code page 850; characters it cannot represent are
    // substituted by the encoder's fallback. On .NET Core / .NET 5+, code page 850
    // is only available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    Encoding nonunicode = Encoding.GetEncoding(850);
    Encoding unicode = Encoding.Unicode;

    byte[] nonunicodeBytes = Encoding.Convert(unicode, nonunicode, unicode.GetBytes(sb.ToString()));
    return nonunicode.GetString(nonunicodeBytes);
}
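
If code page 850 is not available (on .NET Core / .NET 5+, Encoding.GetEncoding(850) requires registering CodePagesEncodingProvider first), a variant that stops after the mark filter may be enough, assuming it is only used inside CleanUrl, whose final whitelist regex removes whatever non-ASCII characters remain. Any mapping that the code page round trip itself performed is lost with this variant. A sketch under that assumption, as C# top-level statements; RemoveDiacriticsPortable is a hypothetical name:

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical variant of RemoveDiacritics without the code page 850 round trip.
static string RemoveDiacriticsPortable(string value)
{
    if (string.IsNullOrEmpty(value))
        return value;

    // Decompose: "è" becomes "e" followed by the combining grave accent U+0300.
    string normalized = value.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(normalized.Length);

    // Keep every character except the combining (non-spacing) marks.
    foreach (char c in normalized)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }

    // Recompose whatever is left into the usual precomposed form.
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveDiacriticsPortable("Crème brûlée")); // Creme brulee
```

Letters such as "ø" or "ß" have no Unicode decomposition, so they pass through unchanged; in CleanUrl, "ß" is transliterated beforehand and the whitelist regex drops anything else outside a-z, 0-9 and "-".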

I hope that helps anyone who needs to slugify URLs while keeping umlauts and similar characters as their URL-friendly equivalents.
