Efficiently replace all accented characters in a string?


Question

For a poor man's implementation of near-collation-correct sorting on the client side I need a JavaScript function that does efficient single character replacement in a string.

Here is what I mean (note that this applies to German text, other languages sort differently):


native sorting gets it wrong: a b c o u z ä ö ü
collation-correct would be:   a ä b c o ö u ü z

Basically, I need all occurrences of "ä" of a given string replaced with "a" (and so on). This way the result of native sorting would be very close to what a user would expect (or what a database would return).
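
As a rough illustration of the idea, the sketch below sorts by a folded copy of each string instead of the string itself. The helper name `foldUmlauts`, the sample data, and the tie-break rule are assumptions made for this example; they are not part of the question.

```javascript
// Sketch only: sort by a "folded" key so that umlauts land next to
// their base letters. `foldUmlauts` is a hypothetical helper name.
function foldUmlauts(s) {
  var map = { "ä": "a", "ö": "o", "ü": "u", "Ä": "A", "Ö": "O", "Ü": "U" };
  return s.replace(/[äöüÄÖÜ]/g, function (ch) { return map[ch]; });
}

var words = ["z", "ü", "o", "ä", "b", "ö", "a", "u", "c"];

// Compare the folded keys, not the original strings.
var sorted = words.slice().sort(function (x, y) {
  var kx = foldUmlauts(x), ky = foldUmlauts(y);
  if (kx < ky) return -1;
  if (kx > ky) return 1;
  // Equal keys: fall back to the originals so "a" precedes "ä".
  return x < y ? -1 : (x > y ? 1 : 0);
});

console.log(sorted.join(" ")); // a ä b c o ö u ü z
```

The tie-break matters because "a" and "ä" fold to the same key; without it their relative order would depend on the input order.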

Other languages have facilities to do just that: Python supplies str.translate(), in Perl there is tr/…/…/, XPath has a function translate(), ColdFusion has ReplaceList(). But what about JavaScript?

Here is what I have right now.

// s would be a rather short string (something like 
// 200 characters at max, most of the time much less)
function makeSortString(s) {
  var translate = {
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U"   // probably more to come
  };
  var translate_re = /[öäüÖÄÜ]/g;
  return ( s.replace(translate_re, function(match) { 
    return translate[match]; 
  }) );
}

For starters, I don't like the fact that the regex is rebuilt every time I call the function. I guess a closure can help in this regard, but I don't seem to get the hang of it for some reason.

Can someone think of something more efficient?

The answers below fall into two categories:

  1. String replacement functions of varying degrees of completeness and efficiency (what I was originally asking about)
  2. A late mention of String#localeCompare, which is widely supported among JS engines and could solve this category of problem much more elegantly.
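
For the second category, a minimal sketch of what a `localeCompare`-based sort might look like is shown here. It assumes an engine with locale-aware comparison (Intl support) and German collation data; the sample array is made up for illustration.

```javascript
// Sketch only: let the engine do the collation instead of replacing
// characters by hand. Requires locale-aware localeCompare support.
var words = ["z", "ü", "o", "ä", "b", "ö", "a", "u", "c"];

var byLocale = words.slice().sort(function (x, y) {
  // The second argument requests German collation rules.
  return x.localeCompare(y, "de");
});

// On engines with Intl support this should print: a ä b c o ö u ü z
console.log(byLocale.join(" "));
```

When many strings are compared, creating one `new Intl.Collator("de")` and passing its `compare` method to `sort` is usually preferable to calling `localeCompare` with a locale argument on every comparison.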


Recommended answer

I can't speak to what you are trying to do specifically with the function itself, but if you don't like the regex being built every time, here are two solutions and some caveats about each.

Here is one way:

function makeSortString(s) {
  if(!makeSortString.translate_re) makeSortString.translate_re = /[öäüÖÄÜ]/g;
  var translate = {
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U"   // probably more to come
  };
  return ( s.replace(makeSortString.translate_re, function(match) { 
    return translate[match]; 
  }) );
}

This will obviously make the regex a property of the function itself. The only thing you may not like about this (or you may, I guess it depends) is that the regex can now be modified outside of the function's body. So, someone could do this to modify the internally used regex:

makeSortString.translate_re = /[a-z]/g;
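
To make the consequence concrete, here is a small sketch of what such an outside modification would do. The sample word "Tür" is just an illustration; the function body is the property-based version from this answer.

```javascript
// Sketch only: shows how reassigning the public property breaks the function.
function makeSortString(s) {
  if (!makeSortString.translate_re) makeSortString.translate_re = /[öäüÖÄÜ]/g;
  var translate = {
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U"
  };
  return s.replace(makeSortString.translate_re, function (match) {
    return translate[match];
  });
}

var before = makeSortString("Tür"); // "Tur"

// Outside code swaps the regex; the lookup table no longer matches it.
makeSortString.translate_re = /[a-z]/g;

// "r" now matches, translate["r"] is undefined, and replace() inserts
// the string "undefined" for it.
var after = makeSortString("Tür"); // "Tüundefined"

console.log(before, after);
```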

So, there is that option.

One way to get a closure, and thus prevent someone from modifying the regex, would be to define this as an anonymous function assignment like this:

var makeSortString = (function() {
  var translate_re = /[öäüÖÄÜ]/g;
  return function(s) {
    var translate = {
      "ä": "a", "ö": "o", "ü": "u",
      "Ä": "A", "Ö": "O", "Ü": "U"   // probably more to come
    };
    return ( s.replace(translate_re, function(match) { 
      return translate[match]; 
    }) );
  }
})();

I hope this is useful to you.

UPDATE: It's early and I don't know why I didn't see the obvious before, but it might also be useful to put your translate object in a closure as well:

var makeSortString = (function() {
  var translate_re = /[öäüÖÄÜ]/g;
  var translate = {
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U"   // probably more to come
  };
  return function(s) {
    return ( s.replace(translate_re, function(match) { 
      return translate[match]; 
    }) );
  }
})();
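
Since the map is expected to grow ("probably more to come"), one possible refinement is to derive the regex from the map's keys so the two cannot drift apart. This is a sketch, not part of the original answer, and it assumes every key is a single character with no special meaning inside a character class.

```javascript
// Sketch only: build the character class once from the translate map.
var makeSortString = (function () {
  var translate = {
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U"   // add further pairs here
  };
  // Assumes keys are single characters that need no escaping inside
  // [...] (no "]", "\", "^" or "-").
  var translate_re = new RegExp("[" + Object.keys(translate).join("") + "]", "g");
  return function (s) {
    return s.replace(translate_re, function (match) {
      return translate[match];
    });
  };
})();

console.log(makeSortString("Über größere Äpfel")); // Uber großere Apfel
```

Note that "ß" passes through unchanged because it has no entry in the map; adding a pair to `translate` is the only change needed to cover it.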
