JavaScript RegExp + word boundaries + Unicode characters

Views: 145 | Published: 2019/1/21 15:10:19 | Tags: javascript, regex, unicode


Problem description

I am building a search feature and I am going to use JavaScript autocomplete with it. I am from Finland (Finnish language), so I have to deal with some special characters such as ä, ö and å.

When the user types text into the search input field, I try to match the text against the data.

Here is a simple example that does not work correctly if the user types, for example, "ää". The same thing happens with "äl".

var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not work
var searchterm = "äl";

// does not work
//var searchterm = "ää";

// Works
//var searchterm = "wi";

if ( new RegExp("\\b"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);   
}

http://jsfiddle.net/7TsxB/

So how can I get the ä, ö and å characters to work with JavaScript regex?

I think I should use Unicode escape codes, but how should I do that? The codes for those characters are: [\u00C4,\u00E4,\u00C5,\u00E5,\u00D6,\u00F6]

=>äÄåÅöÖ

Recommended answer

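The problem is how JavaScript regex defines the word boundary \b. It is defined in terms of \w, which covers only the ASCII characters [A-Za-z0-9_], so ä, ö and å count as non-word characters. As a result, \b does not match between a space (or the start of the string) and a word that begins with one of those characters.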
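A small check can make this visible. The sketch below is only an illustration: the helper name isWordChar and the short sample strings are made up for this example and are not part of the original question.

```javascript
// Illustrative helper (not from the question): is a character matched by \w?
const isWordChar = (ch) => /\w/.test(ch);

const results = {
  // Plain ASCII letters are word characters.
  a: isWordChar("a"),
  // "ä" falls outside [A-Za-z0-9_], so \w does not match it.
  aUmlaut: isWordChar("ä"),
  // \b needs a word character on exactly one side. A space followed by "ä"
  // has a non-word character on both sides, so there is no boundary.
  boundaryBeforeAl: /\bäl/i.test("word älkää"),
  // A space followed by "w" is a real boundary.
  boundaryBeforeWi: /\bwi/i.test("string with"),
};

console.log(results);
// { a: true, aUmlaut: false, boundaryBeforeAl: false, boundaryBeforeWi: true }
```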

Instead of using \b, try using (?:^|\s), which is written as "(?:^|\\s)" inside a JavaScript string:

var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Matches with (?:^|\s)
var searchterm = "äl";

// Also matches with (?:^|\s)
//var searchterm = "ää";

// Still works
//var searchterm = "wi";

// (?:^|\s) requires the term to sit at the start of the string or right after whitespace
if ( new RegExp("(?:^|\\s)"+searchterm, "gi").test(title) ) {
    $("#result").html("Match: ("+searchterm+"): "+title);
} else {
    $("#result").html("nothing found with term: "+searchterm);
}
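Because the search term comes from user input, it may be worth escaping regex metacharacters before building the pattern. The following is a minimal sketch under that assumption. The helper names escapeRegExp and startsWord are hypothetical and do not appear in the original answer, and jQuery is left out so that the snippet runs on its own.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper wrapping the (?:^|\s) approach from the answer:
// true when some word in `title` starts with `searchterm` (case-insensitive).
function startsWord(title, searchterm) {
  return new RegExp("(?:^|\\s)" + escapeRegExp(searchterm), "i").test(title);
}

var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

console.log(startsWord(title, "äl"));   // true  (älkää)
console.log(startsWord(title, "ää"));   // true  (ääkköstesti)
console.log(startsWord(title, "wi"));   // true  (with)
console.log(startsWord(title, "kö"));   // false (only occurs inside words)
console.log(startsWord("learn c++ today", "c++")); // true, "+" is escaped
```

Without the escaping step, a term such as "c++" would throw a SyntaxError when passed to new RegExp.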

Breakdown:

(?: plain parentheses () form a capturing group in regex. Parentheses that open with a question mark and colon, (?:, form a non-capturing group. They just group the terms together.

^ the caret symbol matches the beginning of the string

| the bar is the OR operator

\s matches whitespace (it appears as \\s in the string because the backslash has to be escaped)

) closes the group

So instead of using \b, which only treats the ASCII characters [A-Za-z0-9_] as word characters and therefore fails for words starting with ä, ö or å, we use a non-capturing group that matches either the beginning of the string or a whitespace character.
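One limitation of (?:^|\s) is that it does not fire after punctuation such as "(" or a quotation mark. As an alternative that is not part of the original answer, engines that support ES2018 lookbehind and Unicode property escapes allow a Unicode-aware boundary to be approximated. The sketch below assumes such an engine; the names escapeRegExp and startsWordUnicode are hypothetical.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when `searchterm` occurs where the preceding
// character is not a Unicode letter, a Unicode digit or an underscore.
// Needs the "u" flag for \p{...} and an engine with lookbehind support.
function startsWordUnicode(title, searchterm) {
  var pattern = "(?<![\\p{L}\\p{N}_])" + escapeRegExp(searchterm);
  return new RegExp(pattern, "iu").test(title);
}

var sample = 'sana (älkää) ja "ääkkönen" sekä tämä';

console.log(startsWordUnicode(sample, "äl")); // true  (after "(")
console.log(startsWordUnicode(sample, "ää")); // true  (after a quotation mark)
console.log(startsWordUnicode(sample, "äm")); // false (inside "tämä")
console.log(/\bäm/i.test(sample));            // true, \b wrongly sees a boundary inside "tämä"
```

The last line shows the other side of the \b problem: since "t" is a word character and "ä" is not, \b reports a boundary in the middle of "tämä".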
