JavaScript remove ZERO WIDTH SPACE (unicode 8203) from string


Question

I'm writing some JavaScript that processes website content. My efforts are being thwarted by the SharePoint text editor's tendency to put the "zero width space" character in the text when the user presses backspace. The character's Unicode value is 8203, or B200 in hexadecimal. I've tried to use the default "replace" function to get rid of it. I've tried many variants, but none of them worked:

var a = "o​m"; //the invisible character is between o and m

var b = a.replace(/\u8203/g,'');
b = a.replace(/\uB200/g,'');
b = a.replace("\\uB200",'');

And so on and so forth. I've tried quite a few variations on this theme. None of these expressions work (tested in Chrome and Firefox). The only thing that works is typing the actual character in the expression:

var b = a.replace("​",''); //it's there, believe me

This poses potential problems. The character is invisible, so that line in itself doesn't make sense. I can get around that with comments. But if the code is ever reused and the file is saved using a non-Unicode encoding (or when it's deployed to SharePoint, where there's no guarantee the encoding won't get messed up), it will stop working. Is there a way to write this using the Unicode notation instead of the character itself?

[My ramblings about the character]

In case you haven't met this character (and you probably haven't, seeing as it's invisible to the naked eye, unless it broke your code and you discovered it while trying to locate the bug), it's a real a-hole that will cause certain types of pattern matching to malfunction. I've caged the beast for you:

[​] <- careful, don't let it escape.

If you want to see it, copy those brackets into a text editor and then step your cursor through them. You'll need three steps to pass what looks like two characters, and the cursor will appear to skip a step in the middle.

Recommended answer

The number in a Unicode escape must be in hex, and the hex for 8203 is 200B (which is indeed the Unicode zero-width space), so:

var b = a.replace(/\u200B/g,'');
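To see why the attempts in the question had no effect, it may help to print what each escape actually stands for. The following is a small sketch, not part of the original answer; the variable names decimalCodePoint, hexDigits and zeroWidthSpace are made up for illustration.

```javascript
// Convert the decimal code point from the question into the hex digits that \u expects.
var decimalCodePoint = 8203;
var hexDigits = decimalCodePoint.toString(16).toUpperCase();
console.log(hexDigits); // "200B"

// \u8203 is a valid escape, but it names code point 0x8203 (decimal 33283),
// which is a different character from the zero-width space.
console.log("\u8203".charCodeAt(0)); // 33283

// "\\uB200" escapes the backslash, so the search string is six literal characters.
console.log("\\uB200".length); // 6

// The correct escape and the decimal code point name the same character.
var zeroWidthSpace = String.fromCharCode(decimalCodePoint);
console.log(zeroWidthSpace === "\u200B"); // true
```

So /\u8203/ and /\uB200/ are legal patterns that simply look for characters the string does not contain, which is why replace returned the input unchanged.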

Live example

var a = "o​m"; //the invisible character is between o and m
var b = a.replace(/\u200B/g,'');
console.log("a.length = " + a.length);      // 3
console.log("a === 'om'? " + (a === 'om')); // false
console.log("b.length = " + b.length);      // 2
console.log("b === 'om'? " + (b === 'om')); // true
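Since the question worries about the source file being re-saved in another encoding, here is one possible way to keep the invisible character out of the file entirely. This is only a sketch: stripZeroWidthSpaces is a hypothetical helper name, and the test string is built with String.fromCharCode so that no literal zero-width space appears in the code.

```javascript
// Hypothetical helper (not from the original answer): removes every U+200B from text.
function stripZeroWidthSpaces(text) {
  return text.replace(/\u200B/g, "");
}

// Build the sample from the decimal code point instead of pasting the character.
var sample = "o" + String.fromCharCode(8203) + "m";
var cleaned = stripZeroWidthSpaces(sample);

console.log(sample.length);    // 3
console.log(cleaned.length);   // 2
console.log(cleaned === "om"); // true
```

Because the pattern uses only ASCII characters, the file should survive being saved in a non-Unicode encoding.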
