JavaScript remove ZERO WIDTH SPACE (unicode 8203) from string

Question

I'm writing some javascript that processes website content. My efforts are being thwarted by SharePoint text editor's tendency to put the "zero width space" character in the text when the user presses backspace. The character's unicode value is 8203, or B200 in hexadecimal. I've tried to use the default "replace" function to get rid of it. I've tried many variants, none of them worked:

var a = "o​m"; //the invisible character is between o and m

var b = a.replace(/\u8203/g,'');
b = a.replace(/\uB200/g,'');
b = a.replace("\uB200",'');

And so on and so forth. I've tried quite a few variations on this theme. None of these expressions work (tested in Chrome and Firefox). The only thing that works is typing the actual character in the expression:

var b = a.replace("​",''); //it's there, believe me

This poses potential problems. The character is invisible, so that line on its own doesn't make sense. I can get around that with comments. But if the code is ever reused and the file is saved using a non-Unicode encoding (or when it's deployed to SharePoint, where there's no guarantee the encoding won't get mangled), it will stop working. Is there a way to write this using Unicode escape notation instead of the character itself?

[My rambling about the character]

In case you haven't met this character (and you probably haven't, seeing as it's invisible to the naked eye, unless it broke your code and you discovered it while trying to locate the bug), it's a real a-hole that will cause certain types of pattern matching to malfunction. I've caged the beast for you:

[​] <- careful, don't let it escape.

If you want to see it, copy those brackets into a text editor and move your cursor through them. You'll notice that it takes three steps to pass what looks like two characters, and your cursor will appear not to move on one of the steps in the middle.
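If you'd rather not rely on cursor behavior, a short script can make the character visible by listing each code point in hex. This is a minimal sketch: `describeCodePoints` is a hypothetical helper name, and the test string is written with the `\u200B` escape (an assumption for illustration) so the example survives any file encoding.

```javascript
// Hypothetical helper: list every code point in a string as "U+XXXX".
function describeCodePoints(str) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const caged = "[\u200B]"; // the same "cage" as above, written with an escape
console.log(caged.length);              // 3
console.log(describeCodePoints(caged)); // [ 'U+005B', 'U+200B', 'U+005D' ]
```

The middle entry is the character the cursor seems to step over.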

Answer

The number in a unicode escape should be in hex, and the hex for 8203 is 200B (which is indeed a Unicode zero-width space), so:

var b = a.replace(/\u200B/g,'');
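The decimal-to-hex conversion can be checked directly in the console. This sketch uses only standard `Number` and `String` methods; it also shows that the escapes tried in the question name different code points (U+8203 and U+B200), which is why those patterns never matched.

```javascript
// 8203 is the decimal code point; \uXXXX escapes take exactly four hex digits.
const hex = (8203).toString(16).toUpperCase();
console.log(hex);                  // 200B
console.log(parseInt("200B", 16)); // 8203

const zwsp = "\u200B";
console.log(zwsp.charCodeAt(0));                 // 8203
console.log(zwsp === String.fromCharCode(8203)); // true

// The escapes from the question refer to other code points entirely.
console.log("\u8203" === zwsp); // false (this is U+8203)
console.log("\uB200" === zwsp); // false (this is U+B200)
```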

Live example:

var a = "o​m"; //the invisible character is between o and m
var b = a.replace(/\u200B/g,'');
console.log("a.length = " + a.length);      // 3
console.log("a === 'om'? " + (a === 'om')); // false
console.log("b.length = " + b.length);      // 2
console.log("b === 'om'? " + (b === 'om')); // true
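Because the escape is plain ASCII, the source no longer depends on the file's encoding, which was the concern in the question. A few equivalent spellings are sketched below, all written without the literal character. Assumptions: `\u{...}` requires the regex `u` flag (ES2015), `String.prototype.replaceAll` needs an ES2021-capable engine, so check your target browsers; the helper name `stripZwsp` is hypothetical.

```javascript
// Hypothetical helper: remove every U+200B ZERO WIDTH SPACE from a string.
function stripZwsp(str) {
  return str.replace(/\u200B/g, "");
}

// "o", ZERO WIDTH SPACE, "m"; the escape keeps the character visible in source.
const input = "o\u200Bm";

// Equivalent alternatives:
const viaCodePointEscape = input.replace(/\u{200B}/gu, ""); // needs the u flag (ES2015)
const viaDecimal = input.split(String.fromCharCode(8203)).join(""); // uses the decimal value directly
const viaReplaceAll = input.replaceAll("\u200B", ""); // ES2021

console.log(stripZwsp(input));                               // om
console.log(viaCodePointEscape, viaDecimal, viaReplaceAll); // om om om
```

The plain `/\u200B/g` form from the answer works in every engine; the other spellings are only worth using if you already rely on the newer syntax elsewhere.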