Least used Unicode delimiter


Question

I'm trying to tag my text with a delimiter at specific places so that it can be parsed later. I want to use the delimiter character that is least likely to occur in the text itself. I'm currently looking at "\2", the U+0002 character. Is that safe enough to use? What other suggestions are there? The text is Unicode and will contain both English and non-English characters.

I want to use a character that PHP's explode() can still split on.

I also want to be able to display this text on screen (in the browser) with the delimiter "invisible" to the user. I can certainly use str_replace() to strip visible delimiters, but with a good invisible delimiter no such processing would be needed.

Recommended answer

If this is only for an internal representation (i.e. not for interchange or storage), you can use a noncharacter code point such as U+FFFF. Java, for example, uses that value (CharacterIterator.DONE) as the signal that a CharacterIterator has reached the end of its text.
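As a minimal sketch of how the suggestion could look in PHP (assuming PHP 7 or later for the "\u{FFFF}" escape and UTF-8 encoded text), the example below joins pieces of text with U+FFFF, splits them again with explode(), and strips the delimiter before output. The names DELIM, tag_parts, split_tagged and strip_for_display are hypothetical and chosen only for illustration. Because the answer restricts U+FFFF to internal use, the sketch removes it before sending anything to the browser rather than relying on it being invisible.

```php
<?php
declare(strict_types=1);

// Hypothetical constant: U+FFFF, the noncharacter suggested in the answer.
// In UTF-8 it is the three bytes EF BF BF.
const DELIM = "\u{FFFF}";

// Join the pieces with the delimiter, refusing input that already contains it.
function tag_parts(array $parts): string
{
    foreach ($parts as $part) {
        if (strpos($part, DELIM) !== false) {
            throw new InvalidArgumentException('Input already contains U+FFFF');
        }
    }
    return implode(DELIM, $parts);
}

// explode() compares bytes, so a multi-byte UTF-8 delimiter works as-is.
function split_tagged(string $tagged): array
{
    return explode(DELIM, $tagged);
}

// Remove the delimiter before the text leaves the application.
function strip_for_display(string $tagged): string
{
    return str_replace(DELIM, '', $tagged);
}

$tagged = tag_parts(['Hello', 'こんにちは', 'wörld']);
print_r(split_tagged($tagged));
echo htmlspecialchars(strip_for_display($tagged), ENT_QUOTES, 'UTF-8'), "\n";
```

The guard in tag_parts is a design choice of this sketch, not part of the answer: a noncharacter should not occur in well-formed interchange text, but checking costs little and makes a collision fail loudly instead of corrupting the split.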
