What characters are in whitespaceAndNewlineCharacterSet()?
Question
I'm parsing some nasty files. You know the kind: comma, space, and tab delimiters mixed in a single line, then run through a text editor that word-wraps at column 65 with CRLF. Ugh.
As part of my efforts to parse this in Cocoa, I use Apple's whitespaceAndNewlineCharacterSet. But what, exactly, is in that set? The documentation says "Unicode General Category Z*, U000A ~ U000D, and U0085". I was able to find the last three (85 is interesting), but what does the ~ mean, and what is General Category Z*?
Any Unicode gurus out there?
Recommended answer
NSCharacterSet is an opaque class that does not expose its contents easily. It is better thought of as a "membership" rule service than as a list of characters.
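To illustrate the "membership rule" view, here is a minimal sketch that asks the set about a handful of individual characters. It assumes Swift 4 or later, where the set is spelled CharacterSet.whitespacesAndNewlines rather than NSCharacterSet.whitespaceAndNewlineCharacterSet(); the `samples` array is a hypothetical selection made up for this example.

```swift
import Foundation

// Assumption: Swift 4+ spelling of the same set.
let whitespace = CharacterSet.whitespacesAndNewlines

// Hypothetical sample characters, chosen only for illustration.
let samples: [Unicode.Scalar] = [" ", "\t", "\r", "\n", "\u{85}", "\u{A0}", "a", ","]

for scalar in samples {
    // Print the code point in hex next to the membership answer.
    let codePoint = String(format: "U+%04X", scalar.value)
    print("\(codePoint): \(whitespace.contains(scalar))")
}
```

The set only answers yes or no for a given scalar, which is why listing its contents requires probing every candidate value.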
This may be a somewhat brute-force approach, but you can list the members of an NSCharacterSet by going through all 16-bit values (UTF-16 code units) and checking each one for membership in the set:
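import Foundation

// Swift 2-era spelling; later Swift versions expose this as CharacterSet.whitespacesAndNewlines.
let charSet = NSCharacterSet.whitespaceAndNewlineCharacterSet()

// Test every 16-bit value (UTF-16 code unit) for membership.
for i in 0..<65536 {
    let u = UInt16(i)
    if charSet.characterIsMember(u) {
        print("\(u): \(Character(UnicodeScalar(u)))")
    }
}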
Printing the characters themselves gives surprising output, since most members of this set are non-displayable, but the list of values should answer your question.
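As for the notation in the documentation: General Category Z is Unicode's "Separator" category, and Z* covers its three subcategories, Zs (space separator), Zl (line separator), and Zp (paragraph separator). The ~ presumably denotes a range, so U000A ~ U000D would mean U+000A through U+000D. The sketch below is one way to check this. It assumes Swift 5 or later (for `Unicode.Scalar.Properties`), scans all code points instead of only 16-bit values, and prints each member's code point and general category rather than the invisible character itself. The names `whitespace` and `members` are illustrative.

```swift
import Foundation

// Assumption: Swift 5+ spelling of the same set.
let whitespace = CharacterSet.whitespacesAndNewlines

// Collect every member by probing all code points up to U+10FFFF.
var members: [Unicode.Scalar] = []
for value in UInt32(0)...0x10FFFF {
    // Unicode.Scalar(UInt32) returns nil for surrogates, which are skipped.
    guard let scalar = Unicode.Scalar(value) else { continue }
    if whitespace.contains(scalar) {
        members.append(scalar)
    }
}

for scalar in members {
    // Show the code point and its Unicode general category instead of
    // the character, which is usually invisible when printed.
    let codePoint = String(format: "U+%04X", scalar.value)
    print("\(codePoint)  \(scalar.properties.generalCategory)")
}
```

Printing code points in hex makes the result easy to compare against the ranges quoted from the documentation.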