What is a regular expression for control characters?
Question
I'm trying to match a control character in the form \^c where c is any valid character for control characters. I have this regular expression, but it's not currently working: \\[^][@-z]
I think the problem lies with the fact that the caret character (^) is part of the regular expressions parsing engine.
Recommended answer
The pattern \^. matches an ASCII text string of the form ^X, nothing more. To match an ASCII text string of the form \^X, use the pattern \\\^. instead. You may wish to constrain that dot to [A-Z?@_\[\]^\\], giving \\\^[A-Z?@_\[\]^\\] as the full pattern. The bracketed character class is easier to read as [?\x40-\x5F], hence \\\^[?\x40-\x5F] for a literal BACKSLASH, followed by a literal CIRCUMFLEX, followed by something that turns into one of the valid control characters.
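As a quick sanity check on the two spellings of the character class, the following sketch compares them over every 7-bit ASCII character. The class name ControlClassCheck and the method classesAgree are hypothetical names chosen for illustration. Because the patterns are written as Java string literals here, every backslash appears doubled in the source.

```java
import java.util.regex.Pattern;

final class ControlClassCheck {
    // The regex engine sees [A-Z?@_\[\]^\\] and [?\x40-\x5F] respectively;
    // the Java string literals double every backslash.
    static final Pattern SPELLED_OUT = Pattern.compile("[A-Z?@_\\[\\]^\\\\]");
    static final Pattern HEX_RANGE = Pattern.compile("[?\\x40-\\x5F]");

    // Returns true if both classes accept exactly the same ASCII characters.
    static boolean classesAgree() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (SPELLED_OUT.matcher(s).matches() != HEX_RANGE.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(classesAgree());
    }
}
```

The range \x40-\x5F covers @, the letters A through Z, and the characters [ \ ] ^ _, which is why the shorter form can stand in for the spelled-out one.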
Note that that is the result of printing out the pattern, or what you'd read from a file. It's what you need to pass to the regex compiler. If you have it as a string literal, you must of course double each of those backslashes: "\\\\\\^[?\\x40-\\x5F]"
Yes, it is insane looking, but that is because Java does not support regexes directly as Groovy and Scala — or Perl and Ruby — do. Regex work is always easier without the extra bbaacckksslllllaasshheesssssess. :)
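To make the doubling concrete, here is a minimal Java sketch that compiles the pattern from a string literal and matches input of the form \^X. Two things are assumptions added for illustration and are not part of the answer: the capturing parentheses around the character class, and the hypothetical helper decode, which assumes the usual caret-notation convention that the control character is the captured character XOR 0x40.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaretNotation {
    // The regex engine receives \\\^([?\x40-\x5F]); the capturing group is an
    // addition made here so the final character can be extracted.
    static final Pattern ESCAPED_CONTROL =
            Pattern.compile("\\\\\\^([?\\x40-\\x5F])");

    // Hypothetical helper: returns the code of the control character that the
    // text (e.g. \^A) stands for, or -1 if the whole text does not match.
    // Assumes caret notation, where the control code is the letter XOR 0x40.
    static int decode(String text) {
        Matcher m = ESCAPED_CONTROL.matcher(text);
        if (!m.matches()) {
            return -1;
        }
        return m.group(1).charAt(0) ^ 0x40;
    }

    public static void main(String[] args) {
        // Prints the pattern as the regex compiler sees it, without the doubling.
        System.out.println(ESCAPED_CONTROL.pattern());
        // The Java literal "\\^A" is the three-character text \^A.
        System.out.println(decode("\\^A"));
    }
}
```

Printing the pattern shows the undoubled form, which is a handy way to check that a string literal says what was intended.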
If you had real control characters instead of indirect representations of them, you would use \pC for all literal code points with the property GC=Other, or \p{Cc} for just GC=Control.
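The difference between the two properties can be seen with a small Java sketch. The class and method names are hypothetical, and the sample code points are picked for illustration: U+0001 is a control character (Cc), while U+00AD SOFT HYPHEN is a format character (Cf), so it falls under Other but not under Control.

```java
import java.util.regex.Pattern;

final class RealControlChars {
    // \p{Cc} is the Control category only; \pC is the broader Other category.
    static final Pattern CONTROL = Pattern.compile("\\p{Cc}");
    static final Pattern OTHER = Pattern.compile("\\pC");

    // Tests whether a single code point has General_Category Control.
    static boolean isControl(int codePoint) {
        return CONTROL.matcher(new String(Character.toChars(codePoint))).matches();
    }

    // Tests whether a single code point falls in the Other category.
    static boolean isOther(int codePoint) {
        return OTHER.matcher(new String(Character.toChars(codePoint))).matches();
    }

    public static void main(String[] args) {
        System.out.println(isControl(0x01) + " " + isOther(0x01));
        System.out.println(isControl(0xAD) + " " + isOther(0xAD));
        System.out.println(isControl('A') + " " + isOther('A'));
    }
}
```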