regexMatcher in String Manipulation KNIME
Question
I'm trying to use regexMatcher from the String Manipulation node in KNIME, but it doesn't work. I'm writing the following: regexMatcher($Document$,"/\w") to extract all sentences that contain /s, /p, w/p, or /200. However, even though my table has such cases, nothing is retrieved. I would appreciate your help.
Recommended answer
I got the following:
|Document |isOK |other|strict|
|--------------|-----|-----|------|
|Some /p with q|True |False|False |
|/200 |True |True |False |
|/p |True |True |True |
|/s |True |True |True |
|w/p |True |False|False |
|no slash |False|False|False |
for the expressions:
- isOK (I guess this is what you are after):
  regexMatcher($Document$, ".*?/\\w.*")
- other:
  regexMatcher($Document$, "/\\w.*")
- strict:
  regexMatcher($Document$, "/\\w")
(The Document values contain nothing after the last visible character.)
The problems you might run into are the escaping rules of the String Manipulation node and the semantics of regexMatcher.
The String literal in the expression is just a Java String, so you have to escape the \ (and some other characters); it becomes \\.
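The escaping point can be checked in plain Java. This is only a minimal sketch: the class name EscapingDemo and the constant REGEX are made up for illustration, and it assumes the String Manipulation node treats the literal the way a Java String literal is treated, as described above.

```java
class EscapingDemo {
    // Hypothetical constant for illustration. The source literal "/\\w"
    // holds three characters: '/', '\' and 'w', i.e. the regex /\w.
    static final String REGEX = "/\\w";

    public static void main(String[] args) {
        System.out.println(REGEX);               // prints /\w
        System.out.println(REGEX.length());      // prints 3
        // String.matches requires the whole input to match the regex.
        System.out.println("/p".matches(REGEX)); // prints true
    }
}
```

Writing "/\w" with a single backslash would not even compile in Java, because \w is not a valid String escape sequence.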
regexMatcher matches against the whole String, so you have to add .*? (non-greedy match of anything) before the pattern you are looking for and .* (greedy match of anything) after it.
(Obviously, if I misunderstood your question, the semantics may already be what you want.)
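The difference between matching the whole String and finding the pattern anywhere can be reproduced with java.util.regex. The sketch below is not KNIME's implementation: it assumes regexMatcher behaves like Matcher.matches(), which is what the whole-String semantics above suggest, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class RegexSemanticsDemo {
    // Whole-String match, the behavior assumed here for regexMatcher.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // "Contains" match: the pattern may occur anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] docs = {"Some /p with q", "/200", "/p", "/s", "w/p", "no slash"};
        for (String doc : docs) {
            System.out.println(doc
                + " | isOK=" + wholeMatch(".*?/\\w.*", doc)
                + " | other=" + wholeMatch("/\\w.*", doc)
                + " | strict=" + wholeMatch("/\\w", doc)
                + " | find=" + containsMatch("/\\w", doc));
        }
    }
}
```

With whole-String matching, the bare pattern /\w accepts only two-character values such as /p, which is why the original expression retrieved nothing for longer sentences. Wrapping it in .*? and .* gives the same answers as the find-style check.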
BTW: in case you want to filter, you should check the Rule-based Row Filter node as it offers an option to directly filter by regex. It uses a different escaping rule (for the isOK option):
- $Document$ MATCHES ".*?/\w.*" => TRUE
  (escaping is not allowed within quotes)
- $Document$ MATCHES /.*?\/\\w.*/ => TRUE
  (escaping is allowed within slashes; / and \ need to be escaped, but " does not)