Regex pattern to remove anything \text {whatever here}
Problem description
I can use \{\%(.*?)\%\} to change hello {% my text %} to hello
or to change useful ()useful to useful useful.
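As a minimal sketch of the working case described above, the pattern from the question can be applied with re.sub. The helper name strip_tags is hypothetical and not part of the original code.

```python
import re

# Pattern quoted in the question: a lazy match between "{%" and "%}".
TAG = re.compile(r"\{\%(.*?)\%\}")

def strip_tags(s: str) -> str:
    """Remove every {% ... %} span from s (hypothetical helper)."""
    return TAG.sub("", s)

# Only the tag itself is removed; the space before it stays.
print(repr(strip_tags("hello {% my text %}")))
```

Because the pattern does not include surrounding whitespace, a trailing space remains after the removal.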
My problem is that I want to remove everything inside final \text { whatever here } result, including \text itself, so that it becomes final result.
I tried the same approach with r"\\text .*?/ }", but that did not work.
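A short check of why that attempt fails, under the assumption that the "/" was meant as an escape: the pattern requires a literal forward slash, which the input does not contain, so nothing matches.

```python
import re

s = r"final \text { whatever here } result"

# Pattern from the question: "/" is a literal slash, not an escape character.
attempt = r"\\text .*?/ }"
no_match = re.search(attempt, s)   # the input contains no "/"

# Dropping the slash lets the lazy match stop at the first "}".
without_slash = re.sub(r"\\text .*?}", "", s)

print(no_match)
print(repr(without_slash))
```

The version without the slash still leaves the whitespace on both sides of the removed span, which is why the answer's pattern also consumes the leading \s*.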
I have some code that is part of a class that cleans my data:
import re

def get_features(self, s: str) -> list:
    '''
    Produce shingles (character n-grams) of a given string.
    args:
        s: Given string
    out: Shingles of the string. If the string is 'how are you', the returned
         list is ['how','owa','war','are','rey','eyo','you'] with width = 3
    '''
    assert self.args_flag, "pass in the arguments for preprocessing by calling set_preprocess_params()"
    if self.lower:
        s = s.lower()
    if self.ascii_only:
        s = re.sub(r"[^\x00-\x7F]", '', s)  # Drop non-ASCII characters
    if self.remove_special:  # Remove special characters
        s = re.sub(r'[^\w ]+', '', s)
        s = re.sub(r'[_ \\]', '', s)  # Remove spaces and _ (not covered by the step above), plus any backslashes
    return s
Recommended answer
If there are no { and } between the braces, you can use Python re this way:
re.sub(r'\s*\\text\s*{[^{}]*}', '', s)
See regex demo #1. Here, \s*\\text\s*{[^{}]*} matches:
\s* - zero or more whitespace chars
\\ - a \ char
text - the string text
\s* - zero or more whitespace chars
{[^{}]*} - a {, then zero or more chars other than { and }, and then a }.
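The pattern above can be wrapped in a small function for reuse. This is a sketch only; the names TEXT_CMD and remove_text_commands are hypothetical and not part of the original answer.

```python
import re

# Non-nested case: \text, optional whitespace, then {...} with no inner braces.
TEXT_CMD = re.compile(r"\s*\\text\s*{[^{}]*}")

def remove_text_commands(s: str) -> str:
    """Delete every \\text{...} span, together with the whitespace before it."""
    return TEXT_CMD.sub("", s)

print(remove_text_commands(r"final \text { whatever here } result"))
```

Since [^{}]* cannot cross an inner brace, input such as \text{x {y} z} is left unchanged by this version.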
If you need to match nested braces, install the PyPI regex module (run pip install regex in the terminal) and then use
import regex
#...
text = regex.sub(r'\s*\\text\s*({(?:[^{}]++|(?1))*})', '', text)
See regex demo #2. Here,
\s*\\text\s* - matches \text surrounded by optional whitespace
({(?:[^{}]++|(?1))*}) - Group 1:
    { - a { char
    (?:[^{}]++|(?1))* - zero or more occurrences of either one or more chars other than { and }, or the whole Group 1 pattern recursed
    } - a } char.
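If installing a third-party module is not an option, nested braces can also be handled with the standard library alone by counting brace depth. This is a different technique from the recursive pattern above, not the answer's method; remove_text_nested is a hypothetical name.

```python
import re

# Finds the start of a \text command up to and including its opening brace.
_TEXT_OPEN = re.compile(r"\s*\\text\s*{")

def remove_text_nested(s: str) -> str:
    """Remove \\text{...} spans, allowing nested braces (stdlib-only sketch)."""
    out = []
    pos = 0
    while True:
        m = _TEXT_OPEN.search(s, pos)
        if m is None:
            out.append(s[pos:])
            break
        # Walk forward from the opening brace, tracking nesting depth.
        depth = 1
        i = m.end()
        while i < len(s) and depth:
            if s[i] == "{":
                depth += 1
            elif s[i] == "}":
                depth -= 1
            i += 1
        if depth:
            # Unbalanced braces: keep the remainder as it is.
            out.append(s[pos:])
            break
        out.append(s[pos:m.start()])
        pos = i
    return "".join(out)

print(remove_text_nested(r"final \text{a {b {c}} d} result"))
```

One difference in behavior: when a \text{ is never closed, this sketch leaves the rest of the string untouched, whereas the recursive pattern would still try to match at later positions.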
See the Python demo online.
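One point worth checking when wiring this into the question's cleaning method: the remove_special step deletes backslashes and braces, so the \text removal presumably has to run before it. The sketch below uses condensed, hypothetical functions (clean and clean_wrong_order) rather than the original class.

```python
import re

TEXT_CMD = re.compile(r"\s*\\text\s*{[^{}]*}")

def clean(s: str) -> str:
    """Remove \\text{...} first, then strip special characters."""
    s = TEXT_CMD.sub("", s)
    s = re.sub(r"[^\w ]+", "", s)   # drop special characters
    s = re.sub(r"[_ \\]", "", s)    # drop spaces, underscores, backslashes
    return s

def clean_wrong_order(s: str) -> str:
    """Same steps, but \\text removal runs last and finds nothing to match."""
    s = re.sub(r"[^\w ]+", "", s)   # this already deletes "\", "{" and "}"
    s = re.sub(r"[_ \\]", "", s)
    s = TEXT_CMD.sub("", s)
    return s

sample = r"final \text { whatever here } result"
print(clean(sample))
print(clean_wrong_order(sample))
```

In the second ordering the braces and backslash are gone before the pattern runs, so the contents of the \text command end up in the cleaned string.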