Regex pattern to remove anything \text {whatever here}

Question

I can use \{\%(.*?)\%\} to change hello {% my text %} to hello, and in the same way change useful ()useful to useful useful.

My problem is that I want to remove everything inside final \text { whatever here } result, including \text itself, so that it becomes final result.

I tried the same approach with r"\\text .*?/ }", but that did not work.

I have some code, part of a class that cleans my data:

def get_features(self,s:str)->list:
        '''
        Produce Shingles or n-Grams of CHARACTERS in a given string.
        args:
            s: Given String
        out: Shingles of a string. If the string is 'how are you', the returned list is ['how','owa','war','are','rey','eyo','you'] with width = 3
        '''
        assert self.args_flag, "pass in the arguments for preprocessing by calling set_preprocess_params()"
        
        if self.lower:
            s = s.lower()
            
        if self.ascii_only:
            s = re.sub(r"[^\x00-\x7F]",'',s)

        if self.remove_special: # Remove special characters
            s = re.sub(r'[^\w ]+', '', s)
    
        s = re.sub(r'[_ \\]', '', s) # Remove spaces and underscores, which the special-character step does not cover, plus any backslashes
        return s

Recommended answer

If there are no { or } characters between the braces, you can use Python's re module this way:

re.sub(r'\s*\\text\s*{[^{}]*}', '', s)

See regex demo #1. Here, \s*\\text\s*{[^{}]*} matches:

  • \s* - zero or more whitespace chars
  • \\ - a \ char
  • text - text string
  • \s* - zero or more whitespace chars
  • {[^{}]*} - {, any zero or more chars other than { and } and then a }.
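
As a quick check of the pattern described above, here is a minimal sketch that applies it to the sample string from the question. The helper name strip_text_blocks is hypothetical and only there for illustration.

```python
import re

# Pattern from the answer: optional whitespace, \text, optional whitespace,
# then a brace pair that contains no further braces.
TEXT_BLOCK = re.compile(r'\s*\\text\s*{[^{}]*}')

def strip_text_blocks(s: str) -> str:
    r"""Remove every flat \text{...} block (hypothetical helper name)."""
    return TEXT_BLOCK.sub('', s)

print(strip_text_blocks(r'final \text { whatever here } result'))
# prints: final result
```

Because the leading \s* also consumes the space before \text, the words on either side stay separated by a single space. A block that contains nested braces is left untouched by this pattern.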

If you need to match nested braces, install the PyPI regex module (run pip install regex in the terminal) and then use:

import regex
#...
text = regex.sub(r'\s*\\text\s*({(?:[^{}]++|(?1))*})', '', text)

See regex demo #2. Here,

  • \s*\\text\s* - matches \text enclosed with optional whitespace
  • ({(?:[^{}]++|(?1))*}) - Group 1:
    • { - a { char
    • (?:[^{}]++|(?1))* - zero or more occurrences of either one or more chars other than { and }, or the whole Group 1 pattern recursed
    • } - a } char.
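
If installing the regex module is not an option, the same nested-brace removal can be approximated with the standard library alone. The sketch below is not the answer's method: it swaps the recursive pattern for a plain brace-depth counter, and the names OPENER and strip_nested_text_blocks are hypothetical.

```python
import re

# Matches only the opening part: optional whitespace, \text, optional whitespace, {
OPENER = re.compile(r'\s*\\text\s*{')

def strip_nested_text_blocks(s: str) -> str:
    r"""Remove \text{...} blocks, allowing nested braces (hypothetical helper)."""
    out = []
    pos = 0
    while True:
        m = OPENER.search(s, pos)
        if m is None:
            out.append(s[pos:])  # no more openers: keep the rest
            break
        # Walk forward from the opening brace, tracking nesting depth.
        depth = 1
        i = m.end()
        while i < len(s) and depth:
            if s[i] == '{':
                depth += 1
            elif s[i] == '}':
                depth -= 1
            i += 1
        if depth:
            # Braces never balanced: keep this opener and look further on.
            out.append(s[pos:m.end()])
            pos = m.end()
            continue
        out.append(s[pos:m.start()])  # keep text before the block
        pos = i                       # skip the whole balanced block
    return ''.join(out)

print(strip_nested_text_blocks(r'final \text { a {b} c } result'))
# prints: final result
```

An opener whose braces never balance is left in place, which is also what happens when the recursive pattern fails to match at that position.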

See a Python demo online.
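
One point worth noting when plugging this into the cleaning code from the question: the substitution has to run before the steps that strip special characters and backslashes, since those steps delete the \ and the braces the pattern relies on. The following is a rough sketch under that assumption. The standalone function clean and its keyword arguments are hypothetical stand-ins for the method and attributes in the question.

```python
import re

TEXT_BLOCK = re.compile(r'\s*\\text\s*{[^{}]*}')

def clean(s: str, lower: bool = True, ascii_only: bool = True,
          remove_special: bool = True) -> str:
    # Hypothetical standalone version of the question's cleaning steps.
    # Strip \text{...} first, while the backslash and braces still exist.
    s = TEXT_BLOCK.sub('', s)
    if lower:
        s = s.lower()
    if ascii_only:
        s = re.sub(r'[^\x00-\x7F]', '', s)
    if remove_special:
        s = re.sub(r'[^\w ]+', '', s)
    # Remove spaces, underscores, and any remaining backslashes.
    return re.sub(r'[_ \\]', '', s)

print(clean(r'final \text { whatever here } result'))
# prints: finalresult
```

If the \text substitution ran last instead, the earlier steps would already have reduced the input to plain letters, and the words inside the braces would survive.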
