How to remove tags from a string in Python using regular expressions? (NOT in HTML)

Problem Description

I need to remove tags from a string in Python.

<FNT name="Century Schoolbook" size="22">Title</FNT>

What is the most efficient way to remove the entire tag on both ends, leaving only "Title"? I've only seen ways to do this with HTML tags, and those haven't worked for me in Python. I'm using this specifically for ArcMap, a GIS program. It has its own tags for its layout elements, and I only need to remove the tags from two specific title text elements. I believe regular expressions should work well for this, but I'm open to other suggestions.

Recommended Answer

This should work:

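import re
mystring = '<FNT name="Century Schoolbook" size="22">Title</FNT>'
print(re.sub(r'<[^>]*>', '', mystring))  # removes every <...> tag; prints: Title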
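
Since the question only needs the tags around specific title elements removed, a more targeted variant can anchor the pattern to one tag name so that any other markup in the string is left alone. This is a minimal sketch, assuming the tags to strip are always named `FNT`; the helper name `strip_tag` is hypothetical and not part of ArcMap or any library.

```python
import re


def strip_tag(text: str, tag: str = "FNT") -> str:
    """Remove opening and closing markers for one tag name (hypothetical helper).

    Tags with other names are left untouched. The tag name is escaped so it is
    matched literally, and matching is case-sensitive.
    """
    name = re.escape(tag)
    # "</?"   : "<" or "</"
    # "\b"    : the name must end here, so "FNT" does not match "FNTX"
    # "[^>]*" : any attributes up to the closing ">"
    pattern = rf"</?{name}\b[^>]*>"
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    print(strip_tag('<FNT name="Century Schoolbook" size="22">Title</FNT>'))  # Title
    # Tags with other names are kept:
    print(strip_tag('<BOL><FNT size="10">Legend</FNT></BOL>'))  # <BOL>Legend</BOL>
```

Restricting the match to one tag name trades generality for safety: the plain `<[^>]*>` pattern strips everything that looks like a tag, while this version only touches the tag you name.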

For those saying regular expressions are not the right tool for the job:

The context of the problem is such that all the objections regarding regular/context-free languages are invalid. His language essentially consists of three entities: a = <, b = >, and c = [^><]+. He wants to remove any occurrences of acb. This fairly directly characterizes his problem as one involving a context-free grammar, and it is not much harder to characterize it as a regular one.
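
To make the argument concrete, the pattern can be built directly from the three entities a, b and c; the concatenation acb is itself a regular expression. This is a sketch whose variable names simply mirror the description above. Because c is defined as `[^><]+`, the result differs slightly from the `<[^>]*>` pattern in the answer: it requires at least one character between the brackets and never matches across a stray `<`.

```python
import re

# The three entities from the answer, each written as a regular expression.
A = "<"        # a: opening bracket
B = ">"        # b: closing bracket
C = "[^><]+"   # c: one or more characters that are neither bracket

# "acb" is just the concatenation of a, c and b.
ACB = re.compile(A + C + B)


def remove_acb(text: str) -> str:
    """Delete every occurrence of acb (i.e. every tag) from text."""
    return ACB.sub("", text)


print(remove_acb('<FNT name="Century Schoolbook" size="22">Title</FNT>'))  # Title
# A stray "<" is not treated as the start of a tag:
print(remove_acb("1 < 2 <B>x</B>"))              # 1 < 2 x
# The broader pattern from the answer swallows it:
print(re.sub(r"<[^>]*>", "", "1 < 2 <B>x</B>"))  # 1 x
```

For well-formed input such as the `FNT` title in the question, both patterns give the same result; the difference only shows up when the text contains bare `<` characters or empty `<>` pairs.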

I know everyone likes the "you can't parse HTML with regular expressions" answer, but the OP doesn't want to parse it, he just wants to perform a simple transformation.
