Python regex for finding contents of MediaWiki markup links

Views: 103 Published: 2020/5/8 1:40:25 Tags: python regex mediawiki

Question

If I have some XML containing things like the following MediaWiki markup:

" ...collected in the 12th century, of which [[Alexander the Great]] was the hero, and in which he was represented, somewhat like the British [[King Arthur|Arthur]]"

what would be the appropriate arguments to something like:

re.findall([[__?__]], article_entry)

I am stumbling a bit on escaping the double square brackets, and on getting the proper link for text like: [[Alexander of Paris|poet named Alexander]]

Recommended answer

Here is an example:

import re

pattern = re.compile(r"\[\[([\w \|]+)\]\]")
text = "blah blah [[Alexander of Paris|poet named Alexander]] bldfkas"
results = pattern.findall(text)

output = []
for link in results:
    output.append(link.split("|")[0])

print(output)
# outputs ['Alexander of Paris']

Version 2 puts more into the regex, but as a result changes the output: findall now returns one tuple per link, holding the link target and the optional "|title" part.

import re

pattern = re.compile(r"\[\[([\w ]+)(\|[\w ]+)?\]\]")
text = "[[a|b]] fdkjf [[c|d]] fjdsj [[efg]]"
results = pattern.findall(text)

print(results)
# outputs [('a', '|b'), ('c', '|d'), ('efg', '')]

# Keep only the first element of each tuple (the link target).
print([link[0] for link in results])
# outputs ['a', 'c', 'efg']
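If both the link target and the displayed title are needed, named groups may read more clearly than tuple indices. The following is only a sketch built on the Version 2 pattern; the group names target and label and the fallback to the target when no title is present are assumptions for illustration, not part of the original answer.

```python
import re

# Same shape as the Version 2 pattern, with named groups.
# The names "target" and "label" are illustrative choices.
pattern = re.compile(r"\[\[(?P<target>[\w ]+)(?:\|(?P<label>[\w ]+))?\]\]")
text = "[[a|b]] fdkjf [[c|d]] fjdsj [[efg]]"

links = []
for match in pattern.finditer(text):
    target = match.group("target")
    # A link without a pipe has no label group; fall back to the target.
    label = match.group("label") or target
    links.append((target, label))

print(links)
# outputs [('a', 'b'), ('c', 'd'), ('efg', 'efg')]
```

Because the pipe sits outside the label group here, the title comes back without the leading "|".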

Version 3, if you only want the link without the title. The second group is made non-capturing with (?:...), so findall returns plain strings instead of tuples:

import re

pattern = re.compile(r"\[\[([\w ]+)(?:\|[\w ]+)?\]\]")
text = "[[a|b]] fdkjf [[c|d]] fjdsj [[efg]]"
results = pattern.findall(text)

print(results)
# outputs ['a', 'c', 'efg']
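One caveat with the patterns above: [\w ] only matches word characters and spaces, so a link whose target contains punctuation would be skipped. A possible variation, offered as a sketch rather than a complete MediaWiki parser, is to match anything that is not a bracket or a pipe. The third link in the sample text is a hypothetical addition to the question's sentence, and the names link_pattern and targets are illustrative. Nested constructs such as image links with bracketed captions are not handled.

```python
import re

# Assumption: a link target is any run of characters other than
# square brackets or a pipe; the optional "|title" part is matched
# but not captured.
link_pattern = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

# The first two links come from the question; the last one is a
# hypothetical example with punctuation in the target.
article_entry = (
    "...collected in the 12th century, of which [[Alexander the Great]] "
    "was the hero, and in which he was represented, somewhat like the "
    "British [[King Arthur|Arthur]]. See also [[Paris (mythology)|Paris]]."
)

# findall returns the single captured group for each match.
targets = [target.strip() for target in link_pattern.findall(article_entry)]

print(targets)
# outputs ['Alexander the Great', 'King Arthur', 'Paris (mythology)']
```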
