How to match and remove Wikipedia references with Python and re
Question
from bs4 import BeautifulSoup
import requests
import time
import keyboard
import re

def searchWiki():
    search = input("What do you want to search for? ").replace(" ", "_").replace("'", "%27")
    url = f"https://en.wikipedia.org/wiki/{search}"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    title = soup.find("title").get_text()
    info = soup.find_all("p")
    print("Press enter to read the next paragraph")
    print(title)
    print(url)
    for p in info:
        print(p.text.strip())
        keyboard.wait('enter')

searchWiki()
For example, search for Tom Holland. It should come up with this:
Thomas Stanley Holland (born 1 June 1996)[1] is an English actor. A graduate of the BRIT School in London...
What I want to do is remove the reference number and the brackets.
Recommended answer
You can do it using regular expressions.
For example, with your p variable:
import re

line = p.text.strip()
# Raw string so the backslashes reach the regex engine unchanged
new_line = re.sub(r"\[[0-9]+\]", '', line)
print(new_line)
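To try the pattern without fetching a page, here is a minimal standalone sketch. The names `CITATION_RE` and `strip_citations` are made up for this illustration and are not part of the original answer. It assumes the markers are always digits inside square brackets, as in the Tom Holland example above.

```python
import re

# Hypothetical helper names, chosen only for this illustration.
# Matches a literal "[", one or more digits, then a literal "]".
CITATION_RE = re.compile(r"\[[0-9]+\]")


def strip_citations(text: str) -> str:
    """Return text with numeric markers such as [1] or [23] removed."""
    return CITATION_RE.sub("", text)


sample = "Thomas Stanley Holland (born 1 June 1996)[1] is an English actor."
print(strip_citations(sample))
# prints: Thomas Stanley Holland (born 1 June 1996) is an English actor.
```

In the question's loop, this would be called as print(strip_citations(p.text.strip())). Because the pattern only matches digits, bracketed text such as [citation needed] is left untouched.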