How to match and remove Wikipedia references with Python and re


Question

from bs4 import BeautifulSoup
import requests
import time
import keyboard
import re

def searchWiki():
    search = input("What do you want to search for? ").replace(" ", "_").replace("'", "%27")
    url = f"https://en.wikipedia.org/wiki/{search}"
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    title = soup.find("title").get_text()
    info = soup.find_all("p")
    print("Press enter to read the next paragraph")
    print(title)
    print(url)
    for p in info:
        print(p.text.strip())
        keyboard.wait('enter')



searchWiki()

For example, search for Tom Holland. It should come up with this:

Thomas Stanley Holland (born 1 June 1996)[1] is an English actor. A graduate of the BRIT School in London...

What I want to do is remove the reference number and the brackets.

Recommended answer

You can do it using regular expressions.

For example, with your p variable:

import re

line = p.text.strip()
# Raw string avoids invalid-escape warnings; matches markers such as [1] or [23]
new_line = re.sub(r"\[[0-9]+\]", "", line)
print(new_line)
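
As a minimal sketch of how the same substitution could be reused inside the loop from the question, the pattern can be compiled once and wrapped in a helper. The names `REF_PATTERN` and `strip_refs` are hypothetical (they do not appear in the original post), and the sample string stands in for `p.text`.

```python
import re

# Matches numeric reference markers such as [1] or [23]
REF_PATTERN = re.compile(r"\[[0-9]+\]")

def strip_refs(text):
    """Return text with numeric reference markers like [1] removed."""
    return REF_PATTERN.sub("", text)

# Sample text taken from the question; in the real loop this would be p.text.strip()
sample = "Thomas Stanley Holland (born 1 June 1996)[1] is an English actor."
print(strip_refs(sample))
# Thomas Stanley Holland (born 1 June 1996) is an English actor.
```

In the question's loop, print(strip_refs(p.text.strip())) would take the place of print(p.text.strip()). Note that this pattern only removes purely numeric markers; bracketed notes such as [a] or [citation needed] are left untouched.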
