Python Beautiful Soup and Regex - Double quotes not getting replaced


Question

I am trying to scrape this website using BeautifulSoup and regular expressions. While doing so, I came across a question that contains "double quotes". I want to strip the double quotes and save the text to a .txt file, but they are not being removed. I tried the .replace() method, but it did not work. The code is as follows:

import re

import requests
from bs4 import BeautifulSoup as bs

url = 'http://www.sanfoundry.com/operating-system-mcqs-process-scheduling-queue/'
r = requests.get(url)
soup = bs(r.content, 'html.parser')  # name the parser explicitly
data = soup.find_all('div', {'class':'entry-content'})
data1 = data[0].text
pattern = r'^\d{1,2}[\.|\)]([\s|\S].*)|(^[a-z]\)\s.*)|^View Answer\s?(Answer:.*)'
#pattern = r'^\d{1,2}[\.|\)]\s*(.*)|(^[a-z]\)\s.*)|^View Answer\s?(Answer:.*)'
reg = re.compile(pattern)
#with open(r'C:\Users\dhvani\Google Drive\Python\Data Scraping\byb.txt', 'a') as f:
with open(r'C:\Users\Jeri_Dabba\Google Drive\Python\Data Scraping\byb.txt', 'a') as f:
    for i in data1.split('\n'):
        m = reg.search(i)  # None when the line matches none of the alternatives
        if m and m.group(1):
            y = m.group(1)
            y = y.replace('"', '')  # attempt to strip the double quotes
            f.write(y + "\n")

When I checked the .txt file, the double quotes had not been replaced. What might be the problem?

I am new to Python.

Recommended answer

This website includes characters that are not 'normal' double quote characters, i.e. not " (U+0022).

The site uses the Unicode left and right double quotation marks: “ (U+201C) and ” (U+201D).
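One way to confirm which quote characters a scraped string actually contains is to list its non-ASCII code points. The following is a minimal sketch using the standard library's unicodedata module; the helper name describe_non_ascii and the sample string are hypothetical and not taken from the site.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (code point, Unicode name) for every non-ASCII character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Hypothetical scraped line containing curly quotes.
sample = "What is a \u201cprocess\u201d?"

for code_point, name in describe_non_ascii(sample):
    print(code_point, name)
```

If the output lists LEFT DOUBLE QUOTATION MARK or RIGHT DOUBLE QUOTATION MARK, a replace() call that targets only U+0022 will leave those characters untouched.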

You can replace all three quote characters:

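y = y.replace('"', '')       # U+0022 quotation mark
y = y.replace('\u201c', '')  # U+201C left double quotation mark
y = y.replace('\u201d', '')  # U+201D right double quotation mark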
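As an alternative to chaining several replace() calls, the same characters can be removed in a single pass with str.translate. This is only a sketch: the names QUOTES_TO_STRIP, STRIP_QUOTES and strip_double_quotes are hypothetical, and the sample string is made up for illustration.

```python
# Characters to delete: straight quote plus the left and right curly quotes.
QUOTES_TO_STRIP = '"\u201c\u201d'

# The third argument of str.maketrans lists characters to delete.
STRIP_QUOTES = str.maketrans("", "", QUOTES_TO_STRIP)


def strip_double_quotes(text):
    """Return text with straight and curly double quotes removed."""
    return text.translate(STRIP_QUOTES)


# Hypothetical line mixing both kinds of quotes.
sample = 'The \u201cready queue\u201d holds "runnable" processes'
print(strip_double_quotes(sample))
```

In the question's loop this would stand in for the replace() calls, for example y = strip_double_quotes(y). A translation table is built once and reused, which keeps the set of quote characters in one place if more variants turn up later.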
