Processing HTML files in Python

Views: 93 | Published: 2020/11/24 21:13:20 | Tags: python, html, html-parsing


Question

I don't know much about HTML. How do you extract just the text from a page? For example, if the HTML page reads:

<meta name="title" content="How can I make money at home online? No gimmacks please? - Yahoo! Answers">
<title>How can I make money at home online? No gimmicks please? - Yahoo! Answers</title>

I just want to extract this:

How can I make money at home online? No gimmicks please? - Yahoo! Answers

I'm using this function, built on the re module:

import re

def striphtml(data):
    # Replace every tag (shortest match between '<' and '>') with a space.
    p = re.compile(r'<.*?>')
    return p.sub(' ', data)

But it's still not doing what I intend it to do.

The function above is called like this:

for lines in filehandle.readlines():
    # k = str(section[6].strip())
    myFile.write(lines)

    lines = striphtml(lines)
    content.append(lines)

Recommended answer

Don't use regular expressions for HTML/XML parsing. Try BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) instead.

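# BeautifulSoup 4 is imported from the bs4 package (pip install beautifulsoup4);
# the older BeautifulSoup 3 import was "from BeautifulSoup import BeautifulSoup".
from bs4 import BeautifulSoup
soup = BeautifulSoup('Your resource<title>hi</title>', 'html.parser')
soup.title.string  # 'hi', the text inside <title>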
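
If installing a third-party package is not an option, the same idea (let a parser find the title rather than a regex) can be sketched with the standard library's html.parser module. This is only an illustration and not part of the original answer: the names TitleExtractor and extract_title are made up for this example, and the sample string simply reuses the two lines from the question.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False  # True while we are between <title> and </title>
        self._done = False      # True once the first title has been closed
        self._parts = []        # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html):
    """Return the text of the first <title> in html, or '' if there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title


# Sample input copied from the question (the <meta> tag is ignored).
sample = (
    '<meta name="title" content="How can I make money at home online? '
    'No gimmacks please? - Yahoo! Answers">\n'
    "<title>How can I make money at home online? "
    "No gimmicks please? - Yahoo! Answers</title>\n"
)
print(extract_title(sample))
```

Unlike the regex approach, this reads only the text inside the title element and never touches attribute values such as the content of the meta tag. It is usually simplest to pass the whole file contents (for example filehandle.read()) in one call rather than processing the file line by line.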
