Iterate through multiple files and append text from HTML using Beautiful Soup

Views: 157 Published: 2020/9/20 8:40:13 Tags: python beautifulsoup


Problem description

I have a directory of 46 downloaded HTML files. I am trying to iterate through each one, read its contents, strip the HTML, and append only the text to a text file. However, nothing gets written to my text file, and I'm not sure where I'm going wrong.

import os
import glob
from bs4 import BeautifulSoup
path = "/"
for infile in glob.glob(os.path.join(path, "*.html")):
        markup = (path)
        soup = BeautifulSoup(markup)
        with open("example.txt", "a") as myfile:
                myfile.write(soup)
                f.close()

-----update----- I've updated my code as below; however, the text file still doesn't get created.

import os
import glob
from bs4 import BeautifulSoup
path = "/"
for infile in glob.glob(os.path.join(path, "*.html")):
    markup = (infile)
    soup = BeautifulSoup(markup)
    with open("example.txt", "a") as myfile:
        myfile.write(soup)
        myfile.close()
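
One possible reason the file is never created at this stage, assuming no `.html` files sit directly in the root directory: `glob.glob` returns an empty list, the loop body never runs, and `example.txt` is never opened. Below is a minimal sketch for checking what the pattern matches before parsing anything. The helper name `find_html_files` is hypothetical and not part of the original code.

```python
import glob
import os


def find_html_files(directory):
    """Return the .html files directly inside directory, sorted by name."""
    return sorted(glob.glob(os.path.join(directory, "*.html")))


if __name__ == "__main__":
    # Same directory as in the code above; an empty result means the
    # for loop has nothing to iterate over.
    matches = find_html_files("/")
    print(f"{len(matches)} HTML file(s) matched")
```

If the count printed is zero, the path is the thing to fix before looking at the parsing code.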

-----update 2-----

Ah, I caught that I had my directory incorrect, so now I have:

import os
import glob
from bs4 import BeautifulSoup

path = "c:\\users\\me\\downloads\\"

for infile in glob.glob(os.path.join(path, "*.html")):
    markup = (infile)
    soup = BeautifulSoup(markup)
    with open("example.txt", "a") as myfile:
        myfile.write(soup)
        myfile.close()

When this is executed, I get this error:

Traceback (most recent call last):
  File "C:\Users\Me\Downloads\bsoup.py", line 11, in <module>
    myfile.write(soup)
TypeError: must be str, not BeautifulSoup

I fixed this last error by changing

myfile.write(soup)

to

myfile.write(soup.get_text())
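
The reason this change helps: a file opened in text mode accepts only `str` in `write()`, and `get_text()` returns the document's text as a `str`. A small sketch of the difference follows. The sample markup is made up for illustration, and `io.StringIO` stands in for the real text file, so the exact wording of the error message may differ from the traceback above.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the downloaded files.
soup = BeautifulSoup("<html><body><p>Hello, world</p></body></html>", "html.parser")

buffer = io.StringIO()  # stands in for a file opened in text mode
error = None
try:
    buffer.write(soup)  # a BeautifulSoup object is not a str
except TypeError as exc:
    error = exc

text = soup.get_text()  # plain str with the tags stripped
buffer.write(text)
```

Note that the update 2 code still passes the file path itself to `BeautifulSoup`, which treats a string argument as markup rather than as a file name, so the file's contents are not parsed yet at this point.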

-----update 3-----

It's working properly now; here's the working code:

import os
import glob
from bs4 import BeautifulSoup

path = "c:\\users\\me\\downloads\\"

for infile in glob.glob(os.path.join(path, "*.html")):
    markup = (infile)
    soup = BeautifulSoup(open(markup, "r").read())
    with open("example.txt", "a") as myfile:
        myfile.write(soup.get_text())
        myfile.close()
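
A hedged cleanup of the working code above, offered as a sketch rather than a definitive version. It names the parser explicitly (`"html.parser"`, the one built into Python), closes each input file with a `with` block, and drops the `myfile.close()` call, which is redundant because `with` already closes the file. The function name `append_html_text`, the UTF-8 encoding, and the newline written between files are assumptions that are not in the original.

```python
import glob
import os

from bs4 import BeautifulSoup


def append_html_text(src_dir, out_path):
    """Append the text of every .html file in src_dir to out_path.

    Returns the number of HTML files processed.
    """
    html_files = sorted(glob.glob(os.path.join(src_dir, "*.html")))
    # Open the output once; mode "a" appends, so existing content is kept.
    with open(out_path, "a", encoding="utf-8") as out_file:
        for html_path in html_files:
            # Assumption: the downloads are UTF-8; undecodable bytes are replaced.
            with open(html_path, "r", encoding="utf-8", errors="replace") as html_file:
                soup = BeautifulSoup(html_file, "html.parser")
            out_file.write(soup.get_text())
            out_file.write("\n")  # keep one file's text from running into the next
    return len(html_files)
```

For the directory in the question, the call would look like `append_html_text("c:\\users\\me\\downloads\\", "example.txt")`. Because the output is opened in append mode, running it twice adds the text twice.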
