Python - BeautifulSoup: apply to every text file in a folder and produce a new text file

Question

I am using the following Python and BeautifulSoup code to remove HTML elements from a text file:

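from bs4 import BeautifulSoup

# Python 3; the files are assumed to be UTF-8 encoded.
with open("textFileWithHtml.txt", encoding="utf-8") as markup:
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(markup.read(), "html.parser")

# Open in text mode with an encoding; get_text() already returns a str.
with open("strip_textFileWithHtml.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())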

How can I apply this code to every text file in a folder (directory), so that each text file yields a new, processed text file with the HTML elements removed, without having to call the function separately for each file?

Recommended answer

I would leave that work to the OS: replace the hardcoded input file with file names passed on the command line (the sys.argv list), loop over them in the script, and invoke it with a shell wildcard pattern that matches many files, like:

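from bs4 import BeautifulSoup
import os
import sys

# Process every file name given on the command line (Python 3, UTF-8 files assumed).
for fi in sys.argv[1:]:
    with open(fi, encoding="utf-8") as markup:
        soup = BeautifulSoup(markup.read(), "html.parser")

    # Put the "strip_" prefix on the file name, not on the whole path,
    # so that arguments such as "docs/a.txt" also work.
    folder, name = os.path.split(fi)
    with open(os.path.join(folder, "strip_" + name), "w", encoding="utf-8") as f:
        f.write(soup.get_text())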

and run it like:

python script.py *.txt
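
The command above relies on the shell expanding *.txt into a list of file names. Some shells do not do this (cmd.exe on Windows passes the pattern through unchanged), so the loop over the folder can also live inside the script. The following is only a sketch of that variant, not part of the original answer: the function name strip_html_in_folder, its pattern and prefix parameters, and the UTF-8 encoding are assumptions made for illustration.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def strip_html_in_folder(folder, pattern="*.txt", prefix="strip_"):
    """Write an HTML-free copy of every matching file in `folder`.

    Hypothetical helper: the name, the parameters and the UTF-8
    encoding are assumptions, not taken from the original answer.
    Returns the list of output paths that were written.
    """
    folder = Path(folder)
    written = []
    for src in sorted(folder.glob(pattern)):
        # Skip files produced by an earlier run so they are not processed again.
        if src.name.startswith(prefix):
            continue
        markup = src.read_text(encoding="utf-8")
        text = BeautifulSoup(markup, "html.parser").get_text()
        # The output goes next to the input, with the prefix on the file name.
        dst = src.with_name(prefix + src.name)
        dst.write_text(text, encoding="utf-8")
        written.append(dst)
    return written
```

Calling strip_html_in_folder(".") processes the current directory; any other folder path can be passed instead.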
