Trying to collect data from local files using BeautifulSoup

Problem Description

I want to run a Python script to parse HTML files and collect a list of all the links with a target="_blank" attribute.

I've tried the following, but it's not getting anything from bs4. The SoupStrainer docs say it takes args in the same way as findAll etc., so should this work? Am I missing some stupid error?

import os
import sys

from bs4 import BeautifulSoup, SoupStrainer
from unipath import Path

def main():

    ROOT = Path(os.path.realpath(__file__)).ancestor(3)
    src = ROOT.child("src")
    templatedir = src.child("templates")

    for (dirpath, dirs, files) in os.walk(templatedir):
        for path in (Path(dirpath, f) for f in files):
            if path.endswith(".html"):
                for link in BeautifulSoup(path, parse_only=SoupStrainer(target="_blank")):
                    print link

if __name__ == "__main__":
    sys.exit(main())

Solution

I think you need something like this. BeautifulSoup treats a plain string as markup, so passing the path just parses the filename itself; you need to open the file and pass the file object instead:

if path.endswith(".html"):
    htmlfile = open(path)  # open the HTML file itself; dirpath is just the directory
    for link in BeautifulSoup(htmlfile, parse_only=SoupStrainer(target="_blank")):
        print link
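
For reference, here is a minimal end-to-end sketch with that fix applied. It is one way to do it, not the only one: it drops the unipath dependency in favour of plain os.path, uses a placeholder "templates" directory, and explicitly requests the html.parser backend (newer bs4 versions warn if no parser is named).

import os
import sys

from bs4 import BeautifulSoup, SoupStrainer

# Parse only tags carrying target="_blank"; everything else is skipped,
# which also makes parsing faster on large files.
ONLY_BLANK = SoupStrainer(target="_blank")

def main():
    # Placeholder location for the templates; adjust to your layout.
    here = os.path.dirname(os.path.realpath(__file__))
    templatedir = os.path.join(here, "templates")

    for dirpath, dirs, files in os.walk(templatedir):
        for f in files:
            path = os.path.join(dirpath, f)
            if not path.endswith(".html"):
                continue
            # Hand BeautifulSoup the open file object, not the path string.
            with open(path) as htmlfile:
                soup = BeautifulSoup(htmlfile, "html.parser", parse_only=ONLY_BLANK)
            for link in soup.find_all(target="_blank"):
                print(link)

if __name__ == "__main__":
    sys.exit(main())

Calling find_all on the strained soup, rather than iterating the soup directly, makes the selection explicit and keeps the loop robust even if stray strings survive the strainer.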
