How to extract a table from an SEC N-Q document using BeautifulSoup

Views: 124 | Published: 2016/8/5 19:16:58 | Tags: python, web-scraping, beautifulsoup

Problem description

(python 2.7, BeautifulSoup4)

I am trying to extract the table contents from SEC N-Q documents. Sample html here: https://www.sec.gov/Archives/edgar/data/36405/000093247115006447/indexfunds_final.htm

The file has no helpful tags at all. I want to search for each 'C. Futures Contract' section, find the next <table> after it, and extract the contents of its <tr> rows. 'C. Futures Contract' also occurs multiple times in one document.

I've tried the following code but got nothing.

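import requests, re
from bs4 import BeautifulSoup
r = requests.get("https://www.sec.gov/Archives/edgar/data/36405/000093247115006447/indexfunds_final.htm")
# Parse the downloaded page
soup = BeautifulSoup(r.content, "html.parser")
# Attempt to locate the section headings by regular expression
futures = soup.find_all(re.compile('C. Futures Contract'))
print futures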

[]

Recommended answer

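First of all, if you are searching by text, use the text argument (starting from bs 4.4.0 the argument is named string). The first positional argument of find_all() is a filter on tag names, so the regular expression in the question was being compared against names such as table and tr, which is why the result was empty.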
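
The difference between the two kinds of filter can be seen on a tiny document. This is only a sketch in Python 3: the HTML below is a hypothetical stand-in for the filing rather than a copy of it, and it assumes bs4 4.4.0 or later so that the string argument is available.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical miniature of the filing's markup, for illustration only.
html = """
<p><b>C. Futures Contracts:</b> The fund uses index futures.</p>
<table><tr><td>S&amp;P 500 Index</td><td>10</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r"C\. Futures Contract")

# Positional argument: the regex is tested against tag names (p, b, table, ...).
by_name = soup.find_all(pattern)

# string= (text= on older releases): the regex is tested against text nodes.
by_text = soup.find_all(string=pattern)

print(by_name)
print(by_text)
```

Here by_name comes back empty because no tag is named anything like the pattern, while by_text holds the single text node that contains the heading.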

Aside from that, for every futures section, use find_next() to find the next table element.

Working code:

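import re

import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.sec.gov/Archives/edgar/data/36405/000093247115006447/indexfunds_final.htm")
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(response.content, "html.parser")

# text= matches text nodes (use string= on bs4 4.4.0 and later); the dot is escaped to match a literal period.
futures = soup.find_all(text=re.compile(r'C\. Futures Contract'))
for future in futures:
    # find_next() walks forward in document order to the first <table> after the heading.
    for row in future.find_next("table").find_all("tr"):
        print([cell.get_text(strip=True) for cell in row.find_all("td")])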
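
For trying the approach without fetching the filing, the same idea can be wrapped in a small function and run on inline HTML. This is a minimal sketch in Python 3, not the answer's own code: the function name tables_after and the sample markup are hypothetical, it uses string= (bs4 4.4.0 or later), it also collects th cells, and it skips a heading that has no table after it instead of raising an error.

```python
import re

from bs4 import BeautifulSoup


def tables_after(html, pattern):
    """Return one list of rows for each text match that is followed by a table."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for match in soup.find_all(string=re.compile(pattern)):
        table = match.find_next("table")
        if table is None:  # heading with no table after it
            continue
        rows = [
            [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
            for row in table.find_all("tr")
        ]
        results.append(rows)
    return results


# Hypothetical sample with two sections, mimicking a filing that covers several funds.
sample = """
<p>C. Futures Contracts: Fund A</p>
<table>
  <tr><th>Contract</th><th>Number</th></tr>
  <tr><td>S&amp;P 500 Index</td><td>10</td></tr>
</table>
<p>C. Futures Contracts: Fund B</p>
<table>
  <tr><td>E-mini Russell 2000</td><td>4</td></tr>
</table>
"""

for rows in tables_after(sample, r"C\. Futures Contract"):
    print(rows)
```

Note that find_next() returns the first table anywhere later in the document, so a heading whose own section has no table would pick up the table of a later section; on a real filing the results may need checking against the page.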
