Scrape Finviz Page for Specific Values in Table


Question

I will start by saying that I do not endorse scraping sites whose terms of service do not allow it; this is purely academic research into hypothetically gathering financial data from various websites.

If I have this link:

https://finviz.com/screener.ashx?v=141&f=geo_usa,ind_stocksonly,sh_avgvol_o100,sh_price_o1&o=ticker

...which is stored in a URLs.csv file, and I want to scrape columns 2-5 (i.e. Ticker, Perf Week, Perf Month, Perf Quarter) and export them to a CSV file, what might the code look like?

Adapting another user's answer to a past question of mine, so far I have something that looks like this:

from bs4 import BeautifulSoup
import requests
import csv, random, time


# Open 'URLs.csv' to read list of URLs in the list
with open('URLs.csv', newline='') as f_urls, open('Results.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output, delimiter=',')

    headers = requests.utils.default_headers()
    headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'

    csv_output.writerow(['Ticker', 'Perf Week', 'Perf Month', 'Perf Quarter'])

    # Start to read the first URL in the .csv and loop for each URL/row in the .csv
    for line in csv_urls:

        # Start at first url and look for items
        page = requests.get(line[0])
        soup = BeautifulSoup(page.text, 'html.parser')

        symbol = soup.findAll('a', {'class':'screener-link-primary'})

        perfdata = soup.findAll('a', {'class':'screener-link'})

        lines = list(zip(perfdata, symbol))

        # pair up every two teams
        for perfdata1, symbol1 in zip(lines[1::2], lines[::2]):

            # extract string items
            a1, a2, a3, _ = (x.text for x in symbol1 + perfdata1)

            # reorder and write row
            row = a1, a2, a3
            print(row)
            csv_output.writerow(row)

...I get the following output:

('1', 'A', '7.52%')
('-0.94%', 'AABA', '5.56%')
('10.92%', 'AAL', '-0.58%')
('4.33%', 'AAOI', '2.32%')
('2.96%', 'AAP', '1.80')
('2.83M', 'AAT', '0.43')
('70.38', 'AAXN', '0.69%')
...

So it's skipping some rows and not returning the data in the right order. I would like to see this in my final output:

('A', '7.52%', '-0.94%', '5.56%')
('AA', '0.74%', '0.42%', '-20.83%')
('AABA', '7.08%', '0.50%', '7.65%')
('AAC', '31.18%', '-10.95%', '-65.14%')
...

I know it's the last sections of the code that are incorrect, but I'm looking for some guidance. Thanks!

Answer

The problem is that you extract only the Ticker column plus a flat list of unrelated cells (everything matching .screener-link) and then pair the two up. Extract the table rows instead and read the cells you need from each row.
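To see why pairing the two flat lists scrambles the rows, the question's zip logic can be replayed offline on plain strings. This is a small sketch that assumes, as the observed output suggests, that the generic screener-link class matches every cell link in a row (the row number and each performance figure) while screener-link-primary matches one ticker per row. The list contents are copied from the outputs quoted in this thread.

```python
# One entry per table ROW: the ticker links (screener-link-primary).
symbols = ["A", "AA", "AABA", "AAC"]
# One entry per CELL: these four values all belong to the first row
# (row number, Perf Week, Perf Month, Perf Quarter). Assumed from the output.
cells = ["1", "7.52%", "-0.94%", "5.56%"]

# The pairing from the question, applied to plain strings instead of tags.
lines = list(zip(cells, symbols))
scrambled = []
for second, first in zip(lines[1::2], lines[::2]):
    a1, a2, a3, _ = first + second
    scrambled.append((a1, a2, a3))
print(scrambled)

# Row-wise alternative: keep the cells of one row together, then slice
# columns 2-5 out of that row.
row_cells = ["1", "A", "7.52%", "-0.94%", "5.56%"]
fixed = tuple(row_cells[1:5])
print(fixed)
```

The two scrambled tuples match the first two lines of the unwanted output in the question: values from row A end up next to the AABA ticker because the cell list advances one cell at a time while the ticker list advances one row at a time. The loop that follows applies the same row-wise slice to the parsed page.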

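for line in csv_urls:
    # Fetch and parse the page, sending the User-Agent set up in the question
    page = requests.get(line[0], headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    # Every <tr> of the results table; the first one is the header row
    rows = soup.select('table[bgcolor="#d3d3d3"] tr')
    for row in rows[1:]:
        cells = row.find_all('td')
        if len(cells) < 5:
            continue  # not a data row
        # Columns 2-5: Ticker, Perf Week, Perf Month, Perf Quarter
        values = tuple(cell.text for cell in cells[1:5])
        print(values)
        # write row
        csv_output.writerow(values)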

Output

('A', '7.52%', '-0.94%', '5.56%')
('AA', '0.74%', '0.42%', '-20.83%')
('AABA', '7.08%', '0.50%', '7.65%')
('AAC', '31.18%', '-10.95%', '-65.14%')
('AAL', '-0.75%', '-6.74%', '0.60%')
('AAN', '5.68%', '6.51%', '-6.55%')
('AAOI', '5.47%', '-17.10%', '-23.12%')
('AAON', '0.62%', '1.10%', '8.58%')
('AAP', '0.38%', '-3.85%', '-2.30%')
('AAPL', '2.72%', '-9.69%', '-29.61%')
('AAT', '3.26%', '-2.39%', '10.74%')
('AAWW', '15.87%', '1.55%', '-9.62%')
('AAXN', '7.48%', '11.85%', '-14.24%')
('AB', '1.32%', '6.67%', '-2.73%')
('ABBV', '-0.85%', '0.16%', '-5.12%')
('ABC', '3.15%', '-7.18%', '-15.72%')
('ABCB', '5.23%', '-3.31%', '-22.35%')
('ABEO', '1.71%', '-10.41%', '-28.81%')
('ABG', '1.71%', '8.95%', '12.70%')
('ABM', '7.09%', '26.92%', '5.90%')
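The row-wise idea can also be checked without network access or third-party packages. What follows is a minimal sketch that swaps BeautifulSoup for the standard library's html.parser. TableRowParser, extract_columns, and SAMPLE_HTML are hypothetical names, the sample markup is invented to resemble the screener table, and the Volume values and the short filler row are made up. Finviz's real markup may differ or change.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Invented sample: header row, two data rows, one short filler row.
SAMPLE_HTML = """
<table bgcolor="#d3d3d3">
  <tr><td>No.</td><td>Ticker</td><td>Perf Week</td><td>Perf Month</td><td>Perf Quart</td><td>Volume</td></tr>
  <tr><td><a class="screener-link">1</a></td><td><a class="screener-link-primary">A</a></td><td>7.52%</td><td>-0.94%</td><td>5.56%</td><td>100</td></tr>
  <tr><td><a class="screener-link">2</a></td><td><a class="screener-link-primary">AA</a></td><td>0.74%</td><td>0.42%</td><td>-20.83%</td><td>200</td></tr>
  <tr><td>filler</td></tr>
</table>
"""


def extract_columns(html, first=1, last=5):
    """Return cells [first:last] of each data row as a tuple of strings."""
    parser = TableRowParser()
    parser.feed(html)
    # Skip the header row and any row too short to hold the wanted cells.
    return [tuple(row[first:last]) for row in parser.rows[1:] if len(row) >= last]


result = extract_columns(SAMPLE_HTML)
for row in result:
    print(row)
```

Unlike the select() call in the answer, this parser collects rows from every table in its input and does not handle nested tables, so on a real page the target table would have to be isolated first.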
