How do I webscrape by manipulating the URL? Python 3.5


Question


I want to scrape a table of stock data from a website (this table). In my code, I generate an array of stock symbols. The finviz site generates the table for each particular stock from the last portion of the URL (e.g. https://finviz.com/quote.ashx?t=MBOT for MBOT). I want to use my generated array as the final part of the URL (e.g. if my array is [AAPL, MBOT], then https://finviz.com/quote.ashx?t=AAPL and then https://finviz.com/quote.ashx?t=MBOT), scraping the output table from each URL and writing the scraped information to a CSV file (in this case titled 'output.csv'). Here is my code:

import csv
import urllib.request
from bs4 import BeautifulSoup

twiturl = "https://twitter.com/ACInvestorBlog"
twitpage = urllib.request.urlopen(twiturl)
soup = BeautifulSoup(twitpage,"html.parser")

print(soup.title.text)

tweets = [i.text for i in soup.select('a.twitter-cashtag.pretty-link.js-nav b')]
print(tweets)

url_base = "https://finviz.com/quote.ashx?t="
url_list = [url_base + tckr for tckr in tweets]
fpage = urllib.request.urlopen(url_list)
fsoup = BeautifulSoup(fpage, 'html.parser')

with open('output.csv', 'wt') as file:
    writer = csv.writer(file)

    # write header row
    writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2-cp'})))

    # write body row
    writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2'}))) 

Here is my list of errors:

"C:\Users\Taylor .DESKTOP-0SBM378\venv\helloworld\Scripts\python.exe" "C:/Users/Taylor .DESKTOP-0SBM378/PycharmProjects/helloworld/helloworld"
Antonio Costa (@ACInvestorBlog) | Twitter
Traceback (most recent call last):
['LINU', 'FOSL', 'LINU', 'PETZ', 'NETE', 'DCIX', 'DCIX', 'KDMN', 'KDMN', 'LINU', 'CNET', 'AMD', 'CNET', 'AMD', 'NETE', 'NETE', 'AAPL', 'PETZ', 'CNET', 'PETZ', 'PETZ', 'MNGA', 'KDMN', 'CNET', 'ITUS', 'CNET']
  File "C:/Users/Taylor .DESKTOP-0SBM378/PycharmProjects/helloworld/helloworld", line 17, in <module>
    fpage = urllib.request.urlopen(url_list)
  File "C:\Users\Taylor .DESKTOP-0SBM378\AppData\Local\Programs\Python\Python36-32\Lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Taylor .DESKTOP-0SBM378\AppData\Local\Programs\Python\Python36-32\Lib\urllib\request.py", line 517, in open
    req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'

Process finished with exit code 1

Answer


You are passing a list to urllib.request.urlopen() instead of a string, that's all! So you were already really close.


To open all the different URLs, simply use a for loop.

for url in url_list:
    fpage = urllib.request.urlopen(url)
    fsoup = BeautifulSoup(fpage, 'html.parser')

    #scrape single page and add data to list

with open('output.csv', 'wt') as file:
    writer = csv.writer(file)

    #write datalist
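
Filling in those placeholder comments, a minimal sketch of the full loop might look like the following. The example ticker list stands in for the tweets array from the question, and the td classes are the same selectors the question's code already uses; it assumes finviz serves the page to urllib as it did in the question.

import csv
import urllib.request
from bs4 import BeautifulSoup

url_base = "https://finviz.com/quote.ashx?t="
tickers = ["AAPL", "MBOT"]   # example list; in the question this comes from the scraped tweets
url_list = [url_base + tckr for tckr in tickers]

header = None
rows = []

for url in url_list:
    # urlopen() takes a single URL string, so open each page inside the loop
    fpage = urllib.request.urlopen(url)
    fsoup = BeautifulSoup(fpage, 'html.parser')

    # same selectors as in the question: label cells and value cells of the snapshot table
    labels = [td.text for td in fsoup.find_all('td', {'class': 'snapshot-td2-cp'})]
    values = [td.text for td in fsoup.find_all('td', {'class': 'snapshot-td2'})]

    if header is None:
        header = labels          # write the label row only once
    rows.append(values)

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)
    writer.writerows(rows)

Collecting the rows in a list inside the loop and writing them once at the end keeps the file handling separate from the scraping, which is what the sketch above suggests.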

