Python web scraping [Error 10060]

Problem description

I am struggling to get my code, which scrapes HTML table data from the web, to work through a list of websites held in the ShipURL.txt file. The code reads the web page addresses from ShipURL.txt, then opens each link, downloads the table data and saves it to a CSV file. My problem is that the program cannot finish: partway through, the error "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond" occurs and the program stops. As I understand it, I need to increase the request timeout, use a proxy, or add a try statement. I have scanned through a few answers to the same problem, but as a novice I am finding them hard to understand. Any help would be appreciated.
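For reference, the first two options (an explicit timeout and a proxy) can be sketched with Python 3's `urllib.request`. The question's code uses Python 2's `urllib.urlopen`, which has no `timeout` argument; there, `socket.setdefaulttimeout(30)` sets a default timeout for newly created sockets instead. This is a minimal sketch under stated assumptions, not a known fix for error 10060: `make_opener`, `fetch` and `PROXY_URL` are hypothetical names, and the proxy address is a placeholder. A timeout does not make an unreachable server respond; it only makes a stalled request fail with an exception that can be caught.

```python
import urllib.request

# Hypothetical placeholder address; replace it with a proxy you actually have access to.
PROXY_URL = "http://proxy.example.com:8080"


def make_opener(proxy_url=None):
    """Build a URL opener, optionally sending http/https traffic through proxy_url."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    return urllib.request.build_opener(*handlers)


def fetch(url, opener, timeout=30):
    """Return the body of url as bytes.

    Raises an OSError subclass (urllib.error.URLError, socket.timeout, ...)
    if the connection fails or no data arrives within `timeout` seconds.
    """
    with opener.open(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # A data: URL needs no network access, so this only checks the plumbing.
    print(fetch("data:,hello", make_opener()))  # prints b'hello'
```

With a real site the call would look like `fetch(shipUrl, make_opener(), timeout=60)`, ideally inside a `try` block so one failing link does not stop the whole run.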

ShipURL.txt https://dl.dropboxusercontent.com/u/110612863/ShipURL.txt

# -*- coding: utf-8 -*-
import csv
import re
from urllib import urlopen                                      # Python 2
from bs4 import BeautifulSoup

with open('ShipURL.txt', 'r') as fm:
    Shiplinks = fm.readlines()

with open('ShipData.csv', 'wb') as f:                           # Create the csv file once, so the data from every link is kept
    writer = csv.writer(f)
    for line in Shiplinks:
        website = re.findall(r'(https?://\S+)', line)
        website = "".join(str(x) for x in website)
        if website == "":                                       # Skip lines that contain no URL
            continue
        shipUrl = website
        shipPage = urlopen(shipUrl)                             # Network call; connection errors such as [Error 10060] are raised here

        soup = BeautifulSoup(shipPage, "html.parser")           # Read the web page HTML
        table = soup.find_all("table", {"class": "table1"})     # Find tables with class table1
        List = []                                               # Column labels
        Values = []                                             # Column values
        columnRow = ""
        valueRow = ""
        for mytable in table:                                   # Loop over tables with class table1
            table_body = mytable.find('tbody')                  # Find the tbody section in the table
            if table_body is None:                              # This table has no tbody
                print "no tbody"
                continue
            for tr in table_body.find_all('tr'):                # Loop over all rows
                i = 1                                           # 1 until a label cell is found; later cells are values
                for td in tr.find_all('td'):                    # Loop over the columns
                    if i == 1:
                        if td.text.endswith(":"):               # Label cells end with a colon
                            # Remove the colon and append a comma
                            columnRow += td.text.strip(":") + ","
                            List.append(td.text)
                            i += 1                              # The following cells are values
                    else:
                        # Remove line breaks and append a comma
                        valueRow += td.text.strip("\n") + ","
                        Values.append(td.text)
        # Print the headers and the values, separated by blank lines
        print columnRow.strip(",")
        print "\n"
        print valueRow.strip(",")
        # Write the column headers as the first row and the values as the second;
        # encode to UTF-8 because Python 2's csv module does not accept unicode
        writer.writerow([columnRow.encode('utf-8')])
        writer.writerow([valueRow.encode('utf-8')])
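A side note on the CSV output: the script joins each row into a single comma-separated string and passes it to `writer.writerow` as a one-element list, so the csv module treats the whole string as one field and wraps it in quotes. A minimal Python 3 sketch that keeps the labels and values as lists (like `List` and `Values` above) and writes one column per field; `write_ship_rows` is a hypothetical helper, and the sample labels and values are made up.

```python
import csv
import io


def write_ship_rows(fileobj, labels, values):
    """Write one header row and one value row, with one CSV column per field."""
    writer = csv.writer(fileobj)
    writer.writerow([label.strip().rstrip(":") for label in labels])  # "Name:" -> "Name"
    writer.writerow([value.strip() for value in values])              # drop stray line breaks


# Made-up sample data; in the script these would be the List and Values lists.
buffer = io.StringIO()
write_ship_rows(buffer, ["Name:", "Flag:"], ["Example Ship\n", "Example Flag"])
print(repr(buffer.getvalue()))  # 'Name,Flag\r\nExample Ship,Example Flag\r\n'
```

For a real file in Python 3, open it with `open('ShipData.csv', 'w', newline='', encoding='utf-8')`; the csv module then handles Unicode itself, so the manual `.encode('utf-8')` calls are not needed.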

Recommended answer

I would wrap your urlopen call in a try/except block, like this:

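try:
    shipPage = urlopen(shipUrl)
except IOError as e:        # Python 2's urllib.urlopen reports network failures, including socket errors, as IOError
    print shipUrl, e        # Show which URL failed and why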

That will at least help you figure out where the error is happening. Without the extra files, it would be hard to troubleshoot otherwise.
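Catching the exception is only half of the fix: after a failed `urlopen`, the loop must move on to the next link, otherwise the following lines use an undefined (or the previous) `shipPage`. A minimal Python 3 sketch of that idea, assuming a failed download should be retried a few times and then skipped; `fetch_with_retries`, `scrape_all`, the retry count and the delay are hypothetical choices, and `fetch` is any function that downloads one URL, for example `lambda url: urllib.request.urlopen(url, timeout=30).read()`.

```python
import time


def fetch_with_retries(url, fetch, retries=3, delay=5.0):
    """Call fetch(url); on OSError wait `delay` seconds and retry, re-raising after `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except OSError as e:  # URLError, socket.timeout and Windows error 10060 are all OSError subclasses
            print("attempt %d/%d for %s failed: %s" % (attempt, retries, url, e))
            if attempt == retries:
                raise
            time.sleep(delay)


def scrape_all(urls, fetch, retries=3, delay=5.0):
    """Download every URL; return ({url: page}, [urls that kept failing])."""
    pages, failed = {}, []
    for url in urls:
        try:
            pages[url] = fetch_with_retries(url, fetch, retries, delay)
        except OSError:
            failed.append(url)  # Record the failure and continue with the next link
    return pages, failed
```

The returned pages can then be parsed with BeautifulSoup as before, and the `failed` list tells you which links to try again later instead of the whole run stopping at the first timeout.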

Python error documentation