Python: download files by links stored in a CSV file

Question

As a newbie in Python (2.7), I'm looking for some advice:

I have a CSV file with HTTP links stored in one column, comma-delimited:

http://example.com/file.pdf,
http://example.com/file.xls,
http://example.com/file.xlsx,
http://example.com/file.doc,

The main aim is to loop through all these links and download each file with its original name and extension.

Searching, plus help here, gave me the following script:

import urllib2
import pandas as pd 

links = pd.read_csv('links.csv', sep=',', header =(0))

url = links                   # I know this part is wrong but don't know how to do it right

user_agent = 'Mozilla 5.0 (Windows 7; Win64; x64)'

file_name = "tessst"          # placeholder name; how do I get the files' original names?

u = urllib2.Request(url, headers = {'User-Agent' : user_agent})
req = urllib2.urlopen(u)
f = open(file_name, 'wb')
f.write(req.read())

f.close()

Any help, please.

P.S. Not sure about pandas; maybe csv would be better?

Recommended answer

If I can assume your CSV file has only one column, containing the links, then this would work:

import csv, sys
import urllib2
import os

filename = 'test.csv'
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            # skip blank rows and rows whose first cell is not a link
            if row and 'http' in row[0]:
                url = row[0]
                # the original file name is everything after the last '/'
                name = url.rsplit('/', 1)[-1]
                # check first, so existing files are not downloaded again
                if not os.path.exists("./" + name):
                    res = urllib2.urlopen(urllib2.Request(url))
                    # 'with' closes the file even if the write fails
                    with open("./" + name, 'wb') as out:
                        out.write(res.read())
                else:
                    print "file: ", name, "already exists"
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
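
For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is only a rough port under assumptions, not part of the original answer: the helper names read_links, filename_from_url and download_all are made up for illustration, and the User-Agent default is an arbitrary placeholder in the spirit of the one in the question. It also strips any ?query part from the URL before taking the file name, which the script above does not do.

```python
import csv
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit


def read_links(csv_path):
    """Return the first cell of every row that starts with http:// or https://."""
    links = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            # A trailing comma only adds an empty second cell; cell 0 holds the link.
            if row and row[0].strip().startswith(("http://", "https://")):
                links.append(row[0].strip())
    return links


def filename_from_url(url):
    """Return the last path segment of url, ignoring any ?query or #fragment."""
    path = unquote(urlsplit(url).path)
    return posixpath.basename(path)


def download_all(csv_path, dest_dir=".", user_agent="Mozilla/5.0"):
    """Download every link listed in csv_path into dest_dir (hypothetical helper)."""
    for url in read_links(csv_path):
        name = filename_from_url(url)
        if not name:
            print("no file name in", url)
            continue
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            print("file:", name, "already exists")
            continue
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # Both the HTTP response and the output file are closed on exit.
        with urllib.request.urlopen(request) as response, open(target, "wb") as out:
            out.write(response.read())
```

Calling download_all('links.csv') would then save each file under its original name in the current directory. On the P.S. in the question: for a single column of links, the csv module is enough, so pandas is not needed here.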
