List all files in an online directory with Python?

Problem description

Hello, I was just wondering: I'm trying to create a Python application that downloads files from the internet, but at the moment it only downloads one file whose name I already know. Is there any way I can get a list of the files in an online directory and download them? I'll show you my code for downloading one file at a time, just so you know a bit about what I want to do.

import urllib.request

url = "http://cdn.primarygames.com/taxi.swf"

file_name = url.split('/')[-1]
# Python 3: urllib2 became urllib.request; "with" closes both the response and the file
with urllib.request.urlopen(url) as u, open(file_name, 'wb') as f:
    # Content-Length may be missing; treat that as an unknown size (0)
    file_size = int(u.headers.get("Content-Length", 0))
    print("Downloading: %s Bytes: %s" % (file_name, file_size))

    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break

        file_size_dl += len(buffer)
        f.write(buffer)
        status = "%10d" % file_size_dl
        if file_size:
            status += "  [%3.2f%%]" % (file_size_dl * 100. / file_size)
        # "\r" returns to the start of the line so each update overwrites the last
        print("\r" + status, end="", flush=True)

print()



So what it does is download taxi.swf from this website, but what I want is for it to download all the .swf files in that directory ("/") to the computer.

Is it possible? Thank you so much in advance. -Terrii-

Recommended answer

Since you're trying to download a bunch of things at once, start by looking for a site index or a web page that neatly lists everything you want to download. The mobile version of a website is usually lighter than the desktop version and easier to scrape.
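When the page you find links the files directly (for example, a web server's auto-generated directory listing for "/"), the links can be pulled out with the standard library alone. The following is a minimal sketch, not part of the original answer: it assumes the listing is ordinary HTML with <a href> links and keeps only those ending in .swf; SwfLinkParser, list_swf_links and the sample page are hypothetical. A web server only returns such a listing if it chooses to publish one, so this helps only when that page exists.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SwfLinkParser(HTMLParser):
    """Collects the absolute URLs of <a> links that point at .swf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Compare case-insensitively and ignore any query string
        if href and href.lower().split("?")[0].endswith(".swf"):
            # Resolve relative links such as "taxi.swf" against the listing URL
            self.links.append(urljoin(self.base_url, href))


def list_swf_links(html, base_url):
    """Return the absolute URLs of every .swf linked from an HTML listing page."""
    parser = SwfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical listing page shaped like a typical auto-generated index
    sample = (
        '<a href="../">Parent Directory</a>\n'
        '<a href="taxi.swf">taxi.swf</a>\n'
        '<a href="readme.txt">readme.txt</a>\n'
    )
    for link in list_swf_links(sample, "http://example.com/games/"):
        print(link)  # http://example.com/games/taxi.swf
```

Each returned URL can then be handed to the single-file download loop from the question, which names the local file after the last segment of the URL.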

This page has exactly what you're looking for: All Games (http://www.primarygames.com/mobile/category/all/).

Now it's really quite simple: just extract all of the game page links. I use BeautifulSoup and requests to do this:

import requests
from bs4 import BeautifulSoup

games_url = 'http://www.primarygames.com/mobile/category/all/'

def get_all_games():
    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(requests.get(games_url).text, 'html.parser')

    # The game links are listed inside <div class="catlist"> on that page
    for a in soup.find('div', {'class': 'catlist'}).find_all('a'):
        yield 'http://www.primarygames.com' + a['href']

def download_game(url):
    # You have to do this stuff. I'm lazy and won't do it.
    raise NotImplementedError

if __name__ == '__main__':
    for game in get_all_games():
        download_game(game)

The rest is up to you. download_game() should download a game given the game's page URL, so you'll have to figure out where the <object> tag that embeds the .swf sits in the DOM.
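One way to fill in download_game() is the minimal sketch below; it is not the answerer's code. It assumes each game page embeds its movie in an <object> tag, either through the tag's data attribute or through a <param name="movie" value="..."> child (the two usual Flash embedding patterns), which has not been checked against primarygames.com's actual markup. It also swaps requests and BeautifulSoup for the standard library's urllib.request and html.parser so it runs without extra packages; ObjectSwfParser, find_swf_url, the dest_dir parameter and the "game.swf" fallback name are hypothetical.

```python
import os
import shutil
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class ObjectSwfParser(HTMLParser):
    """Remembers the movie URL of the first Flash <object> on a page."""

    def __init__(self):
        super().__init__()
        self.swf = None
        self._in_object = False

    def handle_starttag(self, tag, attrs):
        if self.swf:
            return  # already found one; keep the first
        attrs = dict(attrs)
        if tag == "object":
            self._in_object = True
            # Assumption: the movie is referenced by <object data="...">
            self.swf = attrs.get("data")
        elif tag == "param" and self._in_object:
            # Assumption: ...or by a <param name="movie" value="..."> child
            if (attrs.get("name") or "").lower() == "movie":
                self.swf = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "object":
            self._in_object = False


def find_swf_url(html, page_url):
    """Return the absolute movie URL embedded in html, or None."""
    parser = ObjectSwfParser()
    parser.feed(html)
    parser.close()
    return urljoin(page_url, parser.swf) if parser.swf else None


def download_game(url, dest_dir="."):
    """Fetch a game page, locate its <object> movie and save it to dest_dir.

    Returns the saved file path, or None if the page has no Flash <object>.
    """
    with urlopen(url) as page:
        charset = page.headers.get_content_charset() or "utf-8"
        html = page.read().decode(charset, errors="replace")

    swf_url = find_swf_url(html, url)
    if swf_url is None:
        return None

    # Name the local file after the last path segment; "game.swf" is a hypothetical fallback
    file_name = os.path.basename(urlsplit(swf_url).path) or "game.swf"
    path = os.path.join(dest_dir, file_name)
    with urlopen(swf_url) as src, open(path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, like the question's read loop
    return path
```

Dropped into the answer's script, the loop "for game in get_all_games(): download_game(game)" would save each movie into the current directory. Keeping the HTML search in find_swf_url, separate from the network calls, makes it easy to check against a saved copy of a game page before downloading anything; with BeautifulSoup the same search would start from soup.find('object').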
