List all files in an online directory with Python?
Question
Hello, I'm trying to create a Python application that downloads files from the internet, but at the moment it only downloads a single file whose name I already know... Is there any way I can get a list of the files in an online directory and download them? I'll show you my code for downloading one file at a time, just so you know a bit about what I want to do.
import urllib2

url = "http://cdn.primarygames.com/taxi.swf"
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break
    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()
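(Note: the code above is Python 2, where urllib2 still exists. A minimal sketch of the same chunked download under Python 3, using urllib2's standard-library successor urllib.request, might look like this:)

```python
# A minimal Python 3 sketch of the same chunked download,
# using urllib.request (urllib2's successor in the stdlib).
import urllib.request

def download(url, block_sz=8192):
    # Derive the local file name from the URL, as in the original code.
    file_name = url.split('/')[-1]
    with urllib.request.urlopen(url) as u, open(file_name, 'wb') as f:
        file_size = int(u.headers.get("Content-Length", 0))
        print("Downloading: %s Bytes: %s" % (file_name, file_size))
        file_size_dl = 0
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break
            file_size_dl += len(buffer)
            f.write(buffer)
    return file_name, file_size_dl
```

The file is written into the current working directory, again matching the original snippet.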
So what it does is download taxi.swf from that website, but what I want it to do is download all of the .swf files from that directory ("/") to the computer.
Is it possible? Thank you so much in advance. -Terrii-
Answer
Since you're trying to download a bunch of things at once, start by looking for a site index or a webpage that neatly lists everything you want to download. The mobile version of the website is usually lighter than the desktop and is easier to scrape.
This website has exactly what you're looking for: All Games.
Now it's really quite simple to do. Just extract all of the game page links. I use BeautifulSoup and requests to do this:
import requests
from bs4 import BeautifulSoup

games_url = 'http://www.primarygames.com/mobile/category/all/'

def get_all_games():
    soup = BeautifulSoup(requests.get(games_url).text, 'html.parser')
    for a in soup.find('div', {'class': 'catlist'}).find_all('a'):
        yield 'http://www.primarygames.com' + a['href']

def download_game(url):
    # You have to do this stuff. I'm lazy and won't do it.
    pass

if __name__ == '__main__':
    for game in get_all_games():
        download_game(game)
The rest is up to you. download_game() downloads a game given the game's URL, so you have to figure out the location of the <object> tag in the DOM.