Last Modified of file downloaded does not match its HTTP header

Question

I have a piece of Python code that (for better or worse) checks a local file against the same file on a web server. If the local file isn't there, it downloads it; if it is, it compares the os.stat last-modified time of the downloaded file against the Last-Modified HTTP header of the same file on the server.

The problem is that these two numbers aren't equal even when they should be. Here's the code:

from urllib import urlretrieve
from urllib2 import Request, urlopen
from time import strftime, localtime, mktime, strptime
from os import stat, path

destFile = "logo3w.png"
srvFile = "http://www.google.com/images/srpr/logo3w.png"

if path.exists(destFile):
    localLastModified = stat(destFile).st_mtime
    req = Request(srvFile)
    url_handle = urlopen(req)
    headers = url_handle.info()                        
    srvLastModified = headers.getheader("Last-Modified")
    srvLastModified = mktime(strptime(srvLastModified,
      "%a, %d %b %Y %H:%M:%S GMT"))
    print localLastModified, srvLastModified

else:
    urlretrieve(srvFile, destFile)

The output of the print statement (if you run the code twice) is 1334527395.26 1333350817.0.

It seems to me those two should be the same, but they're wildly different. The modification date of the locally downloaded file is in fact the date it was downloaded to the local machine, not the last-modified date on the server.

Essentially, all I'm trying to do is keep a local cache of the file (there would be a lot of files in the actual application), downloading it only when necessary. I'm vaguely aware that web proxies should do this by default, and I'm running a basic WAMP server where these files are stored, but I'm not sure how to apply this to my PyQt application. There are potentially dozens of files that would need to be downloaded and cached, and about half of them will rarely change, so I'm trying to determine the fastest way to check and fetch these files.

Perhaps this isn't even the right way to go about it, so I'm all ears if there are better (or simply other) ways to do this.
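
If the goal is to avoid re-downloading unchanged files, one common alternative is an HTTP conditional request: send the cached copy's timestamp in an If-Modified-Since header and let the server reply 304 Not Modified when nothing has changed. The following is a minimal Python 3 sketch under stated assumptions, not code from the original post. The function names build_conditional_request and refresh are hypothetical, the server is assumed to honour If-Modified-Since and to send Last-Modified, and the cached file's mtime is assumed to hold the server's timestamp (the sketch sets it after each download).

```python
import os
import shutil
from email.utils import formatdate, parsedate_to_datetime
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_conditional_request(url, local_path):
    """Build a GET request that lets the server skip the body if unchanged."""
    headers = {}
    if os.path.exists(local_path):
        # Assumption: the cached file's mtime holds the server's Last-Modified.
        mtime = os.stat(local_path).st_mtime
        headers["If-Modified-Since"] = formatdate(mtime, usegmt=True)
    return Request(url, headers=headers)


def refresh(url, local_path):
    """Download url to local_path if needed. Returns True if it was downloaded."""
    try:
        with urlopen(build_conditional_request(url, local_path)) as response:
            with open(local_path, "wb") as out:
                shutil.copyfileobj(response, out)
            last_modified = response.headers.get("Last-Modified")
    except HTTPError as err:
        if err.code == 304:  # Not Modified: the cached copy is still current
            return False
        raise
    if last_modified:
        # Stamp the file with the server's time so the next check is meaningful.
        server_time = parsedate_to_datetime(last_modified).timestamp()
        os.utime(local_path, (server_time, server_time))
    return True
```

With this shape, refresh(srvFile, destFile) would cost one small request per unchanged file instead of a full download, which may suit a cache of dozens of rarely changing files.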

Recommended answer

urllib.urlretrieve just downloads the file; it does not copy the modification date. You must set it manually using os.utime:

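import os

# Replaces the else branch of the code in the question.
else:
    # urlretrieve returns (filename, headers); keep the response headers.
    headers = urlretrieve(srvFile, destFile)[1]
    lmStr = headers.getheader("Last-Modified")
    # mktime matches the conversion used in the question's comparison.
    srvLastModified = mktime(strptime(lmStr, "%a, %d %b %Y %H:%M:%S GMT"))
    # Set both the access time and the modification time to the server's value.
    os.utime(destFile, (srvLastModified, srvLastModified))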
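
For readers on Python 3, where urlretrieve lives in urllib.request and headers are read with .get(), the same idea can be sketched as follows. This is a minimal sketch rather than the answer's original code: the helper names http_date_to_epoch, stamp_with_server_time, and download_and_stamp are hypothetical, and it swaps mktime for calendar.timegm, because mktime interprets the parsed GMT time as local time.

```python
import os
from calendar import timegm
from time import strptime
from urllib.request import urlretrieve

HTTP_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def http_date_to_epoch(value):
    """Convert an HTTP date such as a Last-Modified value to UTC epoch seconds."""
    # timegm treats the parsed struct_time as UTC, unlike mktime (local time).
    return timegm(strptime(value, HTTP_DATE_FORMAT))


def stamp_with_server_time(path, last_modified):
    """Set the file's access and modification times to the server's timestamp."""
    epoch = http_date_to_epoch(last_modified)
    os.utime(path, (epoch, epoch))


def download_and_stamp(url, path):
    """Download url to path, then copy the server's Last-Modified onto the file."""
    _, headers = urlretrieve(url, path)
    last_modified = headers.get("Last-Modified")
    if last_modified is not None:  # the header is optional, so it may be absent
        stamp_with_server_time(path, last_modified)
```

Calling download_and_stamp(srvFile, destFile) in place of the bare urlretrieve call should leave the file's st_mtime equal to http_date_to_epoch of the server's header, provided the comparison side uses the same conversion.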
