Get a page's last modified date using Java
Question
Is there a standard way to tell when a page was last modified? Currently I am doing this:
URLConnection uCon = url.openConnection();
uCon.setConnectTimeout(5000); // 5 seconds
String lastMod = uCon.getHeaderField("Last-Modified");
System.out.println("last mod: "+lastMod);
However, it looks like some sites do not send a Last-Modified field.
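URLConnection.getHeaderField(String) returns null when the server does not send the named header, so lastMod above may be null. A minimal sketch of turning that raw string into an Instant, assuming the server uses the preferred HTTP date format (for example Sun, 06 Nov 1994 08:49:37 GMT); the class and method names are hypothetical. URLConnection also has getLastModified(), which parses the same header and returns 0 when the date is not known.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper class; not part of any library.
public class LastModifiedParser {
    /**
     * Parses the value returned by uCon.getHeaderField("Last-Modified").
     * Returns empty when the header was absent (null) or not in RFC 1123 form.
     */
    public static Optional<Instant> parseLastModified(String headerValue) {
        if (headerValue == null) {
            return Optional.empty(); // the server did not send Last-Modified
        }
        try {
            return Optional.of(OffsetDateTime
                    .parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant());
        } catch (DateTimeParseException e) {
            // Also covers the obsolete RFC 850 and asctime() date formats,
            // which this sketch does not attempt to handle.
            return Optional.empty();
        }
    }
}
```

In the question's code this would be called as LastModifiedParser.parseLastModified(uCon.getHeaderField("Last-Modified")), with an empty result meaning "no usable date from the headers".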
http://www.cbc.ca returns these header fields:
X-Origin-Server
Connection
Expires
null
Date
Server
Content-Type
Transfer-Encoding
Cache-Control
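The null entry in that list is expected: in the JDK's built-in HTTP handler, getHeaderFields() stores the status line (for example HTTP/1.1 200 OK) under a null key. Note also that Date is the time the response was generated, not when the content changed, so it cannot stand in for Last-Modified. A minimal sketch of checking such a header map for a field case-insensitively while skipping the null key; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper class; not part of any library.
public class HeaderLookup {
    /**
     * Case-insensitive lookup in a map shaped like URLConnection.getHeaderFields().
     * Returns the first entry of the matching value list, or empty if no such header.
     */
    public static Optional<String> firstValue(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String key = e.getKey();
            // Skip the null key, which holds the status line rather than a header.
            if (key != null && key.equalsIgnoreCase(name) && !e.getValue().isEmpty()) {
                return Optional.of(e.getValue().get(0));
            }
        }
        return Optional.empty();
    }
}
```

For a header map like the one from http://www.cbc.ca above, firstValue(uCon.getHeaderFields(), "Last-Modified") would come back empty.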
I could parse the page to try to find its date, but that seems like a major pain. What is the standard way?
(If possible, I'd like to stick with URLConnection, since that is what I use to download the pages.)
Recommended answer
There is no standard. Dynamically generated web pages generally do not send a Last-Modified header, and different pages show dates in different ways. Some sites do not include such a date at all, only "© <current year>" at the bottom. You could try looking for a date near the bottom or the top of the page, but reliably extracting it would have to be site-specific.
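A minimal sketch of the "look near the top or bottom" idea, assuming the site writes dates in ISO yyyy-MM-dd form. The class name, the regular expression and the window size are all hypothetical and would need adapting to each site. A bare "© 2013" is not matched, because the pattern requires a full date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, site-specific heuristic; not a general solution.
public class PageDateHeuristic {
    // Assumed date format: ISO yyyy-MM-dd. Real sites use many other formats.
    private static final Pattern ISO_DATE = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    /** Looks for a date in the first `window` characters, then in the last `window`. */
    public static Optional<LocalDate> findDateNearEdges(String html, int window) {
        int n = html.length();
        String head = html.substring(0, Math.min(window, n));
        String tail = html.substring(Math.max(0, n - window));
        Optional<LocalDate> fromHead = firstValidDate(head);
        return fromHead.isPresent() ? fromHead : firstValidDate(tail);
    }

    private static Optional<LocalDate> firstValidDate(String text) {
        Matcher m = ISO_DATE.matcher(text);
        while (m.find()) {
            try {
                return Optional.of(LocalDate.parse(m.group()));
            } catch (DateTimeParseException e) {
                // Looks like a date but is not one (e.g. 2013-13-45); keep scanning.
            }
        }
        return Optional.empty();
    }
}
```

The html argument would be the page body already downloaded through URLConnection; an empty result only means this particular pattern found nothing, not that the page has no date.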