Download PDFs From ScienceDirect.com Automatically

Question

Hi all,

I want to write a program in C# that downloads all new papers from www.ScienceDirect.com.
How can I detect when new papers have been added, and how do I find their links?

Please help me.

Recommended answer

In the general case, you cannot recognize anything. A new HTML page is no different from an old one; it does not distinguish between "new" and "old" items unless it was specifically designed to do so, which you cannot count on. The HTML header can contain date information, but that is not guaranteed either.

Therefore, you have to scan the same HTML documents on a regular basis and compare the set of links you find with the set you have stored in your database; only then can you spot a change. Even then, you cannot guarantee that you will spot, for example, updates to an existing article. Detecting that kind of detail can rely on conventions used on the site, but those conventions can also change.

—SA
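
The scan-and-compare approach described in the answer can be sketched as follows. This is a minimal illustration and not part of the original answer: it is written in Python for brevity, the names `LinkCollector`, `extract_links`, `find_new_links` and the `seen_links` table are hypothetical, and the example URL is a placeholder. It assumes the page HTML has already been fetched by some other means, and it only detects links that were not present in earlier scans. As the answer notes, it cannot tell whether an existing article was updated.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.add(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the set of absolute link URLs found in the HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def find_new_links(conn, html, base_url):
    """Return links not seen in earlier scans and record them as seen."""
    conn.execute("CREATE TABLE IF NOT EXISTS seen_links (url TEXT PRIMARY KEY)")
    known = {row[0] for row in conn.execute("SELECT url FROM seen_links")}
    new_links = extract_links(html, base_url) - known
    conn.executemany(
        "INSERT INTO seen_links (url) VALUES (?)", [(url,) for url in new_links]
    )
    conn.commit()
    return new_links


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to persist between runs
    base = "https://example.org/journal/"  # placeholder, not a real listing page
    first_scan = '<a href="article/1">One</a>'
    later_scan = '<a href="article/1">One</a> <a href="article/2">Two</a>'
    print(find_new_links(conn, first_scan, base))  # first scan: every link is new
    print(find_new_links(conn, later_scan, base))  # later scan: only article/2
```

The same idea carries over to C#: keep the previously seen URLs in a database table, load them into a `HashSet<string>`, and subtract them from the links found in the current scan.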

