How to detect changed and new items in an RSS feed?


Question

When using feedparser or another Python library to download and parse RSS feeds, how can I reliably detect new items and modified items?

So far I have seen new items appear in feeds with publication dates earlier than that of the latest item. I have also seen feed readers display the same item, republished with slightly different content, as separate items. I am not implementing a feed reader application; I just want a sane strategy for archiving feed data.

Recommended answer

It depends on how much you trust the feed source. feedparser provides an .id attribute for feed items; this attribute should be unique for both RSS and Atom sources. For an example, see feedparser's Atom docs. Though .id will cover most cases, it is conceivable that a source might publish multiple items with the same id. In that case, you don't have much choice but to hash the item's content.
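One way to combine the two ideas is sketched below: use the item id as its identity and a content hash to notice modifications. This is only an illustration, not a definitive implementation. The names classify, entry_key, content_hash and HASHED_FIELDS are hypothetical, the choice of hashed fields is an assumption, and the archive is a plain in-memory dict standing in for whatever storage you actually use. Entries are treated as dict-like objects, which is how feedparser exposes them.

```python
import hashlib
import json

# Fields included in the hash; this selection is an assumption and
# should be adjusted to whatever counts as "content" for your archive.
HASHED_FIELDS = ("title", "link", "summary")


def content_hash(entry):
    """Return a SHA-256 hex digest of the selected fields of an entry."""
    payload = json.dumps(
        [entry.get(field, "") for field in HASHED_FIELDS],
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def entry_key(entry):
    """Identity of an entry: its id, else its link, else its content hash."""
    return entry.get("id") or entry.get("link") or content_hash(entry)


def classify(entries, archive):
    """Split entries into (new, modified) and update archive in place.

    archive maps an entry key to the content hash last seen for it.
    Publication dates are deliberately ignored, because new items may
    carry dates older than items already archived.
    """
    new, modified = [], []
    for entry in entries:
        key = entry_key(entry)
        digest = content_hash(entry)
        if key not in archive:
            new.append(entry)
        elif archive[key] != digest:
            modified.append(entry)
        archive[key] = digest
    return new, modified
```

With feedparser, the entries argument would typically be feedparser.parse(url).entries. If the source cannot be trusted to keep ids unique, one option is to key the archive on the content hash alone, at the cost of treating every modified item as a new one.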
