Read only when the XML data is updated


Question

I can parse RSS with PHP. What I'm looking for is a way to fetch only the updated content, and to do nothing if the RSS has no new updates.

For example, I have this RSS file. If there's no new content, nothing should happen; if there is new content, I want to send my users the latest RSS update without resending what they already have. I'm parsing and sending the title and link only.

I use a cron job to check for updates every hour. My question is: how can I tell that the feed has been updated and is no longer the same as the last one? Here's the PHP file that I'm using to read the RSS. Do I write the last content to a file and compare the two, or is there another way to determine that the content is now different from last time?

Update: I had to resurrect this post because I'm still trying to get it to work. Although I accepted a few answers, they have been very hard to implement. For example, the hashing option looked like a good idea initially, but with thousands of RSS feeds to check, it would be almost impossible to hash them all.

Again, someone suggested HTTP caching, but I couldn't find a simple demo, so I'm practically stuck.

Any further suggestions would be highly appreciated.

Recommended answer

You could use hashes for this, in two ways:

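  1. To ease updating: when requesting an update, you hash the whole feed and compare the result to the hash from last time. If they are identical, you know that the feed has not changed and can stop even before parsing it.
  2. To identify changes: while parsing, you hash each item and compare it to the hashes stored from earlier runs. If it matches one, you know that you have seen it before.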
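
As a rough illustration of the two uses above, here is a minimal sketch. It is written in Python only for brevity (the same idea carries over to PHP), SHA-256 is an assumed choice since the answer does not name a hash algorithm, and the function names `feed_changed` and `unseen_items` are hypothetical.

```python
import hashlib
from typing import Iterable, List, Optional, Set, Tuple


def digest(text: str) -> str:
    """Return the hex SHA-256 digest of a string (algorithm is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def feed_changed(raw_feed: str, last_feed_hash: Optional[str]) -> Tuple[bool, str]:
    """Way 1: hash the whole feed and compare with the hash from the last run.

    Returns (changed, current_hash); the caller stores current_hash for next time.
    """
    current = digest(raw_feed)
    return current != last_feed_hash, current


def unseen_items(items: Iterable[Tuple[str, str]], seen_hashes: Set[str]) -> List[Tuple[str, str]]:
    """Way 2: hash each (title, link) item and keep only those not seen before.

    seen_hashes is updated in place so the next run recognises these items.
    """
    fresh = []
    for title, link in items:
        h = digest(title + "\n" + link)
        if h not in seen_hashes:
            seen_hashes.add(h)
            fresh.append((title, link))
    return fresh
```

In an hourly job, one would call `feed_changed` first and only parse and call `unseen_items` when it reports a change; where the hashes are persisted (file, database) is left open here.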

If the feed in question offers guids for its items, you could refine this process by storing guid-to-hash pairs. This would make the comparison quicker, as you would only compare each item to its known previous version instead of comparing it to all previous items.
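
A minimal sketch of the guid-to-hash idea, again in Python for illustration only. The `classify_items` name, the `(guid, title, link)` item shape, and the use of SHA-256 are assumptions rather than anything the answer specifies.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def item_hash(title: str, link: str) -> str:
    """Hash the fields that get sent to users (title and link)."""
    return hashlib.sha256(f"{title}\n{link}".encode("utf-8")).hexdigest()


def classify_items(
    items: Iterable[Tuple[str, str, str]],
    stored: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Compare each item only against the stored hash for its own guid.

    items  -- (guid, title, link) tuples from the parsed feed
    stored -- guid -> hash mapping from earlier runs, updated in place
    Returns (new_guids, changed_guids).
    """
    new, changed = [], []
    for guid, title, link in items:
        h = item_hash(title, link)
        previous = stored.get(guid)  # single dictionary lookup per item
        if previous is None:
            new.append(guid)
        elif previous != h:
            changed.append(guid)
        stored[guid] = h
    return new, changed
```

Because the lookup is keyed by guid, the cost per item does not grow with the number of previously stored items.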

You'd still need some expiration/purge mechanism to keep the number of stored hashes within bounds, but given that you only store relatively short strings (depending on the chosen hash algorithm), you should be able to keep quite a backlog before running into performance problems.
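
One possible shape for such a purge step is sketched below in Python. The answer does not prescribe a policy, so everything here is an assumption: the store is taken to map a key (guid or hash) to a `(hash, last_seen)` pair, and entries are dropped first by age and then by a size cap. The name `purge` and its parameters are hypothetical.

```python
import time
from typing import Dict, Optional, Tuple


def purge(
    stored: Dict[str, Tuple[str, float]],
    max_age_seconds: float,
    max_entries: int,
    now: Optional[float] = None,
) -> int:
    """Remove stale entries in place and return how many were removed.

    stored maps key -> (hash, last_seen_epoch_seconds).
    """
    now = time.time() if now is None else now
    before = len(stored)

    # Step 1: drop everything not seen within the allowed age.
    expired = [k for k, (_, seen) in stored.items() if now - seen > max_age_seconds]
    for key in expired:
        del stored[key]

    # Step 2: if still over the cap, drop the least recently seen entries.
    if len(stored) > max_entries:
        oldest_first = sorted(stored, key=lambda k: stored[k][1])
        for key in oldest_first[: len(stored) - max_entries]:
            del stored[key]

    return before - len(stored)
```

Running this at the end of each scheduled check keeps the store bounded; suitable values for the age limit and cap depend on how far back a feed's items can reappear.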
