How to skip known entries when syncing with Google Reader?


Question

I am writing an offline client for the Google Reader service and would like to know how best to sync with it.

There doesn't seem to be official documentation yet, and the best source I have found so far is this: http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI

Now consider this: with the information from above I can download all unread items, I can specify how many items to download, and using the atom-id I can detect duplicate entries that I have already downloaded.

What's missing is a way to specify that I just want the updates since my last sync. I can ask for the 10 (parameter n=10) latest (parameter r=d) entries. If I specify the parameter r=o (date ascending) then I can also specify the parameter ot=[last time of sync], but only then, and the ascending order makes no sense when I just want to read some items rather than all of them.

Any idea how to solve this without downloading all items again and simply rejecting the duplicates? That's not a very economical way of polling.

Someone proposed that I could specify that I only want the unread entries. But for that solution to work in such a way that Google Reader will not offer these entries again, I would need to mark them as read. In turn, that would mean I need to keep my own read/unread state on the client, and that the entries would already be marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.

Cheers, Mariano

Answer

To get the latest entries, use the standard from-newest-date-descending download, which will start from the latest entries. You will receive a "continuation" token in the XML result, looking something like this:

<gr:continuation>CArhxxjRmNsC</gr:continuation>

Scan through the results, pulling out anything new to you. You should find that either all results are new, or everything up to a point is new and everything after that is already known to you.

In the latter case you're done, but in the former you need to find the new stuff that's older than what you've already retrieved. Do this by using the continuation to get results starting just after the last result in the set you just retrieved, passing it in the GET request as the c parameter, e.g.:

http://www.google.com/reader/atom/user/-/state/com.google/reading-list?c=CArhxxjRmNsC

Continue this way until you have everything.
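The page-parsing half of that loop can be sketched in Python. The Atom and gr: namespaces and the feed shape follow the unofficial API notes linked in the question, so treat them as assumptions rather than a documented contract:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
GR = "{http://www.google.com/schemas/reader/atom/}"  # assumed gr: namespace

def parse_page(atom_xml, known_ids):
    """Parse one date-descending page of the reading list.

    Returns (new_ids, continuation, done). `done` is True as soon as an
    already-known id shows up: in newest-first order everything after it
    must be known too, so the sync can stop.
    """
    root = ET.fromstring(atom_xml)
    new_ids = []
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        if entry_id in known_ids:
            return new_ids, None, True
        new_ids.append(entry_id)
    cont = root.find(GR + "continuation")
    token = cont.text if cont is not None else None
    # No continuation token means we walked off the end of the feed.
    return new_ids, token, token is None

# Driver sketch (not run here): fetch each page with your authenticated
# HTTP client, passing the token back as the c= parameter, until done:
#
#   url = READING_LIST + "?n=20" + ("&c=" + token if token else "")
```

The early return on the first known id is what makes this cheaper than downloading everything and rejecting duplicates.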

The n parameter, a count of the number of items to retrieve, works well with this, and you can change it as you go. If the checking frequency is user-set, and thus could be very frequent or very rare, you can use an adaptive algorithm to reduce network traffic and your processing load. Initially request a small number of the latest entries, say five (add n=5 to the URL of your GET request). If all of them are new, then in the next request, where you use the continuation, ask for a larger number, say 20. If those are still all new, either the feed has a lot of updates or it's been a while, so continue in groups of 100 or whatever.
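A minimal sketch of that adaptive step-up, using the 5 → 20 → 100 ladder as an example schedule (the ladder values are just the numbers from the text, not anything the API requires):

```python
SIZES = [5, 20, 100]  # example ladder from the text above

def next_batch_size(current, all_new):
    """Pick the n= value for the next request: climb the ladder while
    every entry in the last batch was new, otherwise fall back to the
    smallest size for the next polling cycle."""
    if not all_new:
        return SIZES[0]
    i = SIZES.index(current) if current in SIZES else 0
    return SIZES[min(i + 1, len(SIZES) - 1)]
```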

However, and correct me if I'm wrong here, you also want to know, after you've downloaded an item, whether its state changes from "unread" to "read" because the person read it using the Google Reader interface.

One way to do this:

  1. Update the status on google of any items that have been read locally.
  2. Check and save the unread count for the feed. (You want to do this before the next step, so that you guarantee that new items have not arrived between your download of the newest items and the time you check the read count.)
  3. Download the latest items.
  4. Calculate your read count, and compare that to google's. If the feed has a higher read count than you calculated, you know that something's been read on google.
  5. If something has been read on google, start downloading read items and comparing them with your database of unread items. You'll find some items that google says are read that your database claims are unread; update these. Continue doing so until you've found a number of these items equal to the difference between your read count and google's, or until the downloads get unreasonable.
  6. If you didn't find all of the read items, c'est la vie; record the number remaining as an "unfound unread" total which you also need to include in your next calculation of the local number you think are unread.
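The six steps above might look like the following sketch. `local_db` and `google` stand for hypothetical local-store and Reader-client interfaces; every method name here is an assumption for illustration, not a real API:

```python
def reconcile_read_state(local_db, google, scan_cap=500):
    """Steps 1-6 above; `scan_cap` is an arbitrary limit on how many of
    Google's read items we will inspect before giving up."""
    # 1. Push local reads up to Google first.
    for item_id in local_db.locally_read_ids():
        google.mark_read(item_id)
    # 2. Snapshot Google's unread count before fetching, so newly
    #    arriving items can't skew the comparison.
    google_unread = google.unread_count()
    # 3. Pull the newest items into the local store.
    local_db.store(google.latest_items())
    # 4. Compare counts; a deficit means something was read on the web.
    missing = local_db.unread_count() - google_unread
    # 5. Walk Google's read items, flipping any we still hold as unread,
    #    until the counts balance or the scan gets too expensive.
    scanned = 0
    for item_id in google.read_item_ids():
        if missing <= 0 or scanned >= scan_cap:
            break
        if local_db.is_unread(item_id):
            local_db.mark_read(item_id)
            missing -= 1
        scanned += 1
    # 6. Whatever is left is the "unfound unread" carry-over.
    local_db.set_unfound_unread(max(missing, 0))
```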

If the user subscribes to a lot of different blogs, it's also likely he labels them extensively, so you can do this whole thing on a per-label basis rather than for the entire feed. That should help keep the amount of data down, since you won't need to do any transfers for labels where the user didn't read anything new on Google Reader.

This whole scheme can be applied to other statuses, such as starred or unstarred, as well.

Now, as you say, this

...would mean that I need to keep my own read/unread state on the client and that the entries are already marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.

True enough. Neither keeping a local read/unread state (since you're keeping a database of all of the items anyway) nor marking items read in Google (which the API supports) seems very difficult, so why doesn't this work for you?

There is one further hitch, however: the user may mark something read as unread on Google. This throws a bit of a wrench into the system. My suggestion there, if you really want to take care of this, is to assume that the user will generally touch only more recent stuff, and to download the latest couple of hundred or so items every time, checking the status on all of them. (This isn't all that bad; downloading 100 items took me anywhere from 0.3s for 300KB to 2.5s for 2.5MB, albeit on a very fast broadband connection.)

Again, if the user has a large number of subscriptions, he's probably also got a reasonably large number of labels, so doing this on a per-label basis will speed things up. I'd suggest, actually, that you not only check on a per-label basis but also spread out the checks, checking a single label each minute rather than everything once every twenty minutes. You can also do this "big check" for status changes on older items less often than the "new stuff" check, perhaps once every few hours, if you want to keep bandwidth down.

This is a bit of a bandwidth hog, mainly because you need to download the full article from Google merely to check its status. Unfortunately, I can't see any way around that in the API docs we have available to us. My only real advice is to minimize the checking of status on non-new items.

