How to skip known entries when syncing with Google Reader?


Question

For writing an offline client to the Google Reader service, I would like to know how best to sync with the service.

There doesn't seem to be any official documentation yet, and the best source I have found so far is this: http://code.google.com/p/pyrfeed/wiki/GoogleReaderAPI

Now consider this: with the information from above I can download all unread items, I can specify how many items to download, and using the atom id I can detect duplicate entries that I have already downloaded.

What's missing for me is a way to ask only for the updates since my last sync. I can say give me the 10 latest entries (parameters n=10 and r=d). If I specify r=o (date ascending) then I can also pass ot=[last time of sync], but only then, and ascending order makes no sense when I only want to read some items rather than all of them.
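For concreteness, here is a sketch of how those parameters combine into a request URL. The endpoint and parameter names are taken from the pyrfeed notes linked above; this is illustrative, not an official API.

```python
from urllib.parse import urlencode

# Reading-list endpoint as described in the (unofficial) pyrfeed wiki notes.
BASE = "http://www.google.com/reader/atom/user/-/state/com.google/reading-list"

def build_url(count=20, order="d", older_than=None, continuation=None):
    """Assemble the GET URL from the parameters discussed in the question."""
    params = {"n": count, "r": order}   # r=d: newest first, r=o: oldest first
    if older_than is not None:
        params["ot"] = older_than       # only meaningful together with r=o
    if continuation is not None:
        params["c"] = continuation      # continuation token from a prior page
    return BASE + "?" + urlencode(params)
```

For example, `build_url(count=10)` yields `...?n=10&r=d`, the "10 latest entries" request from the question.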

Any idea how to solve this without downloading all items again and merely rejecting the duplicates? That is not a very economical way of polling.

Someone proposed that I could ask only for unread entries. But for that to work in the sense that Google Reader will not offer these entries again, I would need to mark them as read. In turn, that would mean I have to keep my own read/unread state on the client, and the entries would already be marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.

Cheers, Mariano

Answer

To get the latest entries, use the standard from-newest-date-descending download, which will start from the latest entries. You will receive a "continuation" token in the XML result, looking something like this:

<gr:continuation>CArhxxjRmNsC</gr:continuation>

Scan through the results, pulling out anything new to you. You should find that either all of the results are new, or everything up to a point is new and everything after that is already known to you.

In the latter case you're done, but in the former you need to find the new items that are older than what you've already retrieved. Do this by using the continuation to get the results starting just after the last result in the set you just retrieved, passing it in the GET request as the c parameter, e.g.:

http://www.google.com/reader/atom/user/-/state/com.google/reading-list?c=CArhxxjRmNsC

Continue this way until you have everything.
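The catch-up loop above can be sketched as follows. `fetch_page` is a stub standing in for the real HTTP call; it returns a page of entry ids (newest first) together with the continuation token for the next page, or `None` at the end.

```python
# Sketch of the continuation-based catch-up loop.
def sync_new(fetch_page, known_ids):
    """Pull pages newest-first until we hit an entry we already have."""
    new_ids, token = [], None
    while True:
        entries, token = fetch_page(token)
        for entry_id in entries:
            if entry_id in known_ids:
                return new_ids          # everything after this is already known
            new_ids.append(entry_id)
        if token is None:               # no continuation: reached the end
            return new_ids

# Example with a fake two-page feed in place of the service:
PAGES = {None: (["e5", "e4", "e3"], "tok1"), "tok1": (["e2", "e1"], None)}
fake_fetch = lambda token: PAGES[token]
```

With `known_ids = {"e2", "e1"}`, `sync_new` stops at the first already-known entry and returns only `["e5", "e4", "e3"]`.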

The n parameter, which is the number of items to retrieve, works well with this, and you can change it as you go. If the frequency of checking is user-set, and thus could be very frequent or very rare, you can use an adaptive algorithm to reduce network traffic and your processing load. Initially request a small number of the latest entries, say five (add n=5 to the URL of your GET request). If all of them are new, then in the next request, where you use the continuation, ask for a larger number, say 20. If those are still all new, either the feed has a lot of updates or it's been a while, so continue in groups of 100 or whatever.
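A minimal sketch of that adaptive batch-size idea; the sizes 5, 20, 100 are just the illustrative values from the text:

```python
# Grow the batch size while every item in the previous batch was new.
BATCH_SIZES = [5, 20, 100]

def next_batch_size(current, all_new):
    """Step up to the next size when the whole batch was new, else keep it."""
    if not all_new:
        return current
    i = BATCH_SIZES.index(current) if current in BATCH_SIZES else 0
    return BATCH_SIZES[min(i + 1, len(BATCH_SIZES) - 1)]
```

So a sync would go 5, then 20, then 100 per request for as long as every entry keeps coming back new.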

However, and correct me if I'm wrong here, you also want to know whether, after you've downloaded an item, its state changes from "unread" to "read" because the person read it in the Google Reader interface.

One way to handle this:

  1. Update the status on google of any items that have been read locally.
  2. Check and save the unread count for the feed. (You want to do this before the next step, so that you guarantee that new items have not arrived between your download of the newest items and the time you check the read count.)
  3. Download the latest items.
  4. Calculate your read count, and compare that to google's. If the feed has a higher read count than you calculated, you know that something's been read on google.
  5. If something has been read on google, start downloading read items and comparing them with your database of unread items. You'll find some items that google says are read that your database claims are unread; update these. Continue doing so until you've found a number of these items equal to the difference between your read count and google's, or until the downloads get unreasonable.
  6. If you didn't find all of the read items, c'est la vie; record the number remaining as an "unfound unread" total which you also need to include in your next calculation of the local number you think are unread.

If the user subscribes to a lot of different blogs, it's also likely that he labels them extensively, so you can do all of this on a per-label basis rather than for the entire feed. That should help keep the amount of data down, since you won't need to do any transfers for labels where the user didn't read anything new on Google Reader.

This whole scheme can also be applied to other statuses, such as starred or unstarred.

Now, as you say, this

...would mean that I need to keep my own read/unread state on the client and that the entries are already marked as read when the user logs on to the online version of Google Reader. That doesn't work for me.

True enough. Neither keeping a local read/unread state (since you're keeping a database of all the items anyway) nor marking items as read in Google (which the API supports) seems very difficult, so why doesn't this work for you?

There is one further hitch, however: the user may mark something that was read as unread on Google. This throws a bit of a wrench into the system. My suggestion, if you really want to take care of this, is to assume that the user will generally only touch the more recent stuff, and to download the latest couple of hundred items every time, checking the status of all of them. (This isn't all that bad; downloading 100 items took me anywhere from 0.3 s for 300 KB to 2.5 s for 2.5 MB, albeit on a very fast broadband connection.)
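That re-check of recent items reduces to a status diff against local state; a sketch, with statuses represented as plain strings:

```python
# Diff the server's statuses for the latest items against local state.
# Catches both read->unread and unread->read flips on recent items.
def diff_statuses(server_items, local_status):
    """server_items: {id: 'read'|'unread'}; return ids whose status changed."""
    return {item_id: status for item_id, status in server_items.items()
            if local_status.get(item_id) != status}
```

Anything the diff returns gets written back to the local database before the next unread-count comparison.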

Again, if the user has a large number of subscriptions, he's probably also got a reasonably large number of labels, so doing this on a per-label basis will speed things up. I'd actually suggest that you not only check on a per-label basis, but also spread out the checks, checking a single label each minute rather than everything once every twenty minutes. If you want to keep bandwidth down, you can also do this "big check" for status changes on older items less often than the "new stuff" check, perhaps once every few hours.

This is a bit of a bandwidth hog, mainly because you need to download the full article from Google merely to check its status. Unfortunately, I can't see any way around that in the API docs we have available. My only real advice is to minimize the checking of status on non-new items.
