In Clojure, are lazy seqs always chunked?


Question


I was under the impression that the lazy seqs were always chunked.

=> (take 1 (map #(do (print \.) %) (range)))
(................................0)


As expected 32 dots are printed because the lazy seq returned by range is chunked into 32 element chunks. However, when instead of range I try this with my own function get-rss-feeds, the lazy seq is no longer chunked:

=> (take 1 (map #(do (print \.) %) (get-rss-feeds r)))
(."http://wholehealthsource.blogspot.com/feeds/posts/default")


Only one dot is printed, so I guess the lazy-seq returned by get-rss-feeds is not chunked. Indeed:

=> (chunked-seq? (seq (range)))
true

=> (chunked-seq? (seq (get-rss-feeds r)))
false

Here's the source for get-rss-feeds:

(defn get-rss-feeds
  "returns a lazy seq of urls of all feeds; takes an html-resource from the enlive library"
  [hr]
  (map #(:href (:attrs %))
       (filter #(rss-feed? (:type (:attrs %))) (html/select hr [:link]))))


So it appears that chunkiness depends on how the lazy seq is produced. I peeked at the source for the function range and there are hints of it being implemented in a "chunky" manner. So I'm a bit confused as to how this works. Can someone please clarify?
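The difference is easy to reproduce: range builds a chunked seq, while anything built directly with lazy-seq and cons is not. A minimal sketch (numbers-from is a hypothetical unchunked analogue of range, used here only for illustration):

```clojure
;; range produces a chunked seq; a hand-rolled lazy-seq does not.
(chunked-seq? (seq (range 100)))          ;; => true

(defn numbers-from
  "A hypothetical unchunked counterpart to range, built with lazy-seq."
  [n]
  (lazy-seq (cons n (numbers-from (inc n)))))

(chunked-seq? (seq (numbers-from 0)))     ;; => false

;; Counting realizations with an atom makes the difference visible:
(def realized (atom 0))
(first (map #(do (swap! realized inc) %) (range 100)))
@realized                                 ;; => 32 (a whole chunk at once)

(def realized2 (atom 0))
(first (map #(do (swap! realized2 inc) %) (numbers-from 0)))
@realized2                                ;; => 1
```

So chunkiness is a property of the concrete seq type the producer returns, not of laziness itself; map, filter and friends merely preserve whatever granularity their input has.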


Here's why I need to know.

I have the following code: (get-rss-entry (get-rss-feeds h-res) url)


The call to get-rss-feeds returns a lazy sequence of URLs of feeds that I need to examine.


The call to get-rss-entry looks for a particular entry (whose :link field matches the second argument of get-rss-entry). It examines the lazy sequence returned by get-rss-feeds. Evaluating each item requires an http request across the network to fetch a new rss feed. To minimize the number of http requests it's important to examine the sequence one-by-one and stop as soon as there is a match.

Here's the code:

(defn get-rss-entry
  [feeds url]
  (ffirst (drop-while empty? (map #(entry-with-url % url) feeds))))


entry-with-url returns a lazy sequence of matches or an empty sequence if there is no match.


I tested this and it seems to work correctly (evaluating one feed url at a time). But I am worried that somewhere, somehow it will start behaving in a "chunky" way and it will start evaluating 32 feeds at a time. I know there is a way to avoid chunky behavior as discussed here, but it doesn't seem to even be required in this case.
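The un-chunking technique alluded to above is usually written as a small wrapper that rebuilds the seq one cons cell at a time (a sketch; unchunk is a conventional name, not part of clojure.core):

```clojure
(defn unchunk
  "Returns a fully lazy, one-element-at-a-time view of s,
   hiding any chunking of the underlying seq."
  [s]
  (lazy-seq
    (when (seq s)
      (cons (first s) (unchunk (rest s))))))
```

With this wrapper, the first example above realizes only one element, because map now sees an unchunked seq:

```clojure
(take 1 (map #(do (print \.) %) (unchunk (range))))
;; prints a single dot
```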


Am I using lazy seq non-idiomatically? Would loop/recur be a better option?

Answer


Depending on chunking being absent seems unwise, as you mention above. Explicitly "un-chunking" in cases where you really need the seq not to be chunked is also wise, because then if at some other point your code changes in a way that chunkifies it, things won't break. On another note, if you need the actions to be sequential, agents are a great tool: you could send the download functions to an agent, and they would then be run one at a time, and only once, regardless of how you evaluate the sequence. At some point you may want to pmap your sequence, and then even un-chunking will not work, though using an atom will continue to work correctly.
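A sketch of the agent idea: sends to an agent are queued and applied one at a time, so each feed would be fetched exactly once, in order. Here fetch-fn stands in for the real (hypothetical) HTTP-fetching function:

```clojure
;; An agent serializes its actions, so queued fetches run sequentially.
(def downloads (agent []))

(defn queue-download!
  "Queues fetching url on the downloads agent. fetch-fn is a stand-in
   for the real HTTP request function (hypothetical here)."
  [fetch-fn url]
  ;; send-off is the variant intended for blocking I/O such as HTTP.
  (send-off downloads (fn [results] (conj results (fetch-fn url)))))
```

After queuing, `(await downloads)` blocks until all pending fetches have run, and `@downloads` holds the accumulated results.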
