Clojure and HBase: Iterate Lazily over a Scan

Question

Let's say I want to print the output of an HBase table scan in Clojure.

(defmulti scan (fn [table & args] (map class args)))

(defmethod scan [java.lang.String java.lang.String] [table start-key end-key]
    (let [scan (Scan. (Bytes/toBytes start-key) (Bytes/toBytes end-key))]
        (let [scanner (.getScanner table scan)]
            (doseq [result scanner]
                (prn
                    (Bytes/toString (.getRow result))
                    (get-to-map result))))))

where get-to-map turns the result into a map. It could be run like this:

(hbase.table/scan table "key000001" "key999999")

But what if I want to let the user do something with the scan results? I could allow them to pass in a function as a callback to be applied to each result. But my question is this: what do I return if I want the user to be able to lazily iterate over each result

(Bytes/toString (.getRow result))
(get-to-map result)

and not retain the previous results, as might happen in a simplistic implementation with lazy-seq?

Recommended answer

If you accept a callback argument, you can just call it inside the doseq:

(defmulti scan (fn [f table & args] (mapv class args))) ; mapv returns a vector

(defmethod scan [String String] [f table start-key end-key]
               ; ^- java.lang classes are imported implicitly
  (let [scan ...
        scanner ...] ; no need for two separate lets
    (doseq [result scanner]
      ; call f here, e.g.
      (f result))))

Here f will be called once per result. Its return value, as well as the result itself, will be discarded immediately. You can of course call f with some preprocessed version of result, e.g. (f (foo result) (bar result)).
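
As a minimal runnable sketch of the callback approach, assuming no HBase on the classpath: a plain vector of maps stands in for the scanner, and the names fake-scanner and scan-with-callback are hypothetical, not part of the original code.

```clojure
;; Stand-in for a scanner: doseq accepts any seqable collection.
(def fake-scanner
  [{:row "key000001" :value 1}
   {:row "key000002" :value 2}])

(defn scan-with-callback
  "Calls f once per result, purely for side effects. Returns nil, like doseq."
  [f scanner]
  (doseq [result scanner]
    ;; Hand the callback a preprocessed view: the row key plus the full result.
    (f (:row result) result)))

;; Example callback: record each row key as it is visited.
(def seen (atom []))
(scan-with-callback (fn [row _result] (swap! seen conj row)) fake-scanner)
```

Because doseq drops each result once the callback returns, nothing accumulates unless the callback itself stores it, as the atom does here.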

You could also return a sequence / vector of results to the client and let it do its own processing. If the sequence is lazy, you need to make sure that any resources backing it stay open for the duration of the processing (and presumably that they are closed later -- see with-open; the processing code would need to execute inside the with-open and be done with the processing when it returns).
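
The interaction between laziness and with-open can be sketched without HBase. Everything here is an assumption for illustration: fake-scanner is a hypothetical stand-in whose rows refuse to be read after it is closed, roughly as a real scanner would.

```clojure
(import 'java.io.Closeable)

(defn fake-scanner
  "Hypothetical stand-in for a scanner: a Closeable plus a lazy seq of rows
  that throws if realized after close."
  [rows]
  (let [open? (atom true)]
    {:closeable (reify Closeable
                  (close [_] (reset! open? false)))
     :rows      (map (fn [r]
                       (when-not @open?
                         (throw (IllegalStateException. "scanner closed")))
                       r)
                     rows)}))

(defn eager-rows
  "Realizes every row while the resource is still open."
  []
  (let [{:keys [closeable rows]} (fake-scanner [1 2 3])]
    (with-open [^Closeable c closeable]
      (doall rows))))

(defn leaked-rows
  "Returns the seq unrealized; with-open closes the resource first."
  []
  (let [{:keys [closeable rows]} (fake-scanner [1 2 3])]
    (with-open [^Closeable c closeable]
      rows)))
```

Calling (eager-rows) returns the rows, whereas realizing the result of (leaked-rows), for example with doall, throws the IllegalStateException, because the processing happens after with-open has already closed the resource.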

For example, to return a vector of preprocessed results to the client, you could do:

(defmethod scan ...
  (let [...]
    (mapv (fn preprocess-result [result]
            (result->map result))
          scanner)))

The client can then do whatever it wants with them. To return a lazy sequence instead, use map rather than mapv. If the client then needs to open/close a resource, you could accept it as an argument to scan, so that the client could say

(with-open [r (some-resource)]
  ; or mapv, dorun+map, doall+for, ...
  (doseq [result (scan r ...)]
    (do-stuff-with result)))
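
A sketch of the lazy variant, under stated assumptions: scan-seq is a hypothetical helper that works on any java.lang.Iterable, and a java.util.ArrayList stands in for the HBase scanner so the example runs without HBase. Note that Clojure may realize lazy sequences in chunks rather than strictly one element at a time.

```clojure
(defn scan-seq
  "Returns a lazy seq of (f result) for each element of an Iterable."
  [f ^Iterable scanner]
  (map f (iterator-seq (.iterator scanner))))

;; Stand-in for a scanner: any Iterable will do for this sketch.
(def fake-scanner
  (java.util.ArrayList. [{:row "a"} {:row "b"} {:row "c"}]))

;; Consume the seq directly in doseq instead of binding it to a name,
;; so already-processed elements are not kept reachable by the caller.
(def rows (atom []))
(doseq [row (scan-seq :row fake-scanner)]
  (swap! rows conj row))
```

Whether earlier results are retained depends on the caller: holding the sequence in a def or a long-lived local keeps every realized element alive, while iterating over it in place, as above, does not.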
