Storing massive ordered time series data in Bigtable derivatives

Views: 177 | Published: 2016/11/13 13:53:39 | Tags: cassandra, finance, hbase, bigtable, time-series

Question

I am trying to figure out exactly what these newfangled data stores such as Bigtable, HBase and Cassandra really are.

I work with massive amounts of stock market data: billions of rows of price/quote data that can add up to hundreds of gigabytes every day (although these text files often compress by at least an order of magnitude). Each row is basically a handful of numbers, two or three short strings and a timestamp (usually at millisecond resolution). If I had to pick a unique identifier for each row, I would have to pick the whole row, since an exchange may generate multiple values for the same symbol in the same millisecond.

I suppose the simplest way to map this data to Bigtable (I'm including its derivatives) is to key it by symbol name and date, which may return a very large time series (more than a million data points isn't unheard of). From reading their descriptions, it looks like multiple keys can be used with these systems. I'm also assuming that decimal numbers are not good candidates for keys.
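To make the symbol-and-date mapping concrete, here is a minimal sketch of one possible key layout. It is an illustration under stated assumptions, not the API of any of these stores: the names `row_key`, `column_key` and `SYMBOL_WIDTH` are hypothetical, and the sequence number is an assumed device for keeping two ticks from the same millisecond distinct.

```python
import struct

SYMBOL_WIDTH = 8  # assumed maximum ticker length, for illustration only


def row_key(symbol: str, yyyymmdd: str) -> bytes:
    """Hypothetical row key: fixed-width symbol followed by the trading date."""
    return symbol.encode("ascii").ljust(SYMBOL_WIDTH, b" ") + yyyymmdd.encode("ascii")


def column_key(ms_since_midnight: int, seq: int) -> bytes:
    """Hypothetical column key: time of day plus a per-millisecond sequence number.

    Big-endian unsigned integers compare as raw bytes in the same order as
    the numbers they encode, so a byte-ordered store keeps ticks in time order.
    """
    return struct.pack(">II", ms_since_midnight, seq)
```

With this layout, all ticks for one symbol on one day share a row, and two quotes arriving in the same millisecond differ only in the trailing sequence number.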

Some of these systems (Cassandra, for example) claim to be able to do range queries. Would I be able to efficiently query, say, all values for MSFT, for a given day, between 11:00 am and 1:30 pm?
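The access pattern in question can be sketched with an in-memory stand-in: if one symbol's ticks for one day are kept sorted by time, a time window is just a contiguous slice. This says nothing about how any particular store implements it; `range_scan` and `ms` are hypothetical helpers, and the window is assumed to be half-open (start included, end excluded).

```python
from bisect import bisect_left


def ms(hour: int, minute: int) -> int:
    """Milliseconds since midnight for a wall-clock time."""
    return (hour * 60 + minute) * 60_000


def range_scan(sorted_ticks, start_ms, end_ms):
    """Return ticks with start_ms <= time < end_ms.

    sorted_ticks is a list of (ms_since_midnight, seq, price) tuples sorted
    ascending. A one-element tuple sorts before any longer tuple with the
    same first element, so bisect_left finds the first tick at that time.
    """
    lo = bisect_left(sorted_ticks, (start_ms,))
    hi = bisect_left(sorted_ticks, (end_ms,))
    return sorted_ticks[lo:hi]
```

For the MSFT example, the call would be `range_scan(msft_ticks, ms(11, 0), ms(13, 30))`, where `msft_ticks` is an assumed list holding that day's ticks.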

What if I want to search across ALL symbols for a given day, and request all symbols that have a price between $10 and $10.25 (so I'm searching the values, and want keys returned as a result)?
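For contrast with a key lookup, this is what the value search looks like when done by brute force: every row for the day has to be read and filtered. It is only a sketch of the query's shape, not a claim about what any of these stores can do server-side. `symbols_in_price_range` is a hypothetical helper, and prices are assumed to be stored as integer cents to avoid floating-point comparisons.

```python
def symbols_in_price_range(day_rows, low_cents, high_cents):
    """Return the symbols with at least one price in [low_cents, high_cents].

    day_rows maps symbol -> list of integer prices (in cents) for one day.
    Every row is visited, so the cost grows with the total number of ticks.
    """
    return sorted(
        symbol
        for symbol, prices in day_rows.items()
        if any(low_cents <= p <= high_cents for p in prices)
    )
```

The $10 to $10.25 query would then be `symbols_in_price_range(day_rows, 1000, 1025)`.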

What if I want to get two time series, subtract one from the other, and return the two time series and their result? Will I have to do this logic in my own program?
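If the subtraction does end up in client code, a minimal version might look like the sketch below. It assumes each series has already been fetched into a dict keyed by timestamp and that only timestamps present in both series are compared; `subtract_series` is a hypothetical name, and a real comparison of tick data would probably need some alignment rule for timestamps that do not match exactly.

```python
def subtract_series(a, b):
    """Subtract series b from series a at the timestamps they share.

    a and b map timestamp -> value. Returns a list of
    (timestamp, a_value, b_value, a_value - b_value) tuples in time order.
    """
    shared = sorted(a.keys() & b.keys())
    return [(t, a[t], b[t], a[t] - b[t]) for t in shared]
```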

Reading the relevant papers suggests that these systems are not a very good fit for massive time series workloads. However, if systems such as Google Maps are built on them, I think time series should work as well. For example, think of time as the x-axis, price as the y-axis, and symbols as named locations. All of a sudden it looks like Bigtable should be the ideal store for time series (if the whole earth can be stored, retrieved, zoomed and annotated, stock market data should be trivial).

Can some expert point me in the right direction or clear up any misunderstandings?

Thanks