Sequence Aggregation with Time Series Data in Graph Database


Question

I am new to the graph database area and want to know if this type of example is applicable to a graph database.

Say I am looking at a baseball game. Each time a player goes to bat, there are 3 possible outcomes: hit, strikeout, or walk.

For each batter, across the whole baseball season, I want to work out the counts of the sequences.

For example, among batters who went to the plate n times, how many had a particular sequence (e.g., hit/walk/strikeout or hit/hit/hit/hit), and how many of those same batters repeated that sequence, indexed by time? To explain further, time would let me know whether a particular sequence (e.g., hit/walk/strikeout or hit/hit/hit/hit) occurred at the beginning of the season, in the middle, or in the later half.

For a key-value type database, the raw data would look as follows:

Batter      Time        Game    Event       Bat
-------     -----       ----    ---------   ---
Charles     April       1       Hit         1
Charles     April       1       strikeout   2
Charles     April       1       Walk        3
Doug        April       1       Walk        1
Doug        April       1       Hit         2
Doug        April       1       strikeout   3
Charles     April       2       strikeout   1
Charles     April       2       strikeout   2
Doug        May         5       Hit         1
Doug        May         5       Hit         2
Doug        May         5       Hit         3
Doug        May         5       Hit         4

Hence, my output would appear as follows:

Sequence                    Freq        Unique Batters  Time
-----------------------     ----        --------------  ------
hit                         5000        600             April
walk/strikeout              3000        350             April
strikeout/strikeout/hit     2000        175             April
hit/hit/hit/hit/hit         1000        80              April
hit                         6000        800             May
walk/strikeout              3500        425             May
strikeout/strikeout/hit     2750        225             May
hit/hit/hit/hit/hit         1250        120             May
.                           .           .               .
.                           .           .               .
.                           .           .               .
.                           .           .               .
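
For reference, the aggregation being asked for can be sketched outside any database. The following is only an illustrative Python sketch, not part of the original post. It assumes that a "sequence" means a contiguous run of one batter's at-bats within a single game, and the names ROWS and sequence_counts are made up for this example.

```python
from collections import defaultdict

# Rows mirror the raw table above: (batter, month, game, event, bat number).
ROWS = [
    ("Charles", "April", 1, "Hit", 1),
    ("Charles", "April", 1, "strikeout", 2),
    ("Charles", "April", 1, "Walk", 3),
    ("Doug", "April", 1, "Walk", 1),
    ("Doug", "April", 1, "Hit", 2),
    ("Doug", "April", 1, "strikeout", 3),
    ("Charles", "April", 2, "strikeout", 1),
    ("Charles", "April", 2, "strikeout", 2),
    ("Doug", "May", 5, "Hit", 1),
    ("Doug", "May", 5, "Hit", 2),
    ("Doug", "May", 5, "Hit", 3),
    ("Doug", "May", 5, "Hit", 4),
]


def sequence_counts(rows, max_len=4):
    """Return {(month, sequence): (frequency, unique_batters)}.

    Assumption: a sequence is a contiguous run of at-bats by one batter
    inside one game, up to max_len events long.
    """
    # Collect each batter's events per game, keyed so they can be ordered.
    games = defaultdict(list)
    for batter, month, game, event, bat in rows:
        # Event names are lower-cased because the raw data mixes case.
        games[(batter, month, game)].append((bat, event.lower()))

    freq = defaultdict(int)
    batters = defaultdict(set)
    for (batter, month, _game), events in games.items():
        ordered = [event for _bat, event in sorted(events)]
        # Slide a window of every length from 1 to max_len over the game.
        for n in range(1, max_len + 1):
            for start in range(len(ordered) - n + 1):
                key = (month, "/".join(ordered[start:start + n]))
                freq[key] += 1
                batters[key].add(batter)
    return {key: (freq[key], len(batters[key])) for key in freq}


if __name__ == "__main__":
    for (month, seq), (count, unique) in sorted(sequence_counts(ROWS).items()):
        print(f"{seq:30} {count:5} {unique:5}  {month}")
```

Grouping by month gives the Time column of the desired output; a finer time index (week, game date) would only change the grouping key.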

If this is feasible for a graph database, would it also scale? What if, instead of 3 possible outcomes for a batter, there were 10,000 potential outcomes with 10,000,000 batters?

Moreover, the 10,000 unique outcomes would be sequenced in a combinatoric setting (e.g., 10,000 CHOOSE 2, 10,000 CHOOSE 3, etc.).
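
To give a sense of the scale involved, here is a small Python sketch (not from the original post) that evaluates those counts. The function name pattern_space is made up for illustration. Note that "CHOOSE" counts unordered sets of distinct outcomes; if order and repeats matter, as they do for a sequence such as hit/hit/hit, the upper bound is outcomes ** length instead.

```python
import math


def pattern_space(outcomes, length):
    """Return (unordered distinct subsets, ordered sequences with repeats)."""
    return math.comb(outcomes, length), outcomes ** length


if __name__ == "__main__":
    for k in (2, 3):
        subsets, sequences = pattern_space(10_000, k)
        print(f"length {k}: {subsets:,} subsets, {sequences:,} ordered sequences")
```

In practice only sequences that actually occur in the data need to be stored or counted, which is usually far fewer than these upper bounds.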

My question then is, if a graph database is appropriate, how would you propose setting up a solution?

Thanks a lot.

Recommended answer

Graph databases have come a long way since this question was asked, but the answer is absolutely yes: a graph database can be used to find batter performance patterns.

Disclaimer: I am the Director of Field Operations for Objectivity, Inc.

This isn't a product plug. This problem can be solved in many of the products on the market. You specifically mention scaling the problem up, and that may well be the limiting factor for some products.

To solve this problem I am using Objectivity/DB, a massively scalable object/graph database with a full-featured graph navigational query language called DO (Declarative Objectivity).

Here is the schema that I used to approach the problem:

CREATE CLASS Season  {
    year                : Integer,      
    games               : List { Element: Reference { referenced: Game }, CollectionTypeName: TreeListOfReferences }        
}

CREATE CLASS Game {
    date                : DateTime,
    homeTeam            : Reference { referenced: Team, inverse: homeGames },
    awayTeam            : Reference { referenced: Team, inverse: awayGames },
    from                : Reference { referenced: Season, inverse: games },
    innings             : Reference { referenced: Inning, inverse: game }
}

CREATE CLASS Inning {
    number              : Integer,
    game                : Reference { referenced: Game, inverse: innings },
    batters             : Reference { referenced: AtBat, inverse: inning }
}

CREATE CLASS AtBat {
    result              : String,
    inning              : Reference { referenced: Inning, inverse: batters },
    batter              : Reference { referenced: Player, inverse: atBats },
    nextAtBat           : Reference { referenced: AtBat, inverse: prevAtBat },
    prevAtBat           : Reference { referenced: AtBat, inverse: nextAtBat },
    nextBatter          : Reference { referenced: AtBat, inverse: prevBatter },
    prevBatter          : Reference { referenced: AtBat, inverse: nextBatter }
}

CREATE CLASS Player {
    name                : String,
    teams               : List { Element: Reference { EdgeClass: PlayedFor, EdgeAttribute: team }, CollectionTypeName: TreeListOfReferences },
    atBats              : List { Element: Reference { referenced: AtBat, inverse: batter }, CollectionTypeName: TreeListOfReferences }
}

CREATE CLASS PlayedFor {
    player              : Reference { referenced: Player, inverse: teams },
    team                : Reference { referenced: Team, inverse: players },
    start               : DateTime,
    end                 : DateTime
}

CREATE CLASS Team {
    name                : String,
    homeGames           : Reference { referenced: Game, inverse: homeTeam },
    awayGames           : Reference { referenced: Game, inverse: awayTeam },        
    players             : List { Element: Reference { EdgeClass: PlayedFor, EdgeAttribute: team }, CollectionTypeName: TreeListOfReferences }
}   

Here is a summary of the schema. Every Player is connected to every one of their own AtBat objects. The AtBat objects for each player exist as a doubly-linked list. Every AtBat object points back to the Player that owns it.

A sample dataset might look like this: [figure not reproduced]

The idea is to find a user-defined sequence of AtBat objects and then find the Player that owns that sequence. In the query below, we are looking for the pattern "Strike Out", "Hit", "Strike Out". When we find that pattern, we need to know the Player associated with it. Because every AtBat object is linked back to its owning Player, all we have to do in the query is express the desired sequence of AtBat objects and then navigate from the last AtBat object to the Player object it is connected to. The query is as follows:

match path = (a1:AtBat {result == "Strike Out"})
           -[:nextAtBat]->(a2:AtBat {result == "Hit"})
           -[:nextAtBat]->(a3:AtBat {result == "Strike Out"})
           -->(p:Player) 
           group by p.name
           return a1.result, a2.result, a3.result, p.name;

A sample of the results looks like this:

{
  _Projection
  {
    a1.result:'Strike Out',
    a2.result:'Hit',
    a3.result:'Strike Out',
    p.name:'Player0_TeamA'
  },
  _Projection
  {
    a1.result:'Strike Out',
    a2.result:'Hit',
    a3.result:'Strike Out',
    p.name:'Player0_TeamB'
  },
  _Projection
  {
    a1.result:'Strike Out',
    a2.result:'Hit',
    a3.result:'Strike Out',
    p.name:'Player10_TeamA'
  },

This image depicts finding the pattern and then navigating from the last AtBat up to the associated Player: [figure not reproduced]
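
For readers without access to Objectivity/DB, the same traversal can be mimicked in plain Python. This is only a rough sketch of the idea, not the DO implementation. The classes loosely mirror the AtBat and Player classes of the schema, and the helpers record and players_matching are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through the links
class Player:
    name: str
    at_bats: list[AtBat] = field(default_factory=list)


@dataclass(eq=False)
class AtBat:
    result: str
    batter: Player                     # link back to the owning Player
    next_at_bat: AtBat | None = None   # doubly-linked chain per player
    prev_at_bat: AtBat | None = None


def record(player, result):
    """Append an at-bat to the end of the player's doubly-linked chain."""
    at_bat = AtBat(result=result, batter=player)
    if player.at_bats:
        last = player.at_bats[-1]
        last.next_at_bat = at_bat
        at_bat.prev_at_bat = last
    player.at_bats.append(at_bat)
    return at_bat


def players_matching(at_bats, pattern):
    """Yield the owning Player once for every chain that matches pattern."""
    if not pattern:
        return
    for start in at_bats:
        node, last = start, None
        for wanted in pattern:
            if node is None or node.result != wanted:
                break
            last, node = node, node.next_at_bat
        else:
            # Whole pattern matched: hop from the last AtBat to its Player.
            yield last.batter
```

Calling players_matching with every stored AtBat and the pattern ["Strike Out", "Hit", "Strike Out"] plays the role of the match query above; collecting the yielded names into a set corresponds roughly to grouping by p.name.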
