How to get several columns from BigQuery?

Problem description

I am querying the GitHub public dataset on BigQuery. Currently, my best query for what I need looks like the following.

SELECT type, created_at, repository_name FROM [githubarchive:github.timeline]
WHERE
    (created_at CONTAINS '2012-')
AND repository_owner="twitter"
ORDER BY created_at, repository_name;

This gives me all the events ("type") from the repository_owner twitter (or any other user) for all the repositories ("repository_name") that this user owns, but in a single column.

However, what I really want is to have all the events ("type") in columns, one column for each repository ("repository_name"), more or less like this:

bootstrap     commons    twui
WatchEvent    PushEvent  PushEvent
WatchEvent    WatchEvent PushEvent

The timestamp ("created_at") is only relevant as an ordering mechanism. The columns do not need to be equally long, and the events on a single row do not need to have happened at the same time.

I will use this to put the events into the R package TraMineR to do sequence analysis.

How can I accomplish this?

Solution

I'm not sure I understand exactly what you're hoping to accomplish, but it is possible to get one column per repository, holding the count of each event type, with something like this:

SELECT type, bootstrap, commons, twui
FROM   (
       -- One output column per repository: each SUM(IF(...)) counts
       -- the rows of this event type that belong to that repository.
       SELECT type,
              SUM(IF(repository_name = 'bootstrap', 1, 0)) AS bootstrap,
              SUM(IF(repository_name = 'commons', 1, 0)) AS commons,
              SUM(IF(repository_name = 'twui', 1, 0)) AS twui
       FROM   [githubarchive:github.timeline]
       WHERE  created_at CONTAINS '2012-'
       AND    repository_owner = "twitter"
       GROUP BY type
       )
ORDER BY type
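
The query above is a conditional-aggregation pivot: grouping by type and summing a 1/0 flag per repository yields one column per repository holding event counts, so the ordering given by created_at is not kept in the result. Below is a minimal sketch that runs the same pattern locally with Python's built-in sqlite3 module, assuming a hypothetical timeline table filled with made-up rows. SQLite has no IF() or CONTAINS, so CASE WHEN and LIKE '%2012-%' stand in for them here.

```python
import sqlite3

# Hypothetical in-memory stand-in for the timeline table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timeline ("
    "type TEXT, created_at TEXT, repository_name TEXT, repository_owner TEXT)"
)
conn.executemany(
    "INSERT INTO timeline VALUES (?, ?, ?, ?)",
    [
        ("WatchEvent", "2012-01-02", "bootstrap", "twitter"),
        ("WatchEvent", "2012-01-03", "bootstrap", "twitter"),
        ("PushEvent", "2012-01-01", "commons", "twitter"),
        ("WatchEvent", "2012-01-04", "commons", "twitter"),
        ("PushEvent", "2012-01-01", "twui", "twitter"),
        ("PushEvent", "2012-01-05", "twui", "twitter"),
        ("WatchEvent", "2011-12-31", "twui", "twitter"),  # dropped by the 2012 filter
    ],
)

# Same shape as the answer's query: one column per repository, each holding
# the number of events of that type in that repository.
pivot_sql = """
SELECT type,
       SUM(CASE WHEN repository_name = 'bootstrap' THEN 1 ELSE 0 END) AS bootstrap,
       SUM(CASE WHEN repository_name = 'commons'   THEN 1 ELSE 0 END) AS commons,
       SUM(CASE WHEN repository_name = 'twui'      THEN 1 ELSE 0 END) AS twui
FROM   timeline
WHERE  created_at LIKE '%2012-%'   -- substring match, like CONTAINS '2012-'
AND    repository_owner = 'twitter'
GROUP BY type
ORDER BY type
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
conn.close()
```

With these sample rows the result is one row per event type, (PushEvent, 0, 1, 2) and (WatchEvent, 2, 1, 0). Because GROUP BY type collapses the individual events, this shape suits per-repository totals rather than ordered event sequences.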

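If the goal is the layout from the question (each repository's events in its own column, ordered by created_at, with columns of unequal length), one option is to keep the question's original single-column query and reshape its rows on the client. Below is a minimal Python sketch, assuming the rows have already been fetched as (type, created_at, repository_name) tuples with ISO-style timestamp strings. The function name events_by_repository and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def events_by_repository(rows):
    """Reshape (type, created_at, repository_name) rows into one column per repository.

    Each column lists that repository's event types ordered by created_at;
    shorter columns are padded with None so every output row has the same width.
    """
    per_repo = defaultdict(list)
    # ISO-style timestamp strings sort chronologically as plain text.
    for event_type, created_at, repo in sorted(rows, key=lambda r: r[1]):
        per_repo[repo].append(event_type)

    header = sorted(per_repo)
    # zip_longest walks the columns in parallel and pads the shorter ones with None.
    body = list(zip_longest(*(per_repo[name] for name in header)))
    return header, body


# Hypothetical sample rows, not real query output.
rows = [
    ("WatchEvent", "2012-01-02 10:00:00", "bootstrap"),
    ("PushEvent", "2012-01-01 09:00:00", "commons"),
    ("PushEvent", "2012-01-01 08:00:00", "twui"),
    ("WatchEvent", "2012-01-03 11:00:00", "bootstrap"),
    ("WatchEvent", "2012-01-04 12:00:00", "commons"),
    ("PushEvent", "2012-01-05 13:00:00", "twui"),
    ("WatchEvent", "2012-01-06 14:00:00", "twui"),
]

header, body = events_by_repository(rows)
print("\t".join(header))
for row in body:
    print("\t".join(cell or "" for cell in row))
```

For these sample rows the first two output rows match the layout sketched in the question, and twui's extra event spills into a third row with empty cells for the other repositories. TraMineR's seqdef() conventionally takes one sequence per row, so transposing this table may be needed before loading it in R; check the package documentation.
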