How to get several columns from BigQuery?
Problem description
I am querying the GitHub public dataset on BigQuery. Currently, my best query for what I need looks like the following.
SELECT type, created_at, repository_name FROM [githubarchive:github.timeline]
WHERE
(created_at CONTAINS '2012-')
AND repository_owner="twitter"
ORDER BY created_at, repository_name;
This gives me all the events ("type") from the repository_owner twitter (or any other user) for all the repositories ("repository_name") that this user owns, but in a single column.
However, what I really want is to have all the events ("type") in columns, one column for each repository ("repository_name"), more or less like this:
bootstrap commons twui
WatchEvent PushEvent PushEvent
WatchEvent WatchEvent PushEvent
The timestamp ("created_at") is only relevant as an ordering mechanism. The columns do not need to be equally long, and the events on a single row do not need to have happened at the same time.
I will use this to put the events into the R package TraMineR to do sequence analysis.
How can I accomplish this?
I'm not sure I understand exactly what you're hoping to accomplish, but it is possible to get one column per repository with something like this, which counts the events of each type in each repository:
SELECT type, bootstrap, commons, twui
FROM (
SELECT type,
SUM(IF(repository_name = 'bootstrap', 1, 0)) AS bootstrap,
SUM(IF(repository_name = 'commons', 1, 0)) AS commons,
SUM(IF(repository_name = 'twui', 1, 0)) AS twui
FROM [githubarchive:github.timeline]
WHERE created_at CONTAINS '2012-'
AND repository_owner = "twitter"
GROUP BY type
)
ORDER BY type
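The query above returns counts per event type rather than the ordered event sequences described in the question. If the sequences themselves are needed, one option is to export the rows from the first query (type, created_at, repository_name) and reshape them on the client. The following is only a minimal sketch of that idea in Python, not something taken from the original answer: the function name pivot_events and the sample rows are made up for illustration, and it assumes created_at is a string in a fixed "YYYY-MM-DD HH:MM:SS" format so that plain string sorting is chronological.

```python
from collections import defaultdict
from itertools import zip_longest


def pivot_events(rows):
    """Turn (type, created_at, repository_name) rows into one column per repository.

    Hypothetical helper for illustration; assumes created_at strings share a
    fixed format so that sorting them as text gives chronological order.
    """
    columns = defaultdict(list)
    # Sort by timestamp so each repository's column keeps its event order.
    for event_type, _created_at, repo in sorted(rows, key=lambda r: r[1]):
        columns[repo].append(event_type)
    header = sorted(columns)
    # zip_longest pads the shorter columns with "" so lengths may differ.
    body = [
        list(row)
        for row in zip_longest(*(columns[name] for name in header), fillvalue="")
    ]
    return header, body


# Made-up sample rows standing in for the exported query result.
sample = [
    ("WatchEvent", "2012-01-01 10:00:00", "bootstrap"),
    ("PushEvent", "2012-01-01 10:05:00", "commons"),
    ("PushEvent", "2012-01-02 09:00:00", "twui"),
    ("WatchEvent", "2012-01-03 12:00:00", "bootstrap"),
    ("WatchEvent", "2012-01-03 13:00:00", "commons"),
    ("PushEvent", "2012-01-04 08:00:00", "twui"),
    ("ForkEvent", "2012-01-05 08:00:00", "bootstrap"),
]

header, body = pivot_events(sample)
print("\t".join(header))
for row in body:
    print("\t".join(row))
```

Rows on the same line of the result are simply the n-th event of each repository, so they are not aligned in time, which matches what the question asks for. The padded table could then be written to a CSV file and read into R.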