Select revenue per landing page for a nested table using Google BigQuery


Problem description

I'm trying to pick up Google BigQuery and figure out how I can replicate some standard reporting for the London Cycle Helmet GA sample data. A simple example I've stumbled upon is selecting the sum of revenue split by landing page.

Nested tables are new to me and I'm struggling to find any examples that do this or similar using standard SQL.

How can this be done using standard SQL? Or can anyone point me towards any similar examples?

Update

Apologies for not providing more details upfront. I've made some progress, which lets me post some code. I now understand the data structure a little better and am attempting to un-nest like so:

#StandardSQL
SELECT Visit_ID, h.page.pagePath AS LandingPage, Sales, Revenue
FROM (
  SELECT
    visitID AS Visit_ID,
    h.hitNumber,
    h.page.pagePath
  FROM
    `project_id.dataset.table`, UNNEST(hits) as h
)  AS landingpages
JOIN (
  SELECT
      fullVisitorId AS Visit_ID, sum(totals.transactions) AS Sales, (sum(totals.transactionRevenue)/1000000) AS Revenue
    FROM
      `project_id.dataset.table`
    WHERE
      totals.visits>0
      AND totals.transactions>=1
      AND totals.transactionRevenue IS NOT NULL
    GROUP BY
      fullVisitorId
) AS sales
ON landingpages.Visit_ID = sales.Visit_ID

This throws the error:

No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY at [23:4]

I think this is nearly there, but I don't understand what it's trying to tell me. How can I fix this join?

Solution

"No matching signature for operator = for argument types: INT64, STRING. Supported signature: ANY = ANY at [23:4]"
"I don't understand what it's trying to tell me."

You are trying to join on equality of two totally different fields.
Not only do they hold different values, they are even of different types:

Field Name      Data Type   Description
fullVisitorId   STRING      The unique visitor ID (also known as client ID).
visitId         INTEGER     An identifier for this session. This is part of the value usually stored as the _utmb cookie. This is only unique to the user. For a completely unique ID, you should use a combination of fullVisitorId and visitId.  
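
One way to confirm the mismatch yourself is to look up the declared types of the two fields. This is a minimal sketch, assuming the same placeholder `project_id.dataset.table` as in the question and that BigQuery's `INFORMATION_SCHEMA.COLUMNS` view is available for the dataset:

```sql
#standardSQL
-- List the declared types of the two ID fields.
-- project_id, dataset and 'table' are the placeholders used in the question.
SELECT
  column_name,
  data_type
FROM
  `project_id.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE
  table_name = 'table'
  AND column_name IN ('fullVisitorId', 'visitId')
```

If the types come back as STRING and INT64, the = in the ON clause has no signature that accepts both operands, which is what the error message is reporting.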

"How can I fix this join?"

Try the query below. I am not a GA person (I added the respective tag to the post), but it should at least help you get to the next step. I tried to preserve and reuse your original code as much as possible.

#StandardSQL
-- Landing page = page path of the first hit in each session
WITH landingpages AS (
  SELECT
    fullVisitorId,
    visitID,
    h.page.pagePath AS LandingPage
  FROM
    `project_id.dataset.table`, UNNEST(hits) AS h
  WHERE h.hitNumber = 1
),
-- Transactions and revenue per session (revenue is stored multiplied by 10^6)
sales AS (
  SELECT
    fullVisitorId,
    visitID,
    SUM(totals.transactions) AS Transactions,
    SUM(totals.transactionRevenue) / 1000000 AS Revenue
  FROM
    `project_id.dataset.table`
  WHERE
    totals.visits > 0
    AND totals.transactions >= 1
    AND totals.transactionRevenue IS NOT NULL
  GROUP BY fullVisitorId, visitID
)
-- Join on both keys: visitID alone is only unique per visitor
SELECT
  LandingPage,
  SUM(Transactions) AS Transactions,
  SUM(Revenue) AS Revenue
FROM landingpages
JOIN sales
  ON landingpages.visitID = sales.visitID
  AND landingpages.fullVisitorId = sales.fullVisitorId
GROUP BY LandingPage
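
Because `totals` and `hits` live on the same session row, the join may not be needed at all. The following is only a sketch of that alternative, not the query from the answer above. It assumes each session has exactly one hit with hitNumber = 1, so keeping only that hit leaves one row per session and the session-level totals are not double counted:

```sql
#standardSQL
-- Single-pass variant: keep only the first hit of each session and
-- aggregate the session-level totals by that hit's page path.
-- Assumes exactly one hit with hitNumber = 1 per session.
SELECT
  h.page.pagePath AS LandingPage,
  SUM(totals.transactions) AS Transactions,
  SUM(totals.transactionRevenue) / 1000000 AS Revenue
FROM
  `project_id.dataset.table`, UNNEST(hits) AS h
WHERE
  h.hitNumber = 1
  AND totals.visits > 0
  AND totals.transactions >= 1
  AND totals.transactionRevenue IS NOT NULL
GROUP BY LandingPage
```

Under that assumption it should return the same totals as the join version, while reading the table only once.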
