MySQL JSON-based trending tags implementation

Question

I am trying to identify trending tags (based on maximum hits) over a time series using MySQL's JSON features. Below is my table:

CREATE TABLE TAG_COUNTER (
    account       varchar(36) NOT NULL,
    time_id       INT NOT NULL,
    counters      JSON,
    PRIMARY KEY   (account, time_id)
);

In every web API request I receive multiple different tags per account, and based on the number of tags I prepare an INSERT ... ON DUPLICATE KEY UPDATE query. The example below shows an insert with two tags.

INSERT INTO `TAG_COUNTER`
  (`account`, `time_id`, `counters`)
VALUES
  ('google', '2018061023', '{"tag1": 1, "tag2": 1}')
ON DUPLICATE KEY UPDATE `counters` =
  JSON_SET(`counters`,
           '$."tag1"',
           IFNULL(JSON_EXTRACT(`counters`,
                        '$."tag1"'), 0) + 1,
           '$."tag2"',
           IFNULL(JSON_EXTRACT(`counters`,
                        '$."tag2"'), 0) + 1
  );
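Because the number of tags varies per request, the JSON_SET call has to be generated with one path/value pair per tag. The sketch below shows one way to build that statement with bound parameters. The function name `build_json_upsert` is hypothetical, the `%s` placeholders assume a DB-API driver that uses that style (for example PyMySQL), and it assumes the tags are distinct and contain no double quotes or backslashes, since each tag is embedded in a JSON path such as `$."tag1"`.

```python
import json


def build_json_upsert(account, time_id, tags):
    """Build a parameterized INSERT ... ON DUPLICATE KEY UPDATE for TAG_COUNTER.

    Hypothetical helper. Assumes `tags` are distinct and contain no double
    quotes or backslashes, because each tag is placed inside a JSON path.
    """
    if not tags:
        raise ValueError("at least one tag is required")

    # Document stored when the (account, time_id) row does not exist yet.
    initial_doc = json.dumps({tag: 1 for tag in tags})

    set_args = []
    path_params = []
    for tag in tags:
        path = '$."%s"' % tag
        # For each tag: read the current count (or 0 if absent) and add 1.
        set_args.append("%s, IFNULL(JSON_EXTRACT(`counters`, %s), 0) + 1")
        path_params.extend([path, path])

    sql = (
        "INSERT INTO `TAG_COUNTER` (`account`, `time_id`, `counters`) "
        "VALUES (%s, %s, %s) "
        "ON DUPLICATE KEY UPDATE `counters` = JSON_SET(`counters`, "
        + ", ".join(set_args)
        + ")"
    )
    return sql, [account, time_id, initial_doc] + path_params
```

Usage: `cursor.execute(*build_json_upsert('google', 2018061023, ['tag1', 'tag2']))` would send the same statement as the hand-written example, with the JSON paths passed as parameters rather than spliced into the SQL.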

time_id has the format yyyyMMddhh; each row holds the hourly aggregate.
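Since time_id is a plain integer in yyyyMMddhh form, the day and month buckets can be derived with integer division instead of string slicing: dividing by 100 drops the hour, and dividing by 10000 drops the day and hour, which gives the same yyyyMM value that `SUBSTRING(time_id, 1, 6)` produces, but as a number (in MySQL this would be `time_id DIV 10000`). The helper names below are hypothetical.

```python
from datetime import datetime


def to_time_id(ts: datetime) -> int:
    # yyyyMMddhh as an integer, e.g. 2018-06-10 23:45 -> 2018061023
    return int(ts.strftime("%Y%m%d%H"))


def day_of(time_id: int) -> int:
    # Drop the two hour digits: 2018061023 -> 20180610
    return time_id // 100


def month_of(time_id: int) -> int:
    # Drop the day and hour digits: 2018061023 -> 201806
    return time_id // 10000
```

The largest yyyyMMddhh values stay below the signed INT maximum of 2147483647, so the INT column is wide enough for this encoding.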

Now my problem is retrieving the trending tags. The query below gives me the aggregate for tag1, but we will not know the tags before running the query.

SELECT
SUBSTRING(time_id, 1, 6) AS month,
SUM(counters->>'$.tag1')
FROM TAG_COUNTER
WHERE counters->>'$.tag1' > 0
GROUP BY month;

So I need a generic GROUP BY query, together with ORDER BY, to get the trending tags per hour/day/month.

A sample of the expected output:

Time(hour/day/month)  Tag_name  Tag_count_value(total hits)
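To make the expected output concrete, here is a pure-Python reference model of the aggregation the query is supposed to perform: it sums every tag across the JSON counters of all rows in a month, then sorts by month and by descending total. The input shape (a list of `(time_id, counters_json_text)` tuples), the function name and the tie-break on tag name are assumptions for illustration only.

```python
import json
from collections import defaultdict


def trending_by_month(rows):
    """Return (month, tag_name, total_hits) tuples, most-hit tags first per month."""
    totals = defaultdict(int)
    for time_id, counters in rows:
        month = int(time_id) // 10000  # yyyyMMddhh -> yyyyMM
        for tag, count in json.loads(counters).items():
            totals[(month, tag)] += count
    return sorted(
        ((month, tag, total) for (month, tag), total in totals.items()),
        # Month ascending, total descending, tag name as an assumed tie-break.
        key=lambda row: (row[0], -row[2], row[1]),
    )
```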

When I searched the web, every example used a structure like {"tag_name": "tag1", "tag_count": 1} instead of directly {"tag1": 1}, and grouped by tag_name.

Q1) So is it always mandatory to have a common, known JSON key to perform a GROUP BY?

Q2) If I have to go this way, what changes in my INSERT ... ON DUPLICATE KEY UPDATE query for this new JSON label/value structure? The counter has to be created when it does not exist and incremented by one when it does.

Q3) Do I have to maintain an array of objects

[
 {"tag_name": "tag1", "tag_count": 2},
 {"tag_name": "tag2", "tag_count": 3}
]

or an object of objects, like the following?

{
 {"tag_name": "tag1", "tag_count": 2},
 {"tag_name": "tag2", "tag_count": 3}
}

So which of the JSON structures above is better in terms of INSERT and RETRIEVAL of trending counts?

Q4) Can I keep the existing {"key": "value"} format instead of {"key_label": key, "value_label": "value"} and still extract the trending tags? I think {"key": "value"} is very straightforward and better performance-wise.
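Both formats carry exactly the same information, which the round trip below illustrates: the map form `{"tag1": 2}` converts losslessly to the label/value array form from Q3 and back. The function names are hypothetical; this only shows the equivalence of the two shapes, not how MySQL would query either one.

```python
def to_label_value(counters):
    # {"tag1": 2} -> [{"tag_name": "tag1", "tag_count": 2}]
    return [{"tag_name": tag, "tag_count": count} for tag, count in counters.items()]


def from_label_value(items):
    # [{"tag_name": "tag1", "tag_count": 2}] -> {"tag1": 2}
    return {item["tag_name"]: item["tag_count"] for item in items}
```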

Q5) While retrieving, I use SUBSTRING(time_id, 1, 6) AS month. Will that be able to use an index?

Or do I need to create multiple columns such as time_hour (2018061023), time_day (20180610) and time_month (201806) and query the specific column?

Or can I use MySQL date/time functions? Will that use an index for faster retrieval?
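A condition written on SUBSTRING(time_id, ...) generally cannot use an index on time_id for range filtering, because the function is applied to the column. One alternative, sketched below under that assumption, is to filter on a plain numeric range of time_id: for a month, the bounds are the first hour of day 1 and hour 23 of the last day. With PRIMARY KEY (account, time_id), such a range lines up with the key only when account is also fixed by equality; confirming that the optimizer actually uses it is a job for EXPLAIN. `month_bounds` and `MONTH_QUERY` are hypothetical names.

```python
import calendar
from typing import Tuple

# Hypothetical query: equality on account plus a range on time_id,
# matching the leading columns of PRIMARY KEY (account, time_id).
MONTH_QUERY = (
    "SELECT time_id, counters FROM TAG_COUNTER "
    "WHERE account = %s AND time_id BETWEEN %s AND %s"
)


def month_bounds(year: int, month: int) -> Tuple[int, int]:
    """First and last yyyyMMddhh values of a calendar month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    base = year * 1_000_000 + month * 10_000        # yyyyMM0000
    first = base + 1 * 100 + 0                      # day 01, hour 00
    last = base + last_day * 100 + 23               # last day, hour 23
    return first, last
```

Usage: `cursor.execute(MONTH_QUERY, ('google', *month_bounds(2018, 6)))` would read only June 2018 rows for that account.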

Please help.

Recommended answer

I don't see a good reason why you use JSON here. It's also not clear why you believe that a "NoSQL schema" inside MySQL would do anything better.

What you probably need is something like this:

CREATE TABLE TAG_COUNTER (
    account       varchar(36) NOT NULL,
    time_id       INT NOT NULL,
    tag_name      varchar(50) NOT NULL,
    counter       INT UNSIGNED NOT NULL,
    PRIMARY KEY   (account, time_id, tag_name)
);

This will simplify your queries. The INSERT statement would look like:

INSERT INTO TAG_COUNTER
  (account, time_id, tag_name, counter)
VALUES
  ('google', 2018061023, 'tag1', 1),
  ('google', 2018061023, 'tag2', 1)
ON DUPLICATE KEY UPDATE counter = counter + VALUES(counter);
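With one row per tag, the statement for a variable number of tags is just a repeated VALUES tuple, with no per-tag JSON path logic. A minimal sketch follows; the function name is hypothetical and the `%s` placeholders assume a DB-API driver that uses that style.

```python
def build_row_upsert(account, time_id, tags):
    """Build a multi-row upsert for the normalized TAG_COUNTER table (hypothetical helper)."""
    if not tags:
        raise ValueError("at least one tag is required")
    # One (account, time_id, tag_name, 1) tuple per tag.
    placeholders = ", ".join(["(%s, %s, %s, 1)"] * len(tags))
    sql = (
        "INSERT INTO TAG_COUNTER (account, time_id, tag_name, counter) VALUES "
        + placeholders
        + " ON DUPLICATE KEY UPDATE counter = counter + VALUES(counter)"
    )
    params = []
    for tag in tags:
        params.extend([account, time_id, tag])
    return sql, params
```

On MySQL 8.0.20 and later, using the VALUES() function inside ON DUPLICATE KEY UPDATE is deprecated in favor of a row alias, though it still works.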

The SELECT statement might be something like this:

SELECT
    SUBSTRING(time_id, 1, 6) AS month,
    tag_name,
    SUM(counter) AS counter_agg
FROM TAG_COUNTER
GROUP BY month, tag_name
ORDER BY month, counter_agg DESC;
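To see the normalized schema end to end without a MySQL server, the sketch below uses Python's built-in sqlite3 module as a stand-in. This is a swapped-in engine, not MySQL: SQLite's `ON CONFLICT ... DO UPDATE` (SQLite 3.24 or newer) plays the role of `ON DUPLICATE KEY UPDATE`, and `excluded.counter` plays the role of `VALUES(counter)`. The sample hits are made up for illustration.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE TAG_COUNTER (
            account  VARCHAR(36) NOT NULL,
            time_id  INTEGER NOT NULL,
            tag_name VARCHAR(50) NOT NULL,
            counter  INTEGER NOT NULL,
            PRIMARY KEY (account, time_id, tag_name)
        )""")
    # SQLite's upsert: insert a new row with counter 1, or add 1 to the existing row.
    upsert = """
        INSERT INTO TAG_COUNTER (account, time_id, tag_name, counter)
        VALUES (?, ?, ?, 1)
        ON CONFLICT (account, time_id, tag_name)
        DO UPDATE SET counter = counter + excluded.counter
    """
    hits = [  # made-up sample hits: (account, time_id, tag_name)
        ("google", 2018061023, "tag1"),
        ("google", 2018061023, "tag1"),
        ("google", 2018061023, "tag2"),
        ("google", 2018061100, "tag2"),
        ("google", 2018061100, "tag2"),
        ("google", 2018070100, "tag1"),
    ]
    conn.executemany(upsert, hits)
    # Same shape as the answer's query; substr() is SQLite's spelling of SUBSTRING.
    rows = conn.execute("""
        SELECT substr(time_id, 1, 6) AS month, tag_name, SUM(counter) AS counter_agg
        FROM TAG_COUNTER
        GROUP BY month, tag_name
        ORDER BY month, counter_agg DESC
    """).fetchall()
    conn.close()
    return rows
```

Calling `demo()` returns one row per (month, tag), with the most-hit tag of each month listed first, which is the trending-tags output the question asks for.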

Note that I didn't try to optimize the table/schema for data size and performance; that would be a different question. But you can see that the queries are much simpler now.
