MySQL PHP: select count of distinct values from comma-separated data (tags)


Question

How can I select the count of distinct values from data that is stored as comma-separated values in MySQL? I'll be using PHP to output the data from MySQL in the end.

The column holds the tags for each post. In the end, I'm trying to output the data the way Stack Overflow does with its tags, like this:

tag-name x 5

This is what the data in the table looks like (sorry about the content, but it's a site for recipes):

"postId"    "tags"                                  "category-code"
"1"         "pho,pork"                              "1"
"2"         "fried-rice,chicken"                    "1"
"3"         "fried-rice,pork"                       "1"
"4"         "chicken-calzone,chicken"               "1"
"5"         "fettuccine,chicken"                    "1"
"6"         "spaghetti,chicken"                     "1"
"7"         "spaghetti,chorizo"                     "1"
"8"         "spaghetti,meat-balls"                  "1"
"9"         "miso-soup"                             "1"
"10"        "chanko-nabe"                           "1"
"11"        "chicken-manchurian,chicken,manchurain" "1"
"12"        "pork-manchurian,pork,manchurain"       "1"
"13"        "sweet-and-sour-pork,pork"              "1"
"14"        "peking-duck,duck"                      "1"

Output

chicken             5 // occurs 5 times in the data above
pork                4 // occurs 4 times in the data above
spaghetti           3 // and so on
fried-rice          2
manchurain          2
pho                 1
chicken-calzone     1
fettuccine          1
chorizo             1
meat-balls          1
miso-soup           1
chanko-nabe         1
chicken-manchurian  1
pork-manchurian     1
sweet-and-sour-pork 1
peking-duck         1
duck                1

I'm attempting to select the count of all distinct values in there, but since the data is comma-separated, there appears to be no way to do this. SELECT DISTINCT will not work.

Can you think of a good way, in either MySQL or PHP, to get output like the one shown above?

Recommended answer

Solution

I don't really know how to transform a horizontal list of comma-separated values into a list of rows without creating a table of numbers, holding at least as many numbers as you may have comma-separated values. If you can create this table, here is my answer:

SELECT 
  SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) AS one_tag,
  COUNT(*) AS cnt
FROM (
  SELECT
    GROUP_CONCAT(tags separator ',') AS all_tags,
    LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
  FROM test
) t
JOIN numbers n
ON n.num <= t.count_tags
GROUP BY one_tag
ORDER BY cnt DESC;

Returns:

+---------------------+-----+
| one_tag             | cnt |
+---------------------+-----+
| chicken             |   5 |
| pork                |   4 |
| spaghetti           |   3 |
| fried-rice          |   2 |
| manchurain          |   2 |
| pho                 |   1 |
| chicken-calzone     |   1 |
| fettuccine          |   1 |
| chorizo             |   1 |
| meat-balls          |   1 |
| miso-soup           |   1 |
| chanko-nabe         |   1 |
| chicken-manchurian  |   1 |
| pork-manchurian     |   1 |
| sweet-and-sour-pork |   1 |
| peking-duck         |   1 |
| duck                |   1 |
+---------------------+-----+
17 rows in set (0.01 sec)
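
As a cross-check outside MySQL, the same counts can be produced in application code by splitting each row's tags and tallying them. The sketch below is written in Python purely for illustration (the question mentions PHP, where the same logic would apply); the `rows` list is a hand-copied stand-in for the result of `SELECT postId, tags FROM ...`, and `count_tags` is a hypothetical helper name. Working row by row also avoids GROUP_CONCAT's result-length limit (`group_concat_max_len`, 1024 bytes by default), which could truncate the combined list on larger tables.

```python
from collections import Counter

# Stand-in for the rows returned by the database: (postId, tags).
rows = [
    (1, "pho,pork"),
    (2, "fried-rice,chicken"),
    (3, "fried-rice,pork"),
    (4, "chicken-calzone,chicken"),
    (5, "fettuccine,chicken"),
    (6, "spaghetti,chicken"),
    (7, "spaghetti,chorizo"),
    (8, "spaghetti,meat-balls"),
    (9, "miso-soup"),
    (10, "chanko-nabe"),
    (11, "chicken-manchurian,chicken,manchurain"),
    (12, "pork-manchurian,pork,manchurain"),
    (13, "sweet-and-sour-pork,pork"),
    (14, "peking-duck,duck"),
]


def count_tags(rows):
    """Split every comma-separated tags string and tally each tag."""
    counts = Counter()
    for _post_id, tags in rows:
        # Ignore empty pieces that a stray or trailing comma would produce.
        counts.update(tag.strip() for tag in tags.split(",") if tag.strip())
    return counts


tag_counts = count_tags(rows)

# Most frequent tags first, printed in the "tag-name x 5" style.
for tag, cnt in tag_counts.most_common():
    print(f"{tag} x {cnt}")
```

The order of tags with equal counts depends on the order in which they were first seen, so only the counts themselves should be compared with the SQL result.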

See sqlfiddle


Explanation

Scenario

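  1. We concatenate all tags using a comma, to create a single list of tags instead of one list per row
  2. We count how many tags we have in our list
  3. We find how to get one value from this list
  4. We find how to get all values as distinct rows
  5. We count the tags, grouped by their value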

Context

Let's build your schema:

CREATE TABLE test (
    id INT PRIMARY KEY,
    tags VARCHAR(255)
);

INSERT INTO test VALUES
    (1,  'pho,pork'),
    (2,  'fried-rice,chicken'),
    (3,  'fried-rice,pork'),
    (4,  'chicken-calzone,chicken'),
    (5,  'fettuccine,chicken'),
    (6,  'spaghetti,chicken'),
    (7,  'spaghetti,chorizo'),
    (8,  'spaghetti,meat-balls'),
    (9,  'miso-soup'),
    (10, 'chanko-nabe'),
    (11, 'chicken-manchurian,chicken,manchurain'),
    (12, 'pork-manchurian,pork,manchurain'),
    (13, 'sweet-and-sour-pork,pork'),
    (14, 'peking-duck,duck');

Concatenate all lists of tags

We will work with all tags on a single line, so we use GROUP_CONCAT to do the job:

SELECT GROUP_CONCAT(tags SEPARATOR ',') FROM test;

Returns all tags separated by a comma:

pho,pork,fried-rice,chicken,fried-rice,pork,chicken-calzone,chicken,fettuccine,chicken,spaghetti,chicken,spaghetti,chorizo,spaghetti,meat-balls,miso-soup,chanko-nabe,chicken-manchurian,chicken,manchurain,pork-manchurian,pork,manchurain,sweet-and-sour-pork,pork,peking-duck,duck

Count all tags

To count all tags, we take the length of the full list of tags and subtract the length of the same list after replacing every , with nothing. The difference is the number of commas, and we add 1 because each separator sits between two values.

SELECT LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
FROM test;

Returns:

+------------+
| count_tags |
+------------+
|         28 |
+------------+
1 row in set (0.00 sec)
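
The length-difference trick can be checked outside the database. The following is a minimal Python sketch of the same arithmetic, not part of the original answer; `count_csv_items` and `row_tags` are illustrative names, and the join of `row_tags` stands in for the GROUP_CONCAT result.

```python
def count_csv_items(csv):
    """Mirror LENGTH(s) - LENGTH(REPLACE(s, ',', '')) + 1.

    MySQL's LENGTH counts bytes while len() counts characters; the two
    agree here because the tags are plain ASCII.
    """
    return len(csv) - len(csv.replace(",", "")) + 1


# One entry per row of the sample table, joined like GROUP_CONCAT would.
row_tags = [
    "pho,pork", "fried-rice,chicken", "fried-rice,pork",
    "chicken-calzone,chicken", "fettuccine,chicken", "spaghetti,chicken",
    "spaghetti,chorizo", "spaghetti,meat-balls", "miso-soup", "chanko-nabe",
    "chicken-manchurian,chicken,manchurain",
    "pork-manchurian,pork,manchurain",
    "sweet-and-sour-pork,pork", "peking-duck,duck",
]
all_tags = ",".join(row_tags)

count_tags = count_csv_items(all_tags)
print(count_tags)  # 27 commas + 1 = 28
```

Note that the formula reports 1 for an empty string, so it is only meaningful when the list holds at least one tag.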

Get the Nth tag in the tag list

We use the SUBSTRING_INDEX function to get it:

-- returns the string up to the 2nd occurrence of the delimiter, counting from the left: a,b
SELECT SUBSTRING_INDEX('a,b,c', ',', 2);

-- returns the string after the 1st occurrence of the delimiter, counting from the right: c
SELECT SUBSTRING_INDEX('a,b,c', ',', -1);

-- we need both to get: b (with 2 being the tag number)
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('a,b,c', ',', 2), ',', -1);

With this logic, to get the 3rd tag in our list, we use:

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1)
FROM test;

Returns:

+-------------------------------------------------------------------------------------+
| SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(tags SEPARATOR ','), ',', 3), ',', -1) |
+-------------------------------------------------------------------------------------+
| fried-rice                                                                          |
+-------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
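
To make the nested SUBSTRING_INDEX calls easier to follow, here is a small Python emulation. It is only a sketch: `substring_index` and `nth_tag` are illustrative helpers that model MySQL's behaviour for a non-empty delimiter, and they are not a substitute for the real function.

```python
def substring_index(text, delim, count):
    """Emulate MySQL SUBSTRING_INDEX for a non-empty delimiter.

    count > 0: everything left of the count-th delimiter from the left.
    count < 0: everything right of the |count|-th delimiter from the right.
    If there are fewer delimiters than |count|, the whole string is returned.
    """
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def nth_tag(text, n):
    """Keep the first n tags, then keep only the last of those."""
    return substring_index(substring_index(text, ",", n), ",", -1)


print(substring_index("a,b,c", ",", 2))   # a,b
print(substring_index("a,b,c", ",", -1))  # c
print(nth_tag("a,b,c", 2))                # b
print(nth_tag("pho,pork,fried-rice,chicken", 3))  # fried-rice
```

One consequence worth knowing: when n is larger than the number of tags, the inner call returns the whole list and the result is simply the last tag. This is why the final query only joins numbers up to count_tags.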

Get all values as distinct rows

My idea is a little tricky:

  1. I know we can create rows by joining tables
  2. I need to get the Nth tag in the list using the request above

So we will create a table containing all numbers from 1 to the maximum number of tags you may have in your list. If you can have 1M values, create 1M entries, from 1 to 1,000,000. For 100 tags, this will be:

CREATE TABLE numbers (
  num INT PRIMARY KEY
);

INSERT INTO numbers VALUES
    ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 ), ( 10 ), 
    ( 11 ), ( 12 ), ( 13 ), ( 14 ), ( 15 ), ( 16 ), ( 17 ), ( 18 ), ( 19 ), ( 20 ), 
    ( 21 ), ( 22 ), ( 23 ), ( 24 ), ( 25 ), ( 26 ), ( 27 ), ( 28 ), ( 29 ), ( 30 ), 
    ( 31 ), ( 32 ), ( 33 ), ( 34 ), ( 35 ), ( 36 ), ( 37 ), ( 38 ), ( 39 ), ( 40 ), 
    ( 41 ), ( 42 ), ( 43 ), ( 44 ), ( 45 ), ( 46 ), ( 47 ), ( 48 ), ( 49 ), ( 50 ), 
    ( 51 ), ( 52 ), ( 53 ), ( 54 ), ( 55 ), ( 56 ), ( 57 ), ( 58 ), ( 59 ), ( 60 ), 
    ( 61 ), ( 62 ), ( 63 ), ( 64 ), ( 65 ), ( 66 ), ( 67 ), ( 68 ), ( 69 ), ( 70 ), 
    ( 71 ), ( 72 ), ( 73 ), ( 74 ), ( 75 ), ( 76 ), ( 77 ), ( 78 ), ( 79 ), ( 80 ), 
    ( 81 ), ( 82 ), ( 83 ), ( 84 ), ( 85 ), ( 86 ), ( 87 ), ( 88 ), ( 89 ), ( 90 ), 
    ( 91 ), ( 92 ), ( 93 ), ( 94 ), ( 95 ), ( 96 ), ( 97 ), ( 98 ), ( 99 ), ( 100 );
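
Typing such a statement by hand does not scale to larger limits. One possible approach, sketched here in Python under the assumption that the table is named numbers as above, is to generate the INSERT text with a short script; `numbers_insert_sql` is a hypothetical helper, not something from the original answer.

```python
def numbers_insert_sql(max_num, per_line=10):
    """Build an INSERT statement filling `numbers` with 1..max_num."""
    values = [f"({n})" for n in range(1, max_num + 1)]
    # Group the tuples into lines of `per_line` entries for readability.
    lines = [
        ", ".join(values[i:i + per_line])
        for i in range(0, len(values), per_line)
    ]
    return "INSERT INTO numbers VALUES\n    " + ",\n    ".join(lines) + ";"


sql = numbers_insert_sql(100)
print(sql)
```

For very large limits it would probably be better to emit several smaller statements, since a single huge INSERT can exceed the server's maximum packet size.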

Now, we get the num-th tag (num being a row in numbers) using the following query:

SELECT n.num, SUBSTRING_INDEX(SUBSTRING_INDEX(all_tags, ',', num), ',', -1) as one_tag
FROM (
  SELECT
    GROUP_CONCAT(tags SEPARATOR ',') AS all_tags,
    LENGTH(GROUP_CONCAT(tags SEPARATOR ',')) - LENGTH(REPLACE(GROUP_CONCAT(tags SEPARATOR ','), ',', '')) + 1 AS count_tags
  FROM test
) t
JOIN numbers n
ON n.num <= t.count_tags
ORDER BY n.num;

Returns:

+-----+---------------------+
| num | one_tag             |
+-----+---------------------+
|   1 | pho                 |
|   2 | pork                |
|   3 | fried-rice          |
|   4 | chicken             |
|   5 | fried-rice          |
|   6 | pork                |
|   7 | chicken-calzone     |
|   8 | chicken             |
|   9 | fettuccine          |
|  10 | chicken             |
|  11 | spaghetti           |
|  12 | chicken             |
|  13 | spaghetti           |
|  14 | chorizo             |
|  15 | spaghetti           |
|  16 | meat-balls          |
|  17 | miso-soup           |
|  18 | chanko-nabe         |
|  19 | chicken-manchurian  |
|  20 | chicken             |
|  21 | manchurain          |
|  22 | pork-manchurian     |
|  23 | pork                |
|  24 | manchurain          |
|  25 | sweet-and-sour-pork |
|  26 | pork                |
|  27 | peking-duck         |
|  28 | duck                |
+-----+---------------------+
28 rows in set (0.01 sec)
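
The effect of the join can be modelled in a few lines of Python. This is a sketch of the idea only: `numbers` plays the role of the numbers table, the `num <= count_tags` filter plays the role of the ON clause, and `nth_tag` is an illustrative stand-in for the nested SUBSTRING_INDEX expression.

```python
# One entry per row of the sample table, joined like GROUP_CONCAT would.
row_tags = [
    "pho,pork", "fried-rice,chicken", "fried-rice,pork",
    "chicken-calzone,chicken", "fettuccine,chicken", "spaghetti,chicken",
    "spaghetti,chorizo", "spaghetti,meat-balls", "miso-soup", "chanko-nabe",
    "chicken-manchurian,chicken,manchurain",
    "pork-manchurian,pork,manchurain",
    "sweet-and-sour-pork,pork", "peking-duck,duck",
]
all_tags = ",".join(row_tags)
count_tags = all_tags.count(",") + 1

# Stand-in for the numbers table, which holds 1..100.
numbers = range(1, 101)


def nth_tag(text, n):
    """Keep the first n tags, then take the last of them (n >= 1)."""
    return text.split(",")[:n][-1]


# The join: every number up to count_tags yields one (num, one_tag) row.
rows = [(num, nth_tag(all_tags, num)) for num in numbers if num <= count_tags]

for num, one_tag in rows:
    print(num, one_tag)
```

Each number selects exactly one position in the list, so the 72 numbers above count_tags are filtered out and 28 rows remain.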

Count tags occurrences

Now that we have classic rows, we can easily count the occurrences of each tag.

See the top of this answer for the request.
