原始Sql语句以相同名称的不同字符串按列分组 [英] Raw Sql statement to group by column with different strings for the same name

查看:81
本文介绍了原始Sql语句以相同名称的不同字符串按列分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是创建更复杂的sql语句的新手,我试图通过一个名称进行分组,该名称可以以不同的形式出现.例如,名称可以是"Kane,Patrick","P.Kane,Patrick","Kane,Patrick *

Fairly new to creating a more complex sql statement, I'm trying to do a group by a name where the name can come in different forms. for example, name can be "Kane, Patrick", "P.Kane, Patrick", "Kane, Patrick*

到目前为止,我所能查询的大约7000个结果以下:

what i have so far below which queries around 7000 results:

SELECT 
SUM(games_played) as games_played,
SUM(goals) as goals,
SUM(points) as points,
player_name
FROM player_stats
GROUP BY player_name;

示例结果json

[
{games_played: 123, goals: 12, points: 40, player_name: "Kane, Patrick"},
{games_played: 123, goals: 12, points: 40, player_name: "P. Kane, Patrick"},
{games_played: 123, goals: 12, points: 40, player_name: "Kane, Patrick*"},
{games_played: 123, goals: 12, points: 40, player_name: "Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "A. Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "Nylander, Alex*"},
{games_played: 123, goals: 12, points: 40, player_name: "Lemieux, Mario"},
{games_played: 123, goals: 12, points: 40, player_name: "Gretzky, Wayne"},
]

问题是如何获取按玩家分组的每一列的总和,因此结果将更像以下内容:

question is how to get sums of each column grouped by like players so the result would look more like below:

[
{games_played: 369, goals: 36, points: 120, player_name: "Kane, Patrick"},
{games_played: 369, goals: 36, points: 120, player_name: "Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "Lemieux, Mario"},
{games_played: 123, goals: 12, points: 40, player_name: "Gretzky, Wayne"},
]

even better if i can get a knex.js query but i have no problem using a raw query here. DB is postgresSQL. 

thanks in advance

推荐答案

您需要执行一些操作以将名称转换为一致的形式,例如字符串替换,按句点分割并仅采用第二个Value,并删除特殊字符.字符等.没有任何人工智能可以去哦,p凯恩,帕特里克显然和帕特里克·凯恩*一样"-您必须自己进行操作.您甚至可以创建一个包含两列的表,其中每个名称的所有变体都映射到一个一致的名称,然后对不同的名称进行联接,并在一致的名称上进行分组

You'll need to do something to transform the names to a consistent form, be it string replacement, splitting on period and taking only the second Value, removing special chars etc. There isn't anything artificially intelligent that can go "oh; p kane, Patrick is clearly the same as PatrickKane*"- you'll have to do the manipulations yourself. You could even have a table with two columns with all the variations of each name, mapped to a consistent name, then do the join on the varied name and group on the consistent one

我认为我的第一步是整理数据:

I think my first step would be to sort out the data:

UPDATE player_stats 
SET player_name = REPLACE(player_name, '*', '')

UPDATE player_stats 
SET player_name = SUBSTRING(player_name from 3)
WHERE player_name LIKE '_.%'

您可以在这里停下来,然后再继续运行一次,以不断删除表中出现的垃圾,并随着更多变化的出现而添加更多规则

You could stop here and just keep re running this forever more to keep removing the garbage that arrives in the table, adding more rules as more variations arrive

但是您应该为玩家创建一张新桌子:

But you should then make a new table for the players:

SELECT uuid_generate_v4() as player_id, player_name 
INTO players
FROM (SELECT distinct player_name FROM player_stats)x

ALTER TABLE players ADD PRIMARY KEY (player_id);

然后在统计信息中添加一列以获取ID:

Then add a column to stats to take the id:

ALTER TABLE player_stats ADD player_id UUID;

将数据复制到:

UPDATE player_stats d
SET d.player_id = s.player_id
FROM players s
WHERE s.player_name = d.player_name

设置外键:

ALTER TABLE player_stats
ADD CONSTRAINT fk_playersstats_playerid__players_playerid FOREIGN KEY player_id REFERENCES players(player_id)

最后转储名称列:

ALTER TABLE player_stats DROP player_name

然后首先修复用各种垃圾填充表格的程序:)

And then go fix the program that filled the table with varying garbage in the first place :)

这篇关于原始Sql语句以相同名称的不同字符串按列分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆