Group by similar string


Problem description

Suppose I have a table like this

| id_grupo |    nombre       |
|:---------|----------------:|
| 1        | Emprendedores 1 |     
| 2        | Emprendedores 2 |    
| 3        | Emprendedoras 1 |      
| 4        | Emprendedoras 2 |         
| 5        | Los amigos 1    |       
| 6        | Los amigos 2    |
| 7        | Los amigos no 1 |  

I want to group rows whose names are equal except for the number at the end. If you look closely, some names consist of two or more words, and the only difference is the ending. There are also names that look similar but are not the same, like "Los amigos" and "Los amigos no"; these belong to different groups. "Emprendedores" and "Emprendedoras" are also different.

This is the query I have:

SELECT *, GROUP_CONCAT(id_grupo) 
FROM creabien_sacredi_dev.grupos
GROUP BY SUBSTRING(nombre,1,5)

It works fine with most of the records; however, the problem comes with strings like those in the example, which are very similar. I chose a substring of 5 characters, but the names don't all have the same length, so some strings are not grouped as expected.
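
To see why the 5-character prefix fails on this data, the grouping key can be listed next to the ids it collects. This is only a minimal sketch: `grupos_demo` is a hypothetical inline copy of the sample rows, not the real `grupos` table.

```sql
-- grupos_demo is a hypothetical inline copy of the sample rows above,
-- used so the query runs without touching the real grupos table.
SELECT SUBSTRING(nombre, 1, 5)                  AS prefix_key,
       GROUP_CONCAT(id_grupo ORDER BY id_grupo) AS ids
FROM (
    SELECT 1 AS id_grupo, 'Emprendedores 1' AS nombre UNION ALL
    SELECT 2, 'Emprendedores 2' UNION ALL
    SELECT 3, 'Emprendedoras 1' UNION ALL
    SELECT 4, 'Emprendedoras 2' UNION ALL
    SELECT 5, 'Los amigos 1'    UNION ALL
    SELECT 6, 'Los amigos 2'    UNION ALL
    SELECT 7, 'Los amigos no 1'
) AS grupos_demo
GROUP BY prefix_key;
```

With these rows, all four "Emprendedores"/"Emprendedoras" names share the key `Empre` (ids 1,2,3,4) and all three "Los amigos" variants share `Los a` (ids 5,6,7), so a fixed-length prefix cannot tell them apart.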

How can I group these strings in the following form?

    | id_grupo |    nombre       | GROUP_CONCAT(id_grupo) |
    |:---------|----------------:|-----------------------:|
    | 1        | Emprendedores 1 |  1,2                   |    
    | 3        | Emprendedoras 1 |  3,4                   |   
    | 5        | Los amigos 1    |  5,6                   |
    | 7        | Los amigos no 1 |  7                     |

I think the key is the trailing number: the string before it must be exactly the same. But I don't know how to do it. Could you help me, please?

Edit:

There are also records like 'Emprendedores' without any number at the end, and these should also be grouped with 'Emprendedores 1' and 'Emprendedores 2'. So I think the number is no longer the key; in fact, I doubt whether there is a way to group these records.

Solution

How about the following:

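SELECT CASE 
         -- Name ends in a digit: drop the final space and digit
         WHEN RIGHT(nombre, 1) BETWEEN '0' AND '9' THEN 
         LEFT(nombre, CHAR_LENGTH(nombre) - 2) 
         -- No trailing digit: keep the name as is
         ELSE nombre 
       END AS nombrechecked, 
       GROUP_CONCAT(id_grupo) 
FROM   grupos 
GROUP  BY 1 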

Here is the SQL Fiddle that shows it works.
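
The query above assumes a single trailing digit preceded by one space. A possible variant, sketched under the assumption of MySQL 8.0 or later (where `REGEXP_REPLACE` is available), strips any run of trailing digits together with the spaces before it, so multi-digit suffixes and names with no number at all land in the same group. The derived table `grupos_demo` and rows 8 and 9 are hypothetical additions for illustration.

```sql
-- Assumption: MySQL 8.0+ (REGEXP_REPLACE does not exist in MySQL 5.7).
-- grupos_demo is a hypothetical stand-in for the real grupos table.
SELECT REGEXP_REPLACE(nombre, ' *[0-9]+$', '')  AS base_name,  -- name without trailing number
       MIN(id_grupo)                            AS first_id,   -- lowest id in the group
       GROUP_CONCAT(id_grupo ORDER BY id_grupo) AS ids
FROM (
    SELECT 1 AS id_grupo, 'Emprendedores 1' AS nombre UNION ALL
    SELECT 2, 'Emprendedores 2' UNION ALL
    SELECT 3, 'Emprendedoras 1' UNION ALL
    SELECT 4, 'Emprendedoras 2' UNION ALL
    SELECT 5, 'Los amigos 1'    UNION ALL
    SELECT 6, 'Los amigos 2'    UNION ALL
    SELECT 7, 'Los amigos no 1' UNION ALL
    SELECT 8, 'Emprendedores'   UNION ALL   -- hypothetical: no number at all
    SELECT 9, 'Los amigos 12'               -- hypothetical: two-digit number
) AS grupos_demo
GROUP BY base_name
ORDER BY first_id;
```

Under these assumptions the groups would be Emprendedores (1,2,8), Emprendedoras (3,4), Los amigos (5,6,9) and Los amigos no (7). A name that legitimately ends in a number would lose it as well, so the pattern may need tightening for real data.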
