在MySQL中插入时检查和防止类似的字符串 [英] Checking and preventing similar strings while insertion in MySQL

查看:135
本文介绍了在MySQL中插入时检查和防止类似的字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

简介



我有3个表格:



设置:

  id 
名称

SetItem:

  set_id 
item_id
位置

TempSet

  id 

我有一个函数可以生成来自 Item 表的新随机组合。基本上,总是在成功生成后,我在 Set 表中创建一个新行,获取它的id并将所有项id添加到 SetItem table。
$ b

问题



每次在生成新组合之前,我都会截断 TempSet 表中,将新项目ID填入此表中,并通过与 SetItem 表中的以前组合进行比较来检查相似性百分比。如果新的组合相似度大于或等于30%,我需要防止这种组合并重新生成新的组合。

相似性意味着 - 以前生成的组合的元素的存在。所以,这个想法是:

如果新生成的集合中有超过3个元素在以前生成的集合上重复,则阻止它并尝试生成另一个组合。



以下是生成新组合的函数:

  CREATE DEFINER =`root` @ `localhost`函数`gen_uniq_perm_by_kw`(
comboSize INT(5),
尝试INT(3)
)RETURNS文本CHARSET utf8 SQL安全调用开始
iterat:
LOOP
DELETE
FROM
`TempSet`;
INSERT INTO`TempSet`(`id`)(
SELECT
`i`.`id`
FROM
`Item` as`i`
ORDER BY
RAND()
LIMIT comboSize
);
IF

SELECT
1
FROM
`SetItem`
GROUP BY
`set_id`
HAVING
sum(
CASE

当存在时(
SELECT
id
FROM
`TempSet`
WHERE
`id` =`item_id`
LIMIT 1
)THEN
1 ELSE 0
END
)/ count(1)* 100> = 30
LIMIT 1
)< 1 THEN
RETURN(SELECT GROUP_CONCAT(id SEPARATOR' - ')FROM`TempSet`);

END IF;

SET试试:=试试 - 1;
IF
try = 0 THEN
RETURN NULL;

END IF;

END LOOP iterat;

END

当我测试它时,即使新生成的组合元素没有不存在于任何其他以前生成的组合中,因此返回null。



我的问题是,我做错了什么?

解决方案

我的问题是,我做错了什么?



您的SetItem表中没有任何数据。

编辑:你评论说这是错误的;你在SetItem中有300k行。






我得到了一个工作示例。看起来你不能像你正在做的那样使用标量子查询。我以这种方式工作:

 如果存在,则删除功能gen_uniq_perm_by_kw; 
DELIMITER ;;
CREATE DEFINER =`root` @`localhost` FUNCTION`gen_uniq_perm_by_kw`(comboSize INT,try INT)返回文本CHARSET utf8 SQL安全调用
BEGIN
iterat:
LOOP
从`TempSet`中删除;

INSERT INTO`TempSet`(`id`)
SELECT`i`.`id` FROM`Item` as`i` ORDER BY RAND()LIMIT comboSize;

IF EXISTS(
SELECT set_id,
SUM(CASE WHEN EXENTS(SELECT id FROM`TempSet` WHERE`id` =`item_id` LIMIT 1)THEN 1 ELSE 0 END )AS group_sum,
COUNT(*)AS group_count
FROM`SetItem`
GROUP BY`set_id`
HAVING group_sum * 10 / group_count< 3
)THEN
RETURN(SELECT GROUP_CONCAT(id SEPARATOR' - ')FROM`TempSet`);
END IF;

SET试试=试试 - 1;

IF try = 0 THEN
RETURN NULL;
END IF;
END LOOP iterat;
END

我也以更简单的方式工作,不使用SUM和extra子查询:

 如果存在则删除功能gen_uniq_perm_by_kw; 
DELIMITER ;;
CREATE DEFINER =`root` @`localhost` FUNCTION`gen_uniq_perm_by_kw`(comboSize INT,try INT)返回文本CHARSET utf8 SQL安全调用
BEGIN
iterat:
LOOP
从`TempSet`中删除;

INSERT INTO`TempSet`(`id`)
SELECT`i`.`id` FROM`Item` as`i` ORDER BY RAND()LIMIT comboSize;

IF EXISTS(
SELECT s.set_id,
COUNT(t.id)AS group_matches,
COUNT(*)AS group_count
FROM SetItem AS s LEFT OUTER JOIN TempSet AS t ON t.id = s.item_id
GROUP BY s.set_id
HAVING group_matches * 10 / group_count< 3
)THEN
RETURN(SELECT GROUP_CONCAT(id SEPARATOR' - ')FROM`TempSet`);
END IF;

SET试试=试试 - 1;

IF try = 0 THEN
RETURN NULL;
END IF;
END LOOP iterat;
END


Brief info

I have 3 tables:

Set:

id
name

SetItem:

set_id
item_id
position

TempSet:

id

I have a function that generates new random combinations from Item table. Basically, always after successful generation, I create a new row in Set table, get it's id and add all item ids into SetItem table.

Problem

Every time before generating new combination I truncate the TempSet table, fill new item ids into this table and check for similarity percentage by comparing with previous combinations in SetItem table. if new combination similarity greater or equal to 30%, I need to prevent this combination and re-generate new combination.

Similarity means - existence of elements on previously generated combinations. So, the idea is:

if more than 3 element of newly generated set repeated on some previously generated set, prevent it and try to generate another combination.

Here is function that generates new combinations:

  CREATE DEFINER = `root` @`localhost` FUNCTION `gen_uniq_perm_by_kw` (
    comboSize INT ( 5 ),
    tries INT ( 3 ) 
    ) RETURNS text CHARSET utf8 SQL SECURITY INVOKER BEGIN
    iterat :
    LOOP
        DELETE 
        FROM
            `TempSet`;
        INSERT INTO `TempSet` ( `id` ) (
            SELECT
                `i`.`id` 
            FROM
                `Item` AS `i`
            ORDER BY
                RAND( ) 
                LIMIT comboSize 
            );
        IF
            (
            SELECT
                1 
            FROM
                `SetItem` 
            GROUP BY
                `set_id` 
            HAVING
                sum(
                CASE

                        WHEN EXISTS (
                        SELECT
                            id 
                        FROM
                            `TempSet` 
                        WHERE
                            `id` = `item_id` 
                            LIMIT 1 
                            ) THEN
                            1 ELSE 0 
                        END 
                        ) / count( 1 ) * 100 >= 30 
                        LIMIT 1 
                        ) < 1 THEN
                        RETURN ( SELECT GROUP_CONCAT( id SEPARATOR '-' ) FROM `TempSet` );

                END IF;

                SET tries := tries - 1;
                IF
                    tries = 0 THEN
                        RETURN NULL;

                END IF;

            END LOOP iterat;

END

When I test it, even when newly generated combination's elements doesn't exist in any other previously generated combination, it returns null as a result.

My question is, what am I doing wrong?

解决方案

My question is, what am I doing wrong?

You don't have any data in your SetItem table.

Edit: You commented that this is wrong; you do have 300k rows in SetItem.


I got an example working. It appears that you can't use a scalar subquery like you're doing. I got it working this way:

DROP FUNCTION IF EXISTS gen_uniq_perm_by_kw;
DELIMITER ;;
CREATE DEFINER = `root` @`localhost` FUNCTION `gen_uniq_perm_by_kw` (comboSize INT, tries INT) RETURNS text CHARSET utf8 SQL SECURITY INVOKER
BEGIN
        iterat :
        LOOP
                DELETE FROM `TempSet`;

                INSERT INTO `TempSet` (`id`)
                SELECT `i`.`id` FROM `Item` AS `i` ORDER BY RAND() LIMIT comboSize;

                IF EXISTS(
                        SELECT set_id,
                                SUM(CASE WHEN EXISTS (SELECT id FROM `TempSet` WHERE `id` = `item_id` LIMIT 1) THEN 1 ELSE 0 END) AS group_sum,
                                COUNT(*) AS group_count
                        FROM `SetItem`
                        GROUP BY `set_id`
                        HAVING group_sum * 10 / group_count < 3
                ) THEN
                        RETURN (SELECT GROUP_CONCAT(id SEPARATOR '-') FROM `TempSet`);
                END IF;

                SET tries = tries - 1;

                IF tries = 0 THEN
                        RETURN NULL;
                END IF;
        END LOOP iterat;
END

I also got it working in a simpler way, without using the SUM and extra subquery:

DROP FUNCTION IF EXISTS gen_uniq_perm_by_kw;
DELIMITER ;;
CREATE DEFINER = `root` @`localhost` FUNCTION `gen_uniq_perm_by_kw` (comboSize INT, tries INT) RETURNS text CHARSET utf8 SQL SECURITY INVOKER
BEGIN
        iterat :
        LOOP
                DELETE FROM `TempSet`;

                INSERT INTO `TempSet` (`id`)
                SELECT `i`.`id` FROM `Item` AS `i` ORDER BY RAND() LIMIT comboSize;

                IF EXISTS(
                        SELECT s.set_id,
                                COUNT(t.id) AS group_matches,
                                COUNT(*) AS group_count
                        FROM SetItem AS s LEFT OUTER JOIN TempSet AS t ON t.id = s.item_id
                        GROUP BY s.set_id
                        HAVING group_matches * 10 / group_count < 3
                ) THEN
                        RETURN (SELECT GROUP_CONCAT(id SEPARATOR '-') FROM `TempSet`);
                END IF;

                SET tries = tries - 1;

                IF tries = 0 THEN
                        RETURN NULL;
                END IF;
        END LOOP iterat;
END

这篇关于在MySQL中插入时检查和防止类似的字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆