Using Relational Algebra, how can I find duplicate rows in a tuple?


Question

I am completing a piece of homework and I'm really stuck, and have been for a week. I'm not asking for the answer to the question, just how I'd go about doing it. Basically I need to find duplicates in a single tuple. For example, if each entry was a user ID and a hobby, how would I find all entries where exactly the same user ID and hobby appear at least twice? So if I had the following tuple...

ID | Hobby
----------
1  | Swimming
2  | Running
3  | Football
1  | Swimming
3  | Football
3  | Football

How would I find the user IDs of the users with duplicate entries? (1 and 3)

Recommended answer

I was recently assigned a problem very similar to this for homework in a database theory course I'm currently taking. After thinking about it for several minutes, I have a solution! Here we go.

  1. Perform two identical projections on your table (I'll call them P1 and P2), restricted to the table key (unique identifier) and the attribute that's believed to have multiple occurrences of the same value (attr). In the context of this post, ID and Hobby would be the projected attributes. These projections have to keep duplicate rows (bag semantics); under strict set semantics a relation cannot hold duplicate rows in the first place.
  2. Rename the columns of one of the projections. In other words, change the names of ID and Hobby, ideally to something similar. For our example, we'll rename P2's columns to ID2 and Hobby2.
  3. Critical step: perform a cross product between P1 and P2. This pairs each record with every record, including itself, which is what we want. I'll call this table C.
  4. Perform a selection on C with the criteria (specific to this problem) that ID = ID2 and Hobby = Hobby2. This will be table S.
  5. Perform a projection on S that clears out duplicates, which leaves a table with exactly one record for each matched pair of ID and Hobby values. We'll call it P(S).
  6. Apply the difference operator as S - P(S). This removes one copy of each matched pair, so a record that occurs only once (and therefore only matches its own 'counterpart') disappears, leaving only records that are true duplicates. The difference has to be taken from S rather than C, because C still contains the pairs that did not match.
  7. Lastly, project the resulting table onto ID.
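
The steps above can be sketched in Python to check them against the example table. This is only an illustration under stated assumptions: rows are kept in a bag (multiset) built on `collections.Counter`, and the helper names `cross`, `select`, `distinct`, `project` and `duplicate_ids` are made up for this sketch rather than taken from any relational algebra library. The renaming in step 2 is implicit, because columns are addressed by position instead of by name.

```python
from collections import Counter
from itertools import product

# Example relation from the question, stored as a bag of (ID, Hobby) rows.
# Assumption: bag semantics, so duplicate rows are counted rather than merged.
R = Counter([
    (1, "Swimming"),
    (2, "Running"),
    (3, "Football"),
    (1, "Swimming"),
    (3, "Football"),
    (3, "Football"),
])

def cross(left, right):
    """Bag cross product: row multiplicities multiply."""
    out = Counter()
    for (l_row, l_n), (r_row, r_n) in product(left.items(), right.items()):
        out[l_row + r_row] += l_n * r_n
    return out

def select(bag, predicate):
    """Bag selection: keep matching rows with their multiplicities."""
    return Counter({row: n for row, n in bag.items() if predicate(row)})

def distinct(bag):
    """Duplicate elimination: every row that occurs is kept exactly once."""
    return Counter(dict.fromkeys(bag, 1))

def project(bag, *cols):
    """Bag projection onto column positions; multiplicities are summed."""
    out = Counter()
    for row, n in bag.items():
        out[tuple(row[c] for c in cols)] += n
    return out

def duplicate_ids(relation):
    p1 = project(relation, 0, 1)                             # step 1
    p2 = project(relation, 0, 1)                             # steps 1-2 (rename is positional)
    c = cross(p1, p2)                                        # step 3: columns ID, Hobby, ID2, Hobby2
    s = select(c, lambda t: t[0] == t[2] and t[1] == t[3])   # step 4
    p_s = distinct(s)                                        # step 5
    diff = s - p_s                                           # step 6: Counter "-" drops rows whose count reaches 0
    return sorted(i for (i,) in distinct(project(diff, 0)))  # step 7

print(duplicate_ids(R))  # prints [1, 3]
```

For the example data, S holds 4 copies of the Swimming pair for user 1, 1 copy of the Running pair for user 2 and 9 copies of the Football pair for user 3. Subtracting one copy of each leaves only users 1 and 3.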

This should work for detecting other kinds of duplicates as well: simply change the criteria, from step 4 onward, to fit the details of the problem at hand.
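
Written as one expression, the procedure might look like the following. This is a sketch that assumes bag semantics; the symbols $R$ (the original table), $\delta$ (duplicate elimination) and $\theta$ (the selection condition) are notation introduced here, not part of the original answer, and the outer $\delta$ is an extra step so that each ID is listed only once. Adapting the approach to another problem then amounts to changing $\theta$ and the projected attributes.

```latex
\begin{aligned}
S &= \sigma_{\theta}\bigl(\pi_{ID,\,Hobby}(R) \times \rho_{(ID2,\,Hobby2)}\bigl(\pi_{ID,\,Hobby}(R)\bigr)\bigr),
  \qquad \theta \equiv (ID = ID2 \,\land\, Hobby = Hobby2) \\
\text{Result} &= \delta\bigl(\pi_{ID}\bigl(S - \delta(S)\bigr)\bigr)
\end{aligned}
```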
