Is it possible to read through each record in a dataset, count all records with many (but not all) similar attributes, and then show what those similar attributes are?

Problem description

I know that in SQL you need to specify which attributes you want to select and then check whether those attributes are similar; for example, you can count all records that have the same values in columns a, b and c.

What I'm asking is: is it possible to search all records to see which records have similar attribute values, group them (say, as group 1, with a count such as "n results found"), and then show which columns/fields they share similar values in? Hope my crazy question makes sense.

My dataset is composed mostly of textual data and time.

This is roughly what it looks like:
http://img14.imageshack.us/img14/6015/84e2.png

Thanks in advance.

Solution

Do you mean that you want to find the number of rows (groups) where you have two or more attributes that are identical? As below?

-- Some dummy data to demonstrate.
create table test
(
 a varchar(10) null,
 b varchar(10) null,
 c varchar(10) null,
 d varchar(10) null,
 e varchar(10) null,
 f varchar(10) null
)


insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','ethel','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','ethel','sheila','wibble','wobble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wubble','womble')
insert into test (a,b,c,d,e,f) values ('fred','albert','sheila','wibble','wubble','womble')
insert into test (a,b,c,d,e,f) values ('fred','jim','sheila','wibble','wobble','womble')

-- Count the number of rows having identical values of a,b,c,d,e,f
select 
    count(*) as groupCount,
    a,b,c,d,e,f
from test
group by 
  a,b,c,d,e,f

groupCount  a          b          c          d          e          f
----------- ---------- ---------- ---------- ---------- ---------- ----------
1           fred       albert     sheila     wibble     wubble     womble
2           fred       ethel      sheila     wibble     wobble     womble
4           fred       jim        sheila     wibble     wobble     womble
1           fred       jim        sheila     wibble     wubble     womble

(4 row(s) affected)

-- Count the number of rows having identical values of b,c,d
select
    count(*) as groupCount,
    b,c,d
from test
group by 
   b,c,d

groupCount  b          c          d
----------- ---------- ---------- ----------
1           albert     sheila     wibble
2           ethel      sheila     wibble
5           jim        sheila     wibble

(3 row(s) affected)
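The queries above require naming the columns to group by, while the question asks for groups to be found without choosing the columns up front. One way to approach that, sketched here as an assumption rather than as the answerer's method, is to generate one GROUP BY for every combination of columns (many, but fewer than all) and keep the groups that contain at least two rows. The sketch loads the answer's table and rows into an in-memory SQLite database through Python's sqlite3 module; `build_test_db`, `shared_value_groups`, `min_cols` and `min_count` are hypothetical names, and "similar" is taken to mean "exactly equal".

```python
import sqlite3
from itertools import combinations

COLUMNS = ["a", "b", "c", "d", "e", "f"]

# The eight rows the answer inserts into its "test" table.
ROWS = [
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wubble", "womble"),
    ("fred", "albert", "sheila", "wibble", "wubble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
]


def build_test_db():
    """Create the answer's test table in an in-memory SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table test (a varchar(10), b varchar(10), c varchar(10), "
        "d varchar(10), e varchar(10), f varchar(10))"
    )
    conn.executemany("insert into test (a,b,c,d,e,f) values (?,?,?,?,?,?)", ROWS)
    return conn


def shared_value_groups(conn, columns, min_cols, min_count=2):
    """Group by every combination of min_cols .. len(columns)-1 columns
    (many but not all) and return (shared_columns, values, row_count)
    for each group holding at least min_count rows."""
    found = []
    for size in range(min_cols, len(columns)):
        for cols in combinations(columns, size):
            # Column names are assumed to come from a trusted, fixed list such as
            # COLUMNS (never from user input), so putting them in the SQL text is safe.
            col_list = ", ".join(cols)
            sql = (
                f"select count(*), {col_list} from test "
                f"group by {col_list} having count(*) >= ?"
            )
            for row_count, *values in conn.execute(sql, (min_count,)):
                found.append((cols, tuple(values), row_count))
    return found


if __name__ == "__main__":
    conn = build_test_db()
    for cols, values, n in shared_value_groups(conn, COLUMNS, min_cols=5):
        print(n, dict(zip(cols, values)))
```

With `min_cols=3` the result includes the (b, c, d) groups from the second query above: (jim, sheila, wibble) with 5 rows and (ethel, sheila, wibble) with 2, while (albert, sheila, wibble) is dropped by `min_count=2`. The number of queries grows exponentially with the number of columns, so on a wide table you would want to limit the candidate columns. If the answer was run on SQL Server (its "(n row(s) affected)" output suggests so), GROUP BY CUBE(...) with GROUPING_ID() can produce every combination in a single statement; SQLite does not support CUBE, which is why the sketch loops instead.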


Well, your question is vague, but however it is interpreted, the answer is "yes".
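To illustrate one such interpretation: read "many but not all similar attributes" literally as pairs of records that agree on at least some threshold number of columns without being identical in every column, and list the columns on which each pair agrees. The minimal Python sketch below assumes "similar" means "exactly equal" and reuses the six-column sample rows from the answer's test table; `partial_matches` and `min_shared` are hypothetical names, and row indices are 0-based.

```python
from itertools import combinations

COLUMNS = ("a", "b", "c", "d", "e", "f")

# The eight sample rows from the answer's "test" table.
ROWS = [
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wubble", "womble"),
    ("fred", "albert", "sheila", "wibble", "wubble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
]


def partial_matches(rows, columns, min_shared):
    """Return (i, j, shared_columns) for every pair of rows i < j that agree on
    at least min_shared columns but are not identical in every column."""
    matches = []
    for i, j in combinations(range(len(rows)), 2):
        # Names of the columns whose values are equal in both rows.
        shared = tuple(
            col for col, x, y in zip(columns, rows[i], rows[j]) if x == y
        )
        if min_shared <= len(shared) < len(columns):
            matches.append((i, j, shared))
    return matches


if __name__ == "__main__":
    for i, j, shared in partial_matches(ROWS, COLUMNS, min_shared=5):
        print(f"rows {i} and {j} share: {', '.join(shared)}")
```

Because every pair is compared, the cost grows quadratically with the number of records, so this suits small datasets or a pre-filtered subset. On the sample rows every pair agrees on at least a, c, d and f, so `min_shared=4` returns all 21 pairs that are not fully identical.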

I believe you want to search Google for "MapReduce" and apply it to your data as you read each record.
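The MapReduce idea can be illustrated on a single machine: a map step reads one record at a time and emits a key for each combination of columns (the column names paired with that record's values in those columns), and a reduce step sums the emitted counts per key. The Python sketch below only simulates the two phases in one process, with `collections.Counter` standing in for the shuffle and reduce; a real MapReduce framework would distribute them across machines. `map_record`, `reduce_counts`, `similar_groups` and `min_cols` are hypothetical names, the sample rows are the ones from the answer's test table, and "similar" is taken to mean "exactly equal".

```python
from collections import Counter
from itertools import combinations

COLUMNS = ("a", "b", "c", "d", "e", "f")

# The eight sample rows from the answer's "test" table.
ROWS = [
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
    ("fred", "ethel", "sheila", "wibble", "wobble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wubble", "womble"),
    ("fred", "albert", "sheila", "wibble", "wubble", "womble"),
    ("fred", "jim", "sheila", "wibble", "wobble", "womble"),
]


def map_record(record, columns, min_cols):
    """Map: for one record, emit ((column_names, values), 1) for every
    combination of min_cols .. len(columns)-1 columns."""
    for size in range(min_cols, len(columns)):
        for idx in combinations(range(len(columns)), size):
            key = (tuple(columns[i] for i in idx), tuple(record[i] for i in idx))
            yield key, 1


def reduce_counts(pairs):
    """Shuffle + reduce: group the emitted pairs by key and sum their counts."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def similar_groups(records, columns, min_cols, min_count=2):
    """Stream the records through map and reduce, keeping keys seen at
    least min_count times."""
    pairs = (kv for record in records for kv in map_record(record, columns, min_cols))
    totals = reduce_counts(pairs)
    return {key: n for key, n in totals.items() if n >= min_count}


if __name__ == "__main__":
    for (cols, values), n in similar_groups(ROWS, COLUMNS, min_cols=5).items():
        print(n, dict(zip(cols, values)))
```

Since the map step looks at each record independently, it fits the "while reading each of the records" part of the suggestion, and the resulting counts equal what a GROUP BY on each column combination would return.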

