How do I write a function to compare and rank many sets of boolean (true/false) answers?

Problem description

I've embarked on a project that is proving considerably more complicated than I'd first imagined. I'm trying to plan a system that is based around boolean (true/false) questions and answers. Users on the system can answer any questions from a large set of boolean (true/false) questions and be presented with a list showing the most similar users (in order of similarity) based on their answers.

I've Googled far and wide but still not come up with much, so I was hoping somebody could point me in the right direction. I'd like to know:

What is the best data structure and method to store this kind of data? I'd originally assumed I could create two tables, "questions" and "answers", in an SQL database. However, I'm now wondering if it would be simpler to compare two sets of answers if they were both stored as a numerical string, i.e. 0 = not answered, 1 = true, 2 = false. When comparing the strings, weights could be added for "not answered" = 0, "same answer" = 1, "opposite answer" = -1, producing a similarity score.
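The string encoding and weights described above can be sketched as follows. This is only an illustration of the idea in the question; the function name `similarity` and the sample strings are made up, and it assumes both strings list the same questions in the same order.

```python
# Encoding taken from the question: one character per question,
# "0" = not answered, "1" = true, "2" = false.
def similarity(a: str, b: str) -> int:
    """Score two answer strings: +1 same answer, -1 opposite, 0 if either is unanswered."""
    if len(a) != len(b):
        raise ValueError("answer strings must cover the same questions")
    score = 0
    for x, y in zip(a, b):
        if x == "0" or y == "0":
            continue  # an unanswered question contributes 0
        score += 1 if x == y else -1
    return score


alice = "1201"  # hypothetical answers to four questions
bob = "1102"
print(similarity(alice, bob))  # one match, two opposites, one skipped
```

One consequence of this layout is that every string has to grow whenever a question is added, which is part of why a separate answers table may be easier to maintain.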

How would I go about comparing two sets of answers? To be able to work out the "similarity" between these sets of answers I'm going to have to write a comparison function. Does anyone know what kind of comparison would best suit this problem? I've looked into sequence alignment and I think this could be the correct way to go, but I'm unsure, as it requires the data to be in a long sequence, and the questions aren't related, so they aren't naturally a sequence.
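Because the questions have no natural order, one possible comparison treats each user's answers as a mapping from question ID to answer and only looks at the questions both users answered, so no alignment step is needed. The sketch below is an assumption about how that could look; `compare_answers` and the sample data are hypothetical.

```python
from typing import Dict


def compare_answers(a: Dict[int, bool], b: Dict[int, bool]) -> int:
    """+1 for each shared question answered the same, -1 for each answered differently."""
    shared = a.keys() & b.keys()  # questions answered by both users, in any order
    return sum(1 if a[q] == b[q] else -1 for q in shared)


u1 = {10: True, 42: False, 7: True}
u2 = {7: True, 42: True, 99: False}
print(compare_answers(u1, u2))  # question 7 agrees, question 42 disagrees
```

Questions answered by only one of the two users are ignored here, which corresponds to the "not answered" = 0 weight.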

How do I apply this comparison function to a large set of data? Once I've written the comparison function I could just compare each user's answers to every other user's answers. However, this doesn't seem very efficient and probably wouldn't scale very well. I've been looking into cluster analysis methods to automatically group users according to similar answers. Do you think this could work, or does anyone know a better method I could look into?
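One way to make the brute-force comparison cheaper, without resorting to clustering, is to pack each user's answers into two integers used as bitmasks and keep only the top few scores. This is a sketch under assumptions: the names `Masks`, `score`, and `most_similar` are invented here, bit i stands for question i, and a bit is set in the "true" mask only if it is also set in the "answered" mask. It still scans every other user once; it only makes each comparison a handful of integer operations.

```python
import heapq
from typing import Dict, List, Tuple

Masks = Tuple[int, int]  # (answered_mask, true_mask); bit i stands for question i


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def score(a: Masks, b: Masks) -> int:
    both = a[0] & b[0]             # questions both users answered
    differ = (a[1] ^ b[1]) & both  # answered by both, with opposite answers
    return popcount(both) - 2 * popcount(differ)  # agreements minus disagreements


def most_similar(target: str, users: Dict[str, Masks], k: int) -> List[Tuple[int, str]]:
    """Return the k highest (score, name) pairs for everyone except the target."""
    t = users[target]
    scored = ((score(t, m), name) for name, m in users.items() if name != target)
    return heapq.nlargest(k, scored)


users = {  # hypothetical data for four questions
    "ann": (0b1111, 0b1010),
    "ben": (0b1111, 0b1010),
    "cat": (0b0111, 0b0101),
    "dan": (0b0011, 0b0010),
}
print(most_similar("ann", users, 2))
```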

I'd really appreciate any helpful pointers. Thanks!

Recommended answer

If you were to set it up in SQL with tables for Users, Questions, and Answers then I believe that the following SQL could be used to get other users with similar responses. Simply add a TOP clause to get the number that you want.

I don't know how well it will perform, but that will depend a lot on the size of your data.

SELECT
    U2.userid,
    SUM(CASE
            WHEN A1.answer = A2.answer THEN 1    -- same answer
            WHEN A1.answer <> A2.answer THEN -1  -- opposite answer
            WHEN A1.answer IS NULL OR A2.answer IS NULL THEN 0  -- A bit redundant, but I like to make it clear
            ELSE 0
        END) AS similarity_score
FROM
    Questions Q
LEFT OUTER JOIN Answers A1 ON
    A1.question_id = Q.question_id AND
    A1.userid = @userid
LEFT OUTER JOIN Answers A2 ON
    A2.question_id = A1.question_id AND
    A2.userid <> A1.userid
LEFT OUTER JOIN Users U2 ON
    U2.userid = A2.userid
WHERE
    U2.userid IS NOT NULL  -- skip the NULL row produced by questions nobody else answered
GROUP BY
    U2.userid
ORDER BY
    similarity_score DESC
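
To try the query out, it can be run against a small in-memory SQLite database from Python. This is only a sketch: the table layout, the sample rows, and the 1/0 encoding of answers are assumptions, `@userid` is written as the named parameter `:userid`, and SQLite uses `LIMIT` where SQL Server uses `TOP`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (userid INTEGER PRIMARY KEY);
CREATE TABLE Questions (question_id INTEGER PRIMARY KEY);
CREATE TABLE Answers (userid INTEGER, question_id INTEGER, answer INTEGER);
INSERT INTO Users VALUES (1), (2), (3);
INSERT INTO Questions VALUES (1), (2), (3);
-- answer: 1 = true, 0 = false; no row means "not answered"
INSERT INTO Answers VALUES
    (1, 1, 1), (1, 2, 0), (1, 3, 1),
    (2, 1, 1), (2, 2, 0),
    (3, 1, 0), (3, 2, 1), (3, 3, 1);
""")

query = """
SELECT U2.userid,
       SUM(CASE
               WHEN A1.answer = A2.answer THEN 1
               WHEN A1.answer <> A2.answer THEN -1
               ELSE 0
           END) AS similarity_score
FROM Questions Q
LEFT OUTER JOIN Answers A1
    ON A1.question_id = Q.question_id AND A1.userid = :userid
LEFT OUTER JOIN Answers A2
    ON A2.question_id = A1.question_id AND A2.userid <> A1.userid
LEFT OUTER JOIN Users U2
    ON U2.userid = A2.userid
GROUP BY U2.userid
ORDER BY similarity_score DESC
LIMIT :top_n
"""

# Rank the users most similar to user 1, keeping the top two.
rows = conn.execute(query, {"userid": 1, "top_n": 2}).fetchall()
print(rows)
```

In this sample data user 1 has answered every question, so no row with a NULL userid appears in the result.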
