How to categorize and tabularize free-form answers to a question in a survey?


Question

I want to analyze answers to a web survey (the Git User's Survey 2008, if anyone is interested). Some of the questions were free-form, such as "How did you hear about Git?". With more than 3,000 replies, analyzing them entirely by hand is out of the question, especially since the survey contains quite a few free-form questions.

How can I group those replies into categories (probably based on the keywords used in each response) at least semi-automatically, i.e. the program may ask for confirmation, and then tabulate them, that is, count the number of entries in each category? One answer can belong to more than one category, although for simplicity one can assume that the categories are orthogonal / exclusive.

What I'd like to know is at least a keyword to search for, or an algorithm (a method) to use. I would prefer solutions in Perl (or C).

(Added May 21, 2009)

One solution I thought about would be to use something like the algorithm (and the mathematical method behind it) used for Bayesian spam filtering, only instead of just one or two categories ("spam" and "ham") there would be more, and the categories themselves would be created adaptively / interactively.
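
As an illustration of that idea, here is a minimal sketch of a multinomial naive Bayes classifier with an open-ended set of categories. It is written in Python for brevity rather than in Perl or C. The names `NaiveBayes`, `label_all` and the `confirm` callback are hypothetical and chosen only for this example. The add-one smoothing and the simple tokenizer are assumptions, not something the question specifies.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over an open-ended set of categories."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of answers
        self.vocab = set()

    def train(self, text, category):
        """Record one labelled answer; unseen categories are created on the fly."""
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def scores(self, text):
        """Return {category: log-probability score} using add-one smoothing."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        result = {}
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            # The extra +1 reserves probability mass for never-seen words.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            result[cat] = score
        return result

    def classify(self, text):
        """Return the best-scoring category, or None if nothing is trained yet."""
        s = self.scores(text)
        return max(s, key=s.get) if s else None


def label_all(nb, answers, confirm):
    """Semi-automatic pass: guess, let `confirm` accept or correct, learn, tally."""
    tally = Counter()
    for text in answers:
        guess = nb.classify(text)
        category = confirm(text, guess)  # may return a brand-new category name
        nb.train(text, category)
        tally[category] += 1
    return tally
```

In use, `confirm` would prompt a person with the answer and the guessed category and return either the guess or a corrected (possibly new) category name. The returned `Counter` is then the per-category tabulation. This sketch assigns exactly one category per answer, matching the simplifying assumption that categories are exclusive.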

Recommended answer

Text::Ngrams + Algorithm::Cluster

  1. Generate some vector representation for each answer (e.g. word counts) using Text::Ngrams.
  2. Cluster the vectors using Algorithm::Cluster to determine the groupings and also the keywords that correspond to each group.
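
The two steps above can be sketched roughly as follows. This is a pure-Python stand-in, not the API of Text::Ngrams or Algorithm::Cluster. It assumes unigram word counts as the vector representation and a simple k-means loop with cosine similarity as the clustering method. The function names `vectorize`, `kmeans` and `top_keywords` are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn an answer into a sparse word-count vector (unigrams only)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(vectors, k, iterations=20):
    """Cluster sparse vectors into k groups; returns (labels, centroids).

    Initial centroids are simply the first k vectors, which keeps the sketch
    deterministic; real k-means usually tries several random starts.
    """
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [None] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Rebuild each centroid as the summed word counts of its members
        # (cosine similarity ignores scale, so a sum works like a mean).
        centroids = [Counter() for _ in centroids]
        for v, c in zip(vectors, labels):
            centroids[c].update(v)
    return labels, centroids


def top_keywords(centroid, n=3):
    """Most frequent words in a cluster, as candidate category keywords."""
    return [w for w, _ in centroid.most_common(n)]
```

Counting the entries per group is then `Counter(labels)`. In practice, common words such as "a" or "it" tend to dominate the keyword lists, so filtering stop words or weighting the counts (for example with TF-IDF) before clustering is likely to give more useful groups.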
