Sequence learning using Conditional Random Fields?

Question

I am new to sequence learning (and machine learning) and am trying to understand how to use conditional random fields to solve my problem.

I have a dataset that is a sequential log of when and where an end user of my application worked. For example, the following dataset has values only for User1:

User   Facility   Weekday
User1  FacilityA  Monday
User1  FacilityB  Tuesday
User1  FacilityC  Wednesday
 ...     ...         ...

I am trying to solve the following problem: given the weekday and facility a user worked at, which facility and weekday will they work next?

To solve this problem, I started looking at Conditional Random Fields, but I am having a tough time getting any library to work.

I tried to work with the following libraries:

  1. PyStruct (https://pystruct.github.io/). This did not work for me because of this issue: "Index out of bounds: Fitting SSVM using Pystruct".
  2. CRFSuite (http://www.chokkan.org/software/crfsuite/). This depends on libLBFGS. I installed libLBFGS on my Ubuntu box without any errors, but running 'make install' for CRFSuite still fails and says that it does not recognize libLBFGS.
  3. CRF++ (https://taku910.github.io/crfpp/). This is the library I turned to next.

I was able to install CRF++ and to run the examples shipped with its distribution. However, I need some help understanding how to modify the template file to fit my use case.

Also, I was thinking that my labels would be a concatenated string of facility + weekday from the dataset above.

I am new to sequence learning and am currently trying hard to research how to solve this problem.

Any advice would be extremely helpful, as I seem to be a bit stuck here.

Thanks!

Recommended answer

  1. Yes, since you are trying to predict two labels (Facility and Day), you will need to concatenate the labels. Alternatively, you can learn two separate models, one for each label (see point 3).

I think you should look into regression models for this problem rather than CRFs.

I think the data should be arranged so that a user's log history can be learned easily. Can you tell me the minimum history you have for any user (last 3 logins? 5 logins? 7 logins?)?

Assume you have the last 3 logins of every user. In your place, I would arrange the data differently and learn two models, one to predict the day and another to predict the facility. An example of the data arrangement and template file for predicting the day is here. Similarly, replace the weekday names with facility names and learn a model for predicting the facility. You can also think of more features to add to the ones I have suggested. If you have more user data (say, occupation or age), you should definitely try adding more columns to the training data and adding those columns as features in the template file. Remember that the test file should be arranged in the same way as the training file (except that the last column can be empty or missing, because it is the label the model predicts during testing).
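The example linked in the answer is not reproduced on this page, so the following is only a sketch of one plausible arrangement, not the answerer's actual files. It assumes each row holds the previous 3 weekdays as feature columns and the next weekday as the label in the last column, with one blank-line-separated sequence per user. The function names, the `HISTORY` constant, and the log entries after Wednesday are hypothetical and exist only for illustration. The template uses the CRF++ `%x[row,col]` macro syntax.

```python
from collections import defaultdict

HISTORY = 3  # assumed number of previous logins used as features


def build_crfpp_rows(log, field, history=HISTORY):
    """Turn a chronological log into CRF++-style sequences.

    log: list of (user, facility, weekday) tuples, oldest first.
    field: 1 to model facilities, 2 to model weekdays.
    Each row is `history` past values followed by the next value (the label).
    """
    per_user = defaultdict(list)
    for record in log:
        per_user[record[0]].append(record[field])

    sequences = []
    for values in per_user.values():
        rows = [values[i:i + history] + [values[i + history]]
                for i in range(len(values) - history)]
        if rows:  # skip users with too little history
            sequences.append(rows)
    return sequences


def to_crfpp_text(sequences):
    """One token per line, whitespace-separated columns, blank line between sequences."""
    blocks = ["\n".join(" ".join(row) for row in rows) for rows in sequences]
    return "\n\n".join(blocks) + "\n"


def make_template(history=HISTORY):
    """Build a CRF++ template: one unigram feature per history column,
    one feature combining all history columns, and a label bigram."""
    lines = ["# Unigram features on the current row"]
    lines += [f"U{c:02d}:%x[0,{c}]" for c in range(history)]
    combined = "/".join(f"%x[0,{c}]" for c in range(history))
    lines.append(f"U{history:02d}:{combined}")
    lines += ["", "# Bigram feature over adjacent output labels", "B"]
    return "\n".join(lines) + "\n"


# Illustrative log; entries after Wednesday are made up for this sketch.
example_log = [
    ("User1", "FacilityA", "Monday"),
    ("User1", "FacilityB", "Tuesday"),
    ("User1", "FacilityC", "Wednesday"),
    ("User1", "FacilityA", "Thursday"),
    ("User1", "FacilityB", "Friday"),
]

if __name__ == "__main__":
    print(to_crfpp_text(build_crfpp_rows(example_log, field=2)))
    print(make_template())
```

Written to files, these would be passed to CRF++ as `crf_learn template train.txt model` and then `crf_test -m model test.txt`. Calling `build_crfpp_rows(example_log, field=1)` gives the corresponding training data for the facility model.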

If you want to go ahead and predict both labels in one model, you can try concatenation (in the example I have given you, day would become day_facility).
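As a rough illustration of the concatenation idea, the sketch below joins weekday and facility into a single `day_facility` token and splits a predicted token back into its two parts. It assumes a single user's chronological log and a history of 3 logins; the function names and the log entries after Wednesday are hypothetical.

```python
def joint_label_rows(log, history=3):
    """log: chronological (user, facility, weekday) records for ONE user.

    Returns rows of `history` past day_facility tokens plus the next
    day_facility token as the label (last column).
    """
    tokens = [f"{weekday}_{facility}" for _, facility, weekday in log]
    return [tokens[i:i + history] + [tokens[i + history]]
            for i in range(len(tokens) - history)]


def split_label(label):
    """Recover (weekday, facility) from a predicted day_facility token."""
    weekday, facility = label.split("_", 1)
    return weekday, facility


# Illustrative log; entries after Wednesday are made up for this sketch.
user1_log = [
    ("User1", "FacilityA", "Monday"),
    ("User1", "FacilityB", "Tuesday"),
    ("User1", "FacilityC", "Wednesday"),
    ("User1", "FacilityA", "Thursday"),
    ("User1", "FacilityB", "Friday"),
]

if __name__ == "__main__":
    for row in joint_label_rows(user1_log):
        print(" ".join(row))
    print(split_label("Thursday_FacilityA"))
```

One thing to keep in mind with this design is that the joint label space can grow up to 7 times the number of facilities, so a single concatenated model generally needs more training data per label than two separate models.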
