Unstructured Text to Structured Data

Question

I am looking for references (tutorials, books, academic literature) on structuring unstructured text in a manner similar to the Google Calendar Quick Add button.

I understand this may come under the NLP category, but I am interested only in the process of going from something like "Levi jeans size 32 A0b293"

to: Brand: Levi, Size: 32, Category: Jeans, Code: A0b293

I imagine it would be some combination of lexical parsing and machine learning techniques.

I am fairly language agnostic, but if pushed I would prefer Python, Matlab or C++ references.

Thanks

Recommended answer

You need to provide more information about the source of the text (the web? user input?), the domain (is it just clothes?), and the potential formatting and vocabulary...

Assuming the worst-case scenario, you need to start learning NLP. A very good free book is the documentation of NLTK: http://www.nltk.org/book . It is also a very good introduction to Python, and the software is free (for various usages). Be warned: NLP is hard. It doesn't always work. It is not fun at times. The state of the art is nowhere near where you imagine it is.

Assuming a better scenario (your text is semi-structured), a good free tool is pyparsing. There is a book and plenty of examples, and the resulting code is extremely attractive.
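To make the semi-structured case concrete, here is a minimal rule-based sketch for the example string from the question. It uses only the standard library (lookup tables plus one regular expression), not pyparsing, so it is a stand-in for the idea rather than a pyparsing grammar. The vocabularies `BRANDS` and `CATEGORIES`, the assumed product-code format, and the function name `parse_item` are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical vocabularies; a real catalogue would supply these.
BRANDS = {"levi", "wrangler", "lee"}
CATEGORIES = {"jeans", "shirt", "jacket"}

# Assumed code format: at least 5 alphanumeric characters,
# containing at least one letter and at least one digit.
CODE_RE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]{5,}$")


def parse_item(text):
    """Map a short product description to a dict of labelled fields."""
    tokens = text.split()
    record = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        low = tok.lower()
        if low in BRANDS:
            record["brand"] = tok
        elif low in CATEGORIES:
            record["category"] = tok.capitalize()
        elif low == "size" and i + 1 < len(tokens) and tokens[i + 1].isdigit():
            record["size"] = int(tokens[i + 1])
            i += 1  # consume the number that follows "size"
        elif CODE_RE.match(tok):
            record["code"] = tok
        else:
            # Keep anything the rules do not recognise for later inspection.
            record.setdefault("unparsed", []).append(tok)
        i += 1
    return record


print(parse_item("Levi jeans size 32 A0b293"))
```

A sketch like this only works while the vocabulary and field formats are known in advance. Tokens it cannot classify end up under `unparsed`, which is roughly where the harder NLP techniques mentioned above would have to take over.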

I hope this helps...
