How to build a knowledge graph?

Question

I prototyped a tiny search engine with PageRank that works on my computer. I am interested in building a knowledge graph on top of it, so that it returns only the webpages that match the query in the right context, similar to how Google finds relevant answers to search questions. I have seen a lot of publicity around knowledge graphs but not much literature, and almost no pseudocode-like guide to building one. Does anyone know good references on how such a knowledge graph works internally, so that I don't have to come up with my own models for it from scratch?

Answer

"Knowledge graph" is a buzzword: it is the sum of several models and technologies put together to achieve a result. The first stop on your journey is natural language processing, ontologies, and text mining. This is a wide field of artificial intelligence, so a research survey of the field is a good place to start.
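To make the idea a little more concrete: many knowledge graphs boil down to facts stored as subject-predicate-object triples (the data model RDF uses) that can be queried by pattern. The toy below is only an in-memory illustration under that assumption; the `TripleStore` class and the example facts are hypothetical, and in a real system the triples would come out of the NLP/text-mining step and live in a proper graph or RDF store.

```python
from collections import namedtuple

# A fact is a (subject, predicate, object) triple, the same shape RDF uses.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TripleStore:
    """Hypothetical toy in-memory triple store; not a real RDF database."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


# Hypothetical facts, e.g. as they might come out of an entity/relation extractor.
kg = TripleStore()
kg.add("Paris", "capital_of", "France")
kg.add("Paris", "instance_of", "City")
kg.add("Berlin", "capital_of", "Germany")
kg.add("Berlin", "instance_of", "City")

# "Which entities are capitals, and of what?"
for t in kg.match(predicate="capital_of"):
    print(t.subject, "->", t.obj)
```

Using `None` as a wildcard keeps the query interface to one method; real stores answer the same kind of pattern with SPARQL or a graph query language and with indexes instead of a linear scan.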

Before building your own models, I suggest you try different standard algorithms using dedicated toolboxes such as gensim. You will learn about tf-idf, LDA, document feature vectors, etc.
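As a rough picture of how tf-idf plus a similarity measure can return pages "in the right context", here is a dependency-free sketch that ranks documents against a query by cosine similarity. gensim wraps the same idea (for example in its `TfidfModel`), but with its own default weighting and normalization, so scores will not match exactly. Whitespace tokenization and the three toy documents are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized doc to a sparse {term: weight} tf-idf vector."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Weight query terms with the corpus idf; terms unseen in the corpus get 0.
    q = {term: freq * idf.get(term, 0.0) for term, freq in Counter(query).items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "jaguar speed in the wild cat habitat".split(),
    "jaguar car engine and top speed".split(),
    "habitat loss threatens big cat species".split(),
]
print(rank("jaguar engine".split(), docs))  # → [1, 0, 2]
```

The ambiguous word "jaguar" appears in two documents, so its idf is low; the rarer word "engine" pulls the car page to the top. That is the intuition behind "context", although LDA or document embeddings capture it far better than raw term overlap.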

I am assuming you want to work with text data; searching for images using other images is a different problem, and the same goes for audio.

Building models is only the first step; the most difficult part of Google's knowledge graph is actually scaling it to billions of requests each day.

A good processing pipeline can be built "easily" on top of Apache Spark, "the current-gen Hadoop". It provides resilient distributed datasets (RDDs), fault-tolerant distributed collections that are essential if you want to scale.
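To show the shape of such a pipeline, here is a single-machine sketch that computes document frequencies (an input to tf-idf) in two map-reduce stages. In PySpark the same stages would be `flatMap` and `reduceByKey` calls on an RDD; the plain-Python helpers `flat_map` and `reduce_by_key` below are hypothetical stand-ins that only mimic that structure and provide none of Spark's distribution or fault tolerance. The page contents are made up.

```python
from collections import defaultdict
from functools import reduce


def flat_map(fn, records):
    """Apply fn to every record and flatten the results (mimics RDD.flatMap)."""
    return [item for record in records for item in fn(record)]


def reduce_by_key(fn, pairs):
    """Group (key, value) pairs by key and fold the values with fn (mimics reduceByKey)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}


# Hypothetical crawled pages: url -> text.
pages = {
    "a.html": "graph search engine",
    "b.html": "knowledge graph graph",
    "c.html": "search engine ranking",
}

# Stage 1: emit (term, 1) once per page that contains the term.
pairs = flat_map(lambda item: [(term, 1) for term in set(item[1].split())], pages.items())
# Stage 2: sum the counts per term to get document frequencies.
doc_freq = reduce_by_key(lambda x, y: x + y, pairs)
print(sorted(doc_freq.items()))
```

Because each stage only sees one record (or one key's values) at a time, the work can be partitioned across machines, which is what makes this style a natural fit for Spark.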

If you want to keep your data as a graph in the graph-theory sense (as PageRank does) for live querying, I suggest Bulbs, a framework that is "like an ORM for graphs, but instead of SQL, you use the graph-traversal language Gremlin to query the database". For instance, you can switch the backend from Neo4j to OpenRDF (useful if you work with ontologies).
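To give a feel for what a graph-traversal query looks like, the toy below imitates the Gremlin style of chaining steps (roughly `g.V('page:home').out('links_to').out('mentions').dedup()`). The `Graph` and `Traversal` classes and the vertex names are hypothetical; this is not the Bulbs or Gremlin API, just a sketch of the idea over labeled, directed edges.

```python
from collections import defaultdict


class Graph:
    """Hypothetical tiny property-graph stand-in with labeled, directed edges."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # vertex -> [(label, target), ...]

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    def V(self, *start):
        """Start a traversal from the given vertices."""
        return Traversal(self, list(start))


class Traversal:
    """Chainable traversal loosely modelled on Gremlin's step style."""

    def __init__(self, graph, frontier):
        self.graph = graph
        self.frontier = frontier

    def out(self, label):
        # Follow outgoing edges with the given label from every vertex in the frontier.
        nxt = [
            target
            for vertex in self.frontier
            for (edge_label, target) in self.graph.out_edges.get(vertex, [])
            if edge_label == label
        ]
        return Traversal(self.graph, nxt)

    def dedup(self):
        # Drop repeated vertices while keeping first-seen order.
        return Traversal(self.graph, list(dict.fromkeys(self.frontier)))

    def to_list(self):
        return list(self.frontier)


g = Graph()
g.add_edge("page:home", "links_to", "page:about")
g.add_edge("page:home", "links_to", "page:blog")
g.add_edge("page:blog", "mentions", "entity:PageRank")
g.add_edge("page:about", "mentions", "entity:PageRank")
g.add_edge("entity:PageRank", "related_to", "entity:Markov chain")

# Which entities are mentioned on pages linked from the home page?
print(g.V("page:home").out("links_to").out("mentions").dedup().to_list())
```

Mixing page vertices and entity vertices in one graph is what lets a query hop from "pages" to "things the pages are about", which is the step a plain link graph cannot take.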

For graph analytics you can use Spark's GraphX module or GraphLab.
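Since the question already starts from PageRank, here is what that kind of graph analytic looks like on one machine: a power-iteration sketch. GraphX, for example, ships a distributed PageRank; this plain-Python version, with a damping factor of 0.85, a fixed iteration count, and dangling pages spreading their rank evenly, is an illustrative assumption rather than any library's implementation. The link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "teleport" share plus its slice of the dangling rank.
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Each iteration keeps the total rank at 1, so the loop is easy to sanity-check; distributed frameworks run essentially the same update as message passing between vertices.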

Hope it helps.