Is there an equivalent to commit in the Bulbs framework for Neo4j?
I am building a data-intensive Python application based on Neo4j, and for performance reasons I need to create or retrieve several nodes and relationships within each transaction. Is there an equivalent of SQLAlchemy's session.commit() statement in Bulbs?
Edit: For those interested, an interface to Bulbs has been developed that implements this natively and otherwise works much like SQLAlchemy: https://github.com/chefjerome/graphalchemy
The most performant way to execute a multi-part transaction is to encapsulate the transaction in a Gremlin script and execute it as a single request.
Here's an example of how to do it -- it's from an example app I worked up last year for the Neo4j Heroku Challenge.
The project is called Lightbulb: https://github.com/espeed/lightbulb
The README describes what it does...
What is Lightbulb?
Lightbulb is a Git-powered, Neo4j-backed blog engine for Heroku written in Python.
You get to write blog entries in Emacs (or your favorite text editor) and use Git for version control, without giving up the features of a dynamic app.
Write blog entries in ReStructuredText, and style them using your website's templating system.
When you push to Heroku, the entry metadata will be automatically saved to Neo4j, and the HTML fragment generated from the ReStructuredText source file will be served off disk.
However, Neo4j quit offering Gremlin on their free/test Heroku Add On so Lightbulb won't work for new Neo4j/Heroku users.
Within the next year -- before the TinkerPop book comes out -- TinkerPop will release a Rexster Heroku Add On with full Gremlin support so people can run their projects on Heroku as they work their way through the book.
But for right now, you don't need to concern yourself with running the app -- all the relevant code is contained within these two files -- the Lightbulb app's model file and its Gremlin script file:
https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py
https://github.com/espeed/lightbulb/blob/master/lightbulb/gremlin.groovy
model.py provides an example of building custom Bulbs models and a custom Bulbs Graph class.
gremlin.groovy contains a custom Gremlin script that the custom Entry model executes -- this Gremlin script encapsulates the entire multi-part transaction so that it can be executed as a single request.
Notice in the model.py file above that I customize EntryProxy by overriding the create() and update() methods and defining a single save() method in their place to handle both creates and updates.
To hook the custom EntryProxy into the Entry model, I simply override the Entry model's get_proxy_class method so that it returns the EntryProxy class instead of the default NodeProxy class.
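The hook described above can be pictured with a small self-contained sketch. It does not import Bulbs; the classes below are simplified stand-ins that only borrow the names used in this answer, and build_proxy here is a hypothetical free function rather than the real Graph method.

```python
class NodeProxy:
    """Stand-in for a default proxy with a plain create() call."""

    def __init__(self, element_class, client):
        self.element_class = element_class
        self.client = client

    def create(self, **data):
        return ("create", data)


class EntryProxy(NodeProxy):
    """Custom proxy that funnels everything through a single save()."""

    def create(self, **data):
        raise NotImplementedError("use save() instead")

    def save(self, **data):
        return ("save", data)


class Node:
    """Stand-in base model: by default it asks for the default proxy."""

    @classmethod
    def get_proxy_class(cls):
        return NodeProxy


class Entry(Node):
    """Model that swaps in its own proxy by overriding one classmethod."""

    @classmethod
    def get_proxy_class(cls):
        return EntryProxy


def build_proxy(element_class, client=None):
    # Ask the model which proxy it wants, then instantiate that proxy.
    proxy_class = element_class.get_proxy_class()
    return proxy_class(element_class, client)


entries = build_proxy(Entry)
print(type(entries).__name__)
print(entries.save(title="Test"))
```

The point of the pattern is that the code building proxies never needs to know about EntryProxy; the model alone decides which proxy class represents it.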
Everything else in the Entry model is designed around building up the data for the save_blog_entry Gremlin script (defined in the gremlin.groovy file above).
Notice in gremlin.groovy that the save_blog_entry() method is long and contains several closures. You could define each closure as an independent method and execute them with multiple Python calls, but then you'd have the overhead of making multiple server requests, and since the requests are separate, there would be no way to wrap them all in a transaction.
By using a single Gremlin script, you combine everything into a single transactional request. This is much faster, and it's transactional.
You can see how the entire script is executed in the final line of the Gremlin method:
return transaction(save_blog_entry);
Here I'm simply wrapping a transaction closure around all the commands in the internal save_blog_entry closure. Using a transaction closure keeps the code isolated and is much cleaner than embedding the transaction logic in the other closures.
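For readers more at home in Python than Groovy, the same wrapping idea can be sketched as a plain function that takes the unit of work as an argument. This is only an analogy under stated assumptions: ToyGraph is a hypothetical in-memory structure, and the snapshot-and-restore rollback stands in for whatever the real transaction closure in gremlin.groovy does on the server.

```python
import copy


class ToyGraph:
    """Hypothetical in-memory graph used only for this illustration."""

    def __init__(self):
        self.vertices = {}
        self.edges = []


def transaction(graph, work):
    """Run work(graph); keep its changes only if it finishes without error."""
    snapshot = copy.deepcopy((graph.vertices, graph.edges))
    try:
        result = work(graph)
    except Exception:
        # Roll back: restore the state captured before the work started.
        graph.vertices, graph.edges = snapshot
        raise
    return result


def save_blog_entry(graph):
    # Several writes that belong together as one unit.
    graph.vertices["entry:42"] = {"title": "Test"}
    graph.vertices["topic:gremlin"] = {"name": "gremlin"}
    graph.edges.append(("entry:42", "tagged", "topic:gremlin"))
    return graph.vertices["entry:42"]


def broken_save(graph):
    graph.vertices["entry:43"] = {"title": "Half written"}
    raise RuntimeError("simulated failure")


g = ToyGraph()
transaction(g, save_blog_entry)
try:
    transaction(g, broken_save)
except RuntimeError:
    pass
print(sorted(g.vertices))  # ['entry:42', 'topic:gremlin']
```

Because the transaction logic lives in one wrapper, the functions doing the actual writes stay free of commit and rollback code.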
Then if you look at the code in the internal save_blog_entry closure, it's just calling the other closures I defined above, using the params I passed in from Python when I called the script in the Entry model:
def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()
The params I pass in are built up in the model's custom _get_params() method:
def _get_params(self, _data, kwds):
    params = dict()

    # Get the property data, regardless of how it was entered
    data = build_data(_data, kwds)

    # Author
    author = data.pop('author')
    params['author_id'] = cache.get("username:%s" % author)

    # Topic Tags
    tags = (tag.strip() for tag in data.pop('tags').split(','))
    topic_bundles = []
    for topic_name in tags:
        #slug = slugify(topic_name)
        bundle = Topic(self._client).get_bundle(name=topic_name)
        topic_bundles.append(bundle)
    params['topic_bundles'] = topic_bundles

    # Entry
    # clean off any extra kwds that aren't defined as an Entry Property
    desired_keys = self.get_property_keys()
    data = extract(desired_keys, data)
    params['entry_bundle'] = self.get_bundle(data)
    return params
Here's what _get_params() is doing...
build_data(_data, kwds) is a function defined in bulbs.element:
https://github.com/espeed/bulbs/blob/master/bulbs/element.py#L959
It simply merges the args in case the user entered some as positional args and some as keyword args.
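The merging step is easy to picture in isolation. The following is a minimal sketch of the idea, not the Bulbs source: merge_args is a hypothetical name, and letting keyword values win over dict values on a key clash is an assumption made for this example.

```python
def merge_args(_data=None, kwds=None):
    """Combine an optional dict with keyword args into one new dict."""
    merged = dict(_data or {})   # copy so the caller's dict is not mutated
    merged.update(kwds or {})    # assumption: keyword values take precedence
    return merged


def save(_data=None, **kwds):
    # Callers may pass a dict, keyword args, or a mix of both.
    return merge_args(_data, kwds)


print(save({"title": "Test"}, author="espeed"))
# {'title': 'Test', 'author': 'espeed'}
```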
The first value I pull out of the data in _get_params() is author, which is the author's username. I don't pass the username to the Gremlin script, though; I pass the author_id. The author_id is cached, so I use the username to look up the author_id and set that as a param, which I will later pass to the Gremlin save_blog_entry script.
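As a rough illustration of the lookup, here is a sketch that uses a plain dict in place of the real cache. The key format "username:%s" comes from the code above; the lookup_author_id helper, the sample id 7, and the explicit error on a cache miss are assumptions added for the example.

```python
def lookup_author_id(cache, username):
    """Resolve a username to its cached author_id, failing loudly on a miss."""
    key = "username:%s" % username
    author_id = cache.get(key)
    if author_id is None:
        # Assumption: a missing author should stop the save early
        # rather than send author_id=None to the script.
        raise LookupError("no cached author_id for %r" % username)
    return author_id


cache = {"username:espeed": 7}  # stand-in for the real cache backend
params = {"author_id": lookup_author_id(cache, "espeed")}
print(params)  # {'author_id': 7}
```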
Then I create a Topic model object for each blog tag that was set, call get_bundle() on each, and save the results as a list of topic_bundles in params.
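The tag handling assumes the tags arrive as one comma-separated string. A minimal sketch of that parsing step, with a hypothetical parse_tags helper, is shown here; dropping empty names is an addition for this example and is not something the generator expression in _get_params() does.

```python
def parse_tags(raw_tags):
    """Split a comma-separated tag string into clean, non-empty names."""
    return [name.strip() for name in raw_tags.split(",") if name.strip()]


print(parse_tags("gremlin, bulbs , "))  # ['gremlin', 'bulbs']
```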
The get_bundle() method is defined in bulbs.model:
https://github.com/espeed/bulbs/blob/master/bulbs/model.py#L363
It simply returns a tuple containing the data, index_name, and index keys for the model instance:
def get_bundle(self, _data=None, **kwds):
    """
    Returns a tuple containing the property data, index name, and index keys.

    :param _data: Data that was passed in via a dict.
    :type _data: dict

    :param kwds: Data that was passed in via name/value pairs.
    :type kwds: dict

    :rtype: tuple

    """
    self._set_property_defaults()
    self._set_keyword_attributes(_data, kwds)
    data = self._get_property_data()
    index_name = self.get_index_name(self._client.config)
    keys = self.get_index_keys()
    return data, index_name, keys
I added the get_bundle() method to Bulbs to provide a nice and tidy way of bundling params together so your Gremlin script doesn't get overrun with a ton of args in its signature.
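To make the benefit concrete, here is a sketch of how bundling keeps the parameter count at three no matter how many properties or topics are involved. Nothing here calls Bulbs: the Bundle namedtuple, the build_params helper, and the index names "entry" and "topic" are all hypothetical, chosen only to mirror the (data, index_name, keys) shape described above.

```python
from collections import namedtuple

# Same three-part shape as the tuple described above.
Bundle = namedtuple("Bundle", ["data", "index_name", "keys"])


def build_params(author_id, entry_data, topic_names):
    """Pack everything the script needs into three named params."""
    entry_bundle = Bundle(dict(entry_data), "entry", None)
    topic_bundles = [Bundle({"name": name}, "topic", None) for name in topic_names]
    return {
        "author_id": author_id,
        "entry_bundle": tuple(entry_bundle),
        "topic_bundles": [tuple(bundle) for bundle in topic_bundles],
    }


params = build_params(7, {"title": "Test", "docid": "42"}, ["gremlin", "bulbs"])
print(sorted(params))               # ['author_id', 'entry_bundle', 'topic_bundles']
print(params["topic_bundles"][0])   # ({'name': 'gremlin'}, 'topic', None)
```

Adding a new property to the entry changes only the contents of entry_bundle, so the script's signature can stay the same.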
Finally, for Entry, I simply create an entry_bundle and store it as a param.
Notice that _get_params() returns a dict of three params: author_id, topic_bundles, and entry_bundle.
This params dict is passed directly to the Gremlin script:
def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()
    self._initialize(result)
And the Gremlin script has the same arg names as those passed in by params:
def save_blog_entry(entry_bundle, author_id, topic_bundles) {
  // Gremlin code omitted for brevity
}
The params are then simply used in the Gremlin script as needed -- nothing special going on.
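Since the params dict keys have to line up with the script's argument names, a quick pre-flight check on the Python side can catch a misspelled key before the request is sent. The sketch below is a hypothetical helper, not part of Bulbs; it reads the signature with a simple regular expression and assumes untyped, comma-separated arguments with no default values.

```python
import re

GROOVY_SOURCE = """
def save_blog_entry(entry_bundle, author_id, topic_bundles) {
  // Gremlin code omitted for brevity
}
"""


def script_arg_names(source, name):
    """Return the argument names of a Groovy method definition."""
    pattern = r"def\s+" + re.escape(name) + r"\s*\(([^)]*)\)"
    match = re.search(pattern, source)
    if match is None:
        raise KeyError(name)
    return [arg.strip() for arg in match.group(1).split(",") if arg.strip()]


def check_params(source, name, params):
    """Return (missing, extra) key names relative to the script signature."""
    expected = set(script_arg_names(source, name))
    missing = expected - set(params)
    extra = set(params) - expected
    return sorted(missing), sorted(extra)


# A params dict with one misspelled key.
params = {"author_id": 7, "entry_bundle": (), "topic_bundle": []}
print(check_params(GROOVY_SOURCE, "save_blog_entry", params))
# (['topic_bundles'], ['topic_bundle'])
```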
So now that I've created my custom model and Gremlin script, I build a custom Graph object that encapsulates all the proxies and the respective models:
class Graph(Neo4jGraph):

    def __init__(self, config=None):
        super(Graph, self).__init__(config)

        # Node Proxies
        self.people = self.build_proxy(Person)
        self.entries = self.build_proxy(Entry)
        self.topics = self.build_proxy(Topic)

        # Relationship Proxies
        self.tagged = self.build_proxy(Tagged)
        self.author = self.build_proxy(Author)

        # Add our custom Gremlin-Groovy scripts
        scripts_file = get_file_path(__file__, "gremlin.groovy")
        self.scripts.update(scripts_file)
You can now import Graph directly from your app's model.py and instantiate the Graph object as usual.
>>> from lightbulb.model import Graph
>>> g = Graph()
>>> data = dict(author='espeed', tags='gremlin, bulbs', docid='42', title="Test")
>>> g.entries.save(data)  # execute transaction via Gremlin script
Does that help?