Is there an equivalent to commit in the Bulbs framework for Neo4j?
Problem description
I am building a data-intensive Python application based on Neo4j, and for performance reasons I need to create/retrieve several nodes and relationships during each transaction. Is there an equivalent of SQLAlchemy's session.commit() statement in Bulbs?
Edit: For those interested, an interface to Bulbs has been developed that implements this functionality natively and otherwise works much like SQLAlchemy: https://github.com/chefjerome/graphalchemy
The most performant way to execute a multi-part transaction is to encapsulate the transaction in a Gremlin script and execute it as a single request.
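In this approach the "commit" is simply the moment the single script request is sent. As a rough sketch of how a session.commit()-style API could sit on top of that idea, the hypothetical ScriptSession below (not part of Bulbs or graphalchemy) queues statements locally and sends them as one script in one request when commit() is called. The send callable is an assumption standing in for whatever executes a script on the server, for example the client's gremlin(script, params) call that appears in _save() later on.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical: any function that sends one script plus its parameters
# to the server in a single request and returns the result.
SendFn = Callable[[str, Dict[str, Any]], Any]


class ScriptSession:
    """Hypothetical commit-style session that batches Gremlin statements.

    Not a Bulbs API: statements are queued locally and only sent, joined
    into one script, when commit() is called.
    """

    def __init__(self, send: SendFn) -> None:
        self._send = send
        self._statements: List[str] = []
        self._params: Dict[str, Any] = {}

    def add(self, statement: str, **params: Any) -> None:
        # Refuse to silently overwrite a parameter bound by an earlier statement.
        clashes = set(params) & set(self._params)
        if clashes:
            raise ValueError(f"parameters already bound: {sorted(clashes)}")
        self._statements.append(statement)
        self._params.update(params)

    def rollback(self) -> None:
        # Nothing has reached the server yet, so rolling back just drops the queue.
        self._statements.clear()
        self._params.clear()

    def commit(self) -> Optional[Any]:
        if not self._statements:
            return None  # nothing queued, so no request is made
        script = "\n".join(self._statements)
        result = self._send(script, dict(self._params))
        # Clear only after a successful send so a failed commit can be retried.
        self.rollback()
        return result
```

However many statements are queued, commit() produces exactly one server request. For the batch to be atomic, the joined script still has to run inside a server-side transaction, which is what the Lightbulb script discussed here does.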
Here's an example of how to do it -- it's from an example app I worked up last year for the Neo4j Heroku Challenge.
The project is called Lightbulb: https://github.com/espeed/lightbulb
The README describes what it does...
What is Lightbulb?
Lightbulb is a Git-powered, Neo4j-backed blog engine for Heroku written in Python.
You get to write blog entries in Emacs (or your favorite text editor) and use Git for version control, without giving up the features of a dynamic app.
Write blog entries in ReStructuredText, and style them using your website's templating system.
When you push to Heroku, the entry metadata will be automatically saved to Neo4j, and the HTML fragment generated from the ReStructuredText source file will be served off disk.
However, Neo4j stopped offering Gremlin on its free/test Heroku add-on, so Lightbulb won't work for new Neo4j/Heroku users.
Within the next year, before the TinkerPop book comes out, TinkerPop will release a Rexster Heroku add-on with full Gremlin support so people can run their projects on Heroku as they work their way through the book.
But for right now, you don't need to concern yourself with running the app -- all the relevant code is contained within these two files -- the Lightbulb app's model file and its Gremlin script file:
https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py
https://github.com/espeed/lightbulb/blob/master/lightbulb/gremlin.groovy
model.py provides an example for building custom Bulbs models and a custom Bulbs Graph class.
gremlin.groovy contains a custom Gremlin script that the custom Entry model executes; this Gremlin script encapsulates the entire multi-part transaction so that it can be executed as a single request.
Notice that in the model.py file above, I customize EntryProxy by overriding the create() and update() methods and instead define a single save() method that handles both creates and updates.
To hook the custom EntryProxy into the Entry model, I simply override the Entry model's get_proxy_class method so that it returns the EntryProxy class instead of the default NodeProxy class.
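The hook can be illustrated without a server. The classes below are minimal hypothetical stand-ins written for this sketch, not the real Bulbs NodeProxy or Lightbulb EntryProxy: the model names its proxy class through get_proxy_class(), a build_proxy() helper asks the model which proxy to instantiate, and the custom proxy exposes a single save(). Making create() and update() raise is a choice made in this sketch only.

```python
from typing import Any, Dict, Type


class NodeProxy:
    """Hypothetical stand-in for a default proxy with separate create/update."""

    def __init__(self, element_class: Type["Node"]) -> None:
        self.element_class = element_class

    def create(self, data: Dict[str, Any]) -> "Node":
        node = self.element_class()
        node.data = dict(data)
        return node

    def update(self, node: "Node", data: Dict[str, Any]) -> "Node":
        node.data.update(data)
        return node


class Node:
    """Hypothetical stand-in for a model base class."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    @classmethod
    def get_proxy_class(cls) -> Type[NodeProxy]:
        return NodeProxy


class EntryProxy(NodeProxy):
    """Custom proxy: one save() entry point replaces create()/update()."""

    def create(self, data: Dict[str, Any]) -> "Node":
        raise NotImplementedError("use save() instead")

    def update(self, node: "Node", data: Dict[str, Any]) -> "Node":
        raise NotImplementedError("use save() instead")

    def save(self, data: Dict[str, Any]) -> "Node":
        entry = self.element_class()
        entry._save(data)  # in the real app, the single Gremlin request goes here
        return entry


class Entry(Node):
    @classmethod
    def get_proxy_class(cls) -> Type[NodeProxy]:
        # The hook: hand back the custom proxy instead of the default NodeProxy.
        return EntryProxy

    def _save(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)


def build_proxy(element_class: Type[Node]) -> NodeProxy:
    # Hypothetical analogue of Graph.build_proxy(): the model picks its proxy.
    return element_class.get_proxy_class()(element_class)
```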
Everything else in the Entry model is designed around building up the data for the save_blog_entry Gremlin script (defined in the gremlin.groovy file above).
Notice in gremlin.groovy that the save_blog_entry() method is long and contains several closures. You could define each closure as an independent method and execute them with multiple Python calls, but then you'd incur the overhead of multiple server requests, and because the requests are separate, there would be no way to wrap them all in a single transaction.
By using a single Gremlin script, you combine everything into a single transactional request. This is much faster, and it's transactional.
You can see how the entire script is executed in the final line of the Gremlin method:
return transaction(save_blog_entry);
Here I'm simply wrapping a transaction closure around all the commands in the internal save_blog_entry closure. Using a transaction closure keeps the code isolated and is much cleaner than embedding the transaction logic in the other closures.
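The transaction() closure runs on the server in Groovy, so it can't be reproduced directly here. As an analogue only, here is the same "wrap the whole unit of work in one transaction closure" pattern in Python, using the standard-library sqlite3 module in place of Neo4j. The transaction() function below is my own sketch, not the helper from gremlin.groovy: it commits if the work function returns and rolls back every write if it raises.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")


def transaction(conn: sqlite3.Connection, work: Callable[[sqlite3.Connection], T]) -> T:
    """Run `work` as one unit: commit if it returns, roll back if it raises."""
    try:
        result = work(conn)
    except BaseException:
        conn.rollback()  # discard every write made inside `work`
        raise
    conn.commit()
    return result


def save_entry_and_topic(conn: sqlite3.Connection) -> int:
    # Stand-in for the internal save_blog_entry closure: several writes, one unit.
    conn.execute("INSERT INTO nodes VALUES ('entry')")
    conn.execute("INSERT INTO nodes VALUES ('topic')")
    return 2
```

Keeping the commit/rollback logic in transaction() means the work function only contains the writes, which mirrors why the answer keeps the transaction logic out of the other closures.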
Then if you look at the code in the internal save_blog_entry closure, it's just calling the other closures I defined above, using the params I passed in from Python when I called the script in the Entry model:
def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()
The params I pass in are built up in the model's custom _get_params() method:
def _get_params(self, _data, kwds):
    params = dict()
    # Get the property data, regardless of how it was entered
    data = build_data(_data, kwds)
    # Author
    author = data.pop('author')
    params['author_id'] = cache.get("username:%s" % author)
    # Topic Tags
    tags = (tag.strip() for tag in data.pop('tags').split(','))
    topic_bundles = []
    for topic_name in tags:
        #slug = slugify(topic_name)
        bundle = Topic(self._client).get_bundle(name=topic_name)
        topic_bundles.append(bundle)
    params['topic_bundles'] = topic_bundles
    # Entry
    # clean off any extra kwds that aren't defined as an Entry Property
    desired_keys = self.get_property_keys()
    data = extract(desired_keys, data)
    params['entry_bundle'] = self.get_bundle(data)
    return params
Here's what _get_params() is doing...
build_data(_data, kwds) is a function defined in bulbs.element:
https://github.com/espeed/bulbs/blob/master/bulbs/element.py#L959
It simply merges the args in case the user entered some as positional args and some as keyword args.
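A rough sketch of that merge follows. It is named merge_data() because it is only an illustration, not the actual bulbs.element.build_data code; in particular, letting keyword args win when the same key appears in both places is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


def merge_data(_data: Optional[Dict[str, Any]] = None,
               kwds: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical merge of a positional data dict with keyword args."""
    # Copy the dict argument so the caller's dict is never mutated.
    data = dict(_data or {})
    # Keyword args are applied last, so they override duplicate keys (assumption).
    data.update(kwds or {})
    return data
```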
The first param I pass into _get_params() is author, which is the author's username, but I don't pass the username to the Gremlin script; I pass the author_id. The author_id is cached, so I use the username to look up the author_id and set that as a param, which I later pass to the Gremlin save_blog_entry script.
Then I create a Topic model object for each blog tag that was set, call get_bundle() on each one, and save the results in params as a list named topic_bundles.
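Note that _get_params() calls .split(',') on the tags value, so it expects a single comma-separated string. A hypothetical helper such as parse_tags() below (not part of Lightbulb or Bulbs) would accept either a string or a list, and would drop empty and duplicate names before one Topic bundle is built per name.

```python
from typing import Iterable, List, Union


def parse_tags(tags: Union[str, Iterable[str]]) -> List[str]:
    """Normalize 'a, b' or ['a', 'b'] into a clean, de-duplicated list of names."""
    if isinstance(tags, str):
        tags = tags.split(",")
    seen = set()
    names: List[str] = []
    for tag in tags:
        name = tag.strip()
        # Skip empty pieces (as in 'a,,b') and repeated tags, keeping first-seen order.
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```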
The get_bundle() method is defined in bulbs.model:
https://github.com/espeed/bulbs/blob/master/bulbs/model.py#L363
It simply returns a tuple containing the data, index_name, and index keys for the model instance:
def get_bundle(self, _data=None, **kwds):
    """
    Returns a tuple containing the property data, index name, and index keys.
    :param _data: Data that was passed in via a dict.
    :type _data: dict
    :param kwds: Data that was passed in via name/value pairs.
    :type kwds: dict
    :rtype: tuple
    """
    self._set_property_defaults()
    self._set_keyword_attributes(_data, kwds)
    data = self._get_property_data()
    index_name = self.get_index_name(self._client.config)
    keys = self.get_index_keys()
    return data, index_name, keys
I added the get_bundle() method to Bulbs to provide a nice and tidy way of bundling params together so your Gremlin script doesn't get overrun with a ton of args in its signature.
Finally, for Entry, I simply create an entry_bundle and store it as the param.
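To make the shape of entry_bundle concrete, here is a plain-Python sketch of the last two steps of _get_params(): keeping only keys that are declared properties (what the "clean off any extra kwds" comment describes) and packaging the (data, index_name, keys) tuple that get_bundle() returns. extract_keys() and make_bundle() are hypothetical names, and the index name "entry" in the usage is made up.

```python
from typing import Any, Dict, Iterable, List, Tuple

# (property data, index name, index keys), the same shape get_bundle() returns
Bundle = Tuple[Dict[str, Any], str, List[str]]


def extract_keys(desired_keys: Iterable[str], data: Dict[str, Any]) -> Dict[str, Any]:
    # Keep only declared model properties; stray kwargs are dropped.
    keys = set(desired_keys)
    return {k: v for k, v in data.items() if k in keys}


def make_bundle(data: Dict[str, Any], index_name: str, index_keys: List[str]) -> Bundle:
    # Copy inputs so later changes to the caller's objects don't leak into params.
    return dict(data), index_name, list(index_keys)
```

For example, make_bundle(extract_keys(["title", "docid"], raw), "entry", ["docid"]) yields a single tuple that travels to the Gremlin script as one argument instead of several.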
Notice that _get_params() returns a dict of three params: author_id, topic_bundles, and entry_bundle.
This params dict is passed directly to the Gremlin script:
def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()
    self._initialize(result)
And the Gremlin script has the same arg names as those passed in by params:
def save_blog_entry(entry_bundle, author_id, topic_bundles) {
    // Gremlin code omitted for brevity
}
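Because the dict keys must match the script's arg names exactly, a mismatch such as topic_bundle vs. topic_bundles would only surface as an error on the server. A small client-side check can catch that earlier. The helpers below are hypothetical and regex-based: they assume a simple def signature with no default values or nested parentheses.

```python
import re
from typing import Any, Dict, List


def script_arg_names(source: str, name: str) -> List[str]:
    """Return the parameter names of `def name(...)` in a Groovy source string."""
    match = re.search(r"\bdef\s+" + re.escape(name) + r"\s*\(([^)]*)\)", source)
    if match is None:
        raise KeyError(f"no script named {name!r}")
    # Take the last token so a typed arg like "String title" yields "title".
    return [arg.split()[-1] for arg in match.group(1).split(",") if arg.strip()]


def check_params(source: str, name: str, params: Dict[str, Any]) -> None:
    """Raise ValueError unless params keys exactly match the script's arg names."""
    expected = set(script_arg_names(source, name))
    given = set(params)
    missing, extra = expected - given, given - expected
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
```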
The params are then simply used in the Gremlin script as needed -- nothing special going on.
So now that I've created my custom model and Gremlin script, I build a custom Graph object that encapsulates all the proxies and the respective models:
class Graph(Neo4jGraph):
    def __init__(self, config=None):
        super(Graph, self).__init__(config)
        # Node Proxies
        self.people = self.build_proxy(Person)
        self.entries = self.build_proxy(Entry)
        self.topics = self.build_proxy(Topic)
        # Relationship Proxies
        self.tagged = self.build_proxy(Tagged)
        self.author = self.build_proxy(Author)
        # Add our custom Gremlin-Groovy scripts
        scripts_file = get_file_path(__file__, "gremlin.groovy")
        self.scripts.update(scripts_file)
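Bulbs handles loading the file via self.scripts.update(), and its exact parsing rules aren't covered here. The following is only a naive stand-in showing the general idea of turning one .groovy file into the name-to-script lookup that scripts.get('save_blog_entry') relies on in _save(). load_scripts() is hypothetical: it only recognizes defs starting at column 0 and matches braces naively, so braces inside strings or comments would confuse it.

```python
import re
from typing import Dict

# Top-level Groovy method definitions: "def name(" at the start of a line.
_DEF_RE = re.compile(r"^def\s+(\w+)\s*\(", re.MULTILINE)


def load_scripts(source: str) -> Dict[str, str]:
    """Map each top-level `def name(...) { ... }` to its full text."""
    scripts: Dict[str, str] = {}
    for match in _DEF_RE.finditer(source):
        open_brace = source.index("{", match.end())
        depth = 0
        for pos in range(open_brace, len(source)):
            if source[pos] == "{":
                depth += 1
            elif source[pos] == "}":
                depth -= 1
                if depth == 0:  # found the brace that closes this method body
                    scripts[match.group(1)] = source[match.start():pos + 1]
                    break
    return scripts
```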
You can now import Graph directly from your app's model.py and instantiate the Graph object like normal.
>>> from lightbulb.model import Graph
>>> g = Graph()
>>> data = dict(author='espeed', tags='gremlin, bulbs', docid='42', title="Test")
>>> g.entries.save(data)  # execute transaction via Gremlin script
Does that help?