Is there an equivalent to commit in the Bulbs framework for Neo4j?


Question


I am building a data-intensive Python application based on Neo4j, and for performance reasons I need to create/retrieve several nodes and relationships during each transaction. Is there an equivalent of SQLAlchemy's session.commit() statement in Bulbs?

Edit:

For those interested, an interface to Bulbs has been developed that implements this function natively and otherwise works much like SQLAlchemy: https://github.com/chefjerome/graphalchemy

Answer

The most performant way to execute a multi-part transaction is to encapsulate the transaction in a Gremlin script and execute it as a single request.

Here's an example of how to do it -- it's from an example app I worked up last year for the Neo4j Heroku Challenge.

The project is called Lightbulb: https://github.com/espeed/lightbulb

The README describes what it does...

What is Lightbulb?

Lightbulb is a Git-powered, Neo4j-backed blog engine for Heroku written in Python.

You get to write blog entries in Emacs (or your favorite text editor) and use Git for version control, without giving up the features of a dynamic app.

Write blog entries in ReStructuredText, and style them using your website's templating system.

When you push to Heroku, the entry metadata will be automatically saved to Neo4j, and the HTML fragment generated from the ReStructuredText source file will be served off disk.

However, Neo4j quit offering Gremlin on their free/test Heroku Add On so Lightbulb won't work for new Neo4j/Heroku users.

Within the next year -- before the TinkerPop book comes out -- TinkerPop will release a Rexster Heroku Add On with full Gremlin support so people can run their projects on Heroku as they work their way through the book.

But for right now, you don't need to concern yourself with running the app -- all the relevant code is contained within these two files -- the Lightbulb app's model file and its Gremlin script file:

https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py
https://github.com/espeed/lightbulb/blob/master/lightbulb/gremlin.groovy

model.py provides an example for building custom Bulbs models and a custom Bulbs Graph class.

gremlin.groovy contains a custom Gremlin script that the custom Entry model executes -- this Gremlin script encapsulates the entire multi-part transaction so that it can be executed as a single request.

Notice that in the model.py file above, I customize EntryProxy by overriding the create() and update() methods and defining a single save() method that handles both creates and updates.

To hook the custom EntryProxy into the Entry model, I simply override the Entry model's get_proxy_class method so that it returns the EntryProxy class instead of the default NodeProxy class.
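The hook described above can be pictured with stand-in classes. This is a minimal sketch, not the Bulbs source: `NodeProxy`, `Node`, and the `build_proxy` helper are simplified stand-ins defined here so the example runs on its own, and having `create()`/`update()` raise is an assumption made for illustration. Only the `get_proxy_class` override mirrors what the answer describes.

```python
class NodeProxy(object):
    """Stand-in for the default proxy (hypothetical, simplified)."""

    def __init__(self, element_class):
        self.element_class = element_class

    def create(self, **kwds):
        return ("create", kwds)


class EntryProxy(NodeProxy):
    """Custom proxy that exposes a single save() entry point."""

    def create(self, **kwds):
        # Assumption for this sketch: direct creates are disallowed.
        raise NotImplementedError("use save() instead")

    def update(self, _id, **kwds):
        # Assumption for this sketch: direct updates are disallowed.
        raise NotImplementedError("use save() instead")

    def save(self, **kwds):
        return ("save", kwds)


class Node(object):
    """Stand-in base model: uses the default proxy."""

    @classmethod
    def get_proxy_class(cls):
        return NodeProxy


class Entry(Node):
    @classmethod
    def get_proxy_class(cls):
        # Returning EntryProxy here is what hooks the custom proxy in.
        return EntryProxy


def build_proxy(element_class):
    """Hypothetical helper: ask the model which proxy class to build."""
    proxy_class = element_class.get_proxy_class()
    return proxy_class(element_class)


entries = build_proxy(Entry)
result = entries.save(title="Test")
```

Because the model decides which proxy class is built, code that builds proxies does not need to know that Entry is special.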

Everything else in the Entry model is designed around building up the data for the save_blog_entry Gremlin script (defined in the gremlin.groovy file above).

Notice in gremlin.groovy that the save_blog_entry() method is long and contains several closures. You could define each closure as an independent method and execute them with multiple Python calls, but then you'd have the overhead of making multiple server requests, and because the requests are separate, there would be no way to wrap them all in a transaction.

By using a single Gremlin script, you combine everything into a single transactional request. This is much faster, and it's transactional.

You can see how the entire script is executed in the final line of the Gremlin method:

return transaction(save_blog_entry);

Here I'm simply wrapping a transaction closure around all the commands in the internal save_blog_entry closure. Making a transaction closure keeps the code isolated and is much cleaner than embedding the transaction logic into the other closures.
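The same wrapping idea can be shown in Python. This is only an analogy under stated assumptions, not the Groovy code from gremlin.groovy: `FakeGraph` is a hypothetical in-memory stand-in for the database, and the snapshot-and-restore rollback is a simplification of what a real transaction does.

```python
import copy


class FakeGraph(object):
    """Hypothetical in-memory stand-in for a graph database."""

    def __init__(self):
        self.nodes = {}


def transaction(graph, closure):
    """Run closure(graph); keep its changes on success, undo them on error."""
    snapshot = copy.deepcopy(graph.nodes)
    try:
        result = closure(graph)
    except Exception:
        graph.nodes = snapshot  # roll back every partial write
        raise
    return result  # "commit": the changes simply stay in place


def save_blog_entry(graph):
    # Several writes that should succeed or fail together.
    graph.nodes["entry:42"] = {"title": "Test"}
    graph.nodes["topic:gremlin"] = {"name": "gremlin"}
    return graph.nodes["entry:42"]


def broken_save(graph):
    graph.nodes["entry:43"] = {"title": "Half written"}
    raise ValueError("simulated failure")


g = FakeGraph()
entry = transaction(g, save_blog_entry)

try:
    transaction(g, broken_save)
except ValueError:
    pass  # the partial write to entry:43 has been rolled back
```

Keeping the commit/rollback logic in one wrapper means the inner closures contain only the graph operations.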

Then if you look at the code in the internal save_blog_entry closure, it's just calling the other closures I defined above, using the params I passed in from Python when I called the script in the Entry model:

def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one() 

The params I pass in are built up in the model's custom _get_params() method:

def _get_params(self, _data, kwds):
    params = dict()

    # Get the property data, regardless of how it was entered
    data = build_data(_data, kwds)

    # Author
    author = data.pop('author')
    params['author_id'] = cache.get("username:%s" % author)

    # Topic Tags
    tags = (tag.strip() for tag in data.pop('tags').split(','))
    topic_bundles = []
    for topic_name in tags:
        #slug = slugify(topic_name)
        bundle = Topic(self._client).get_bundle(name=topic_name)
        topic_bundles.append(bundle)
    params['topic_bundles'] = topic_bundles


    # Entry
    # clean off any extra kwds that aren't defined as an Entry Property
    desired_keys = self.get_property_keys()
    data = extract(desired_keys, data)
    params['entry_bundle'] = self.get_bundle(data)

    return params

Here's what _get_params() is doing...

build_data(_data, kwds) is a function defined in bulbs.element: https://github.com/espeed/bulbs/blob/master/bulbs/element.py#L959

It simply merges the args in case the user entered some as positional args and some as keyword args.
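As a rough illustration of that merge, here is a minimal sketch. `merge_data` is a hypothetical stand-in written for this example, not the Bulbs implementation, and its signature differs from build_data(_data, kwds); copying the input dict and letting keyword args win on collisions are assumptions.

```python
def merge_data(_data=None, **kwds):
    """Hypothetical stand-in: merge a dict of data with keyword args."""
    # Copy so the caller's dict is left untouched (an assumption of this sketch).
    data = {} if _data is None else dict(_data)
    # Keyword args overwrite dict entries with the same key.
    data.update(kwds)
    return data


merged = merge_data({"title": "Test", "docid": "42"}, author="espeed", docid="43")
only_kwds = merge_data(title="Test")
```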

The first value _get_params() handles is author, which is the author's username. I don't pass the username to the Gremlin script, though; I pass the author_id. The author_id is cached, so I use the username to look up the author_id and set it as a param, which is later passed to the Gremlin save_blog_entry script.

Then I create Topic Model objects for each blog tag that was set, and I call get_bundle() on each and save them as a list of topic_bundles in params.
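The author lookup and tag handling can be sketched without Bulbs. In this minimal example, `prepare` is a hypothetical helper and the cache is a plain dict standing in for whatever cache the app uses. Unlike the original generator expression, this sketch also drops empty tag names.

```python
def prepare(data, cache):
    """Hypothetical helper mirroring the first two steps of _get_params()."""
    data = dict(data)  # work on a copy so the caller's dict is unchanged
    author = data.pop("author")
    # Look up the cached id by username; dict.get returns None on a miss.
    author_id = cache.get("username:%s" % author)
    # Split the comma-separated tag string and trim surrounding whitespace.
    topic_names = [t.strip() for t in data.pop("tags").split(",") if t.strip()]
    return author_id, topic_names, data


cache = {"username:espeed": 7}
author_id, topic_names, rest = prepare(
    {"author": "espeed", "tags": "gremlin, bulbs", "title": "Test"}, cache
)
```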

The get_bundle() method is defined in bulbs.model: https://github.com/espeed/bulbs/blob/master/bulbs/model.py#L363

It simply returns a tuple containing the data, index_name, and index keys for the model instance:

def get_bundle(self, _data=None, **kwds):
    """
    Returns a tuple containing the property data, index name, and index keys.

    :param _data: Data that was passed in via a dict.
    :type _data: dict

    :param kwds: Data that was passed in via name/value pairs.
    :type kwds: dict

    :rtype: tuple

    """
    self._set_property_defaults()   
    self._set_keyword_attributes(_data, kwds)
    data = self._get_property_data()
    index_name = self.get_index_name(self._client.config)
    keys = self.get_index_keys()
    return data, index_name, keys

I added the get_bundle() method to Bulbs to provide a nice and tidy way of bundling params together so your Gremlin script doesn't get overrun with a ton of args in its signature.
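To illustrate why bundling keeps signatures short, here is a minimal sketch. `make_bundle` and `save_entry` are hypothetical names invented for this example; only the (data, index_name, keys) tuple shape comes from the get_bundle() code above.

```python
def make_bundle(data, index_name, keys):
    """Hypothetical helper producing the same tuple shape as get_bundle()."""
    return (data, index_name, keys)


def save_entry(entry_bundle, topic_bundles):
    """Hypothetical receiver: two arguments instead of one per field."""
    entry_data, entry_index, entry_keys = entry_bundle
    topics = [data["name"] for data, _index, _keys in topic_bundles]
    return {"entry": entry_data["title"], "index": entry_index, "topics": topics}


entry_bundle = make_bundle({"title": "Test", "docid": "42"}, "entry", ["docid"])
topic_bundles = [
    make_bundle({"name": name}, "topic", ["name"]) for name in ("gremlin", "bulbs")
]
summary = save_entry(entry_bundle, topic_bundles)
```

Without bundles, the receiver would need a separate argument for each model's data, index name, and keys.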

Finally, for Entry, I simply create an entry_bundle and store it as a param.

Notice that _get_params() returns a dict of three params: author_id, topic_bundles, and entry_bundle.

This params dict is passed directly to the Gremlin script:

def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()        
    self._initialize(result)

And the Gremlin script has the same arg names as those passed in by params:

def save_blog_entry(entry_bundle, author_id, topic_bundles) {

   // Gremlin code omitted for brevity 

}

The params are then simply used in the Gremlin script as needed -- nothing special going on.
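Python's keyword-argument binding gives a feel for this name matching. The sketch below is an analogy only: how Bulbs and the server actually bind params to the script's arguments is not shown in this answer, and the `save_blog_entry` function here is a Python stand-in for the Groovy signature.

```python
def save_blog_entry(entry_bundle, author_id, topic_bundles):
    """Python stand-in for the Gremlin script's signature."""
    return (author_id, len(topic_bundles), entry_bundle[0]["title"])


params = {
    "author_id": 7,
    "topic_bundles": [({"name": "gremlin"}, "topic", ["name"])],
    "entry_bundle": ({"title": "Test"}, "entry", ["title"]),
}

# Dict keys are matched to argument names, so key order does not matter.
result = save_blog_entry(**params)

# A misspelled key (topic_bundle instead of topic_bundles) no longer matches.
bad_params = dict(params)
bad_params["topic_bundle"] = bad_params.pop("topic_bundles")
try:
    save_blog_entry(**bad_params)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```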

So now that I've created my custom model and Gremlin script, I build a custom Graph object that encapsulates all the proxies and the respective models:

class Graph(Neo4jGraph):

    def __init__(self, config=None):
        super(Graph, self).__init__(config)

        # Node Proxies
        self.people = self.build_proxy(Person)
        self.entries = self.build_proxy(Entry)
        self.topics = self.build_proxy(Topic)

        # Relationship Proxies
        self.tagged = self.build_proxy(Tagged)
        self.author = self.build_proxy(Author)

        # Add our custom Gremlin-Groovy scripts
        scripts_file = get_file_path(__file__, "gremlin.groovy")
        self.scripts.update(scripts_file)

You can now import Graph directly from your app's model.py and instantiate the Graph object like normal.

>>> from lightbulb.model import Graph
>>> g = Graph()
>>> data = dict(author='espeed', tags='gremlin,bulbs', docid='42', title="Test")
>>> g.entries.save(data)         # execute transaction via Gremlin script

Does that help?
