Is there an equivalent to commit in the Bulbs framework for Neo4j?

Question

I am building a data-intensive Python application based on neo4j and for performance reasons I need to create/recover several nodes and relations during each transaction. Is there an equivalent of SQLAlchemy session.commit() statement in bulbs?

Edit:

For those interested, an interface to Bulbs has been developed that implements that function natively and otherwise works much like SQLAlchemy: https://github.com/chefjerome/graphalchemy

Answer

The most performant way to execute a multi-part transaction is to encapsulate the transaction in a Gremlin script and execute it as a single request.

Here's an example of how to do it -- it's from an example app I worked up last year for the Neo4j Heroku Challenge.

The project is called Lightbulb: https://github.com/espeed/lightbulb

The README describes what it does...

What is Lightbulb?

Lightbulb is a Git-powered, Neo4j-backed blog engine for Heroku written in Python.

You get to write blog entries in Emacs (or your favorite text editor) and use Git for version control, without giving up the features of a dynamic app.

Write blog entries in ReStructuredText, and style them using your website's templating system.

When you push to Heroku, the entry metadata will be automatically saved to Neo4j, and the HTML fragment generated from the ReStructuredText source file will be served off disk.

However, Neo4j quit offering Gremlin on their free/test Heroku Add On so Lightbulb won't work for new Neo4j/Heroku users.

Within the next year -- before the TinkerPop book comes out -- TinkerPop will release a Rexster Heroku Add On with full Gremlin support so people can run their projects on Heroku as they work their way through the book.

But for right now, you don't need to concern yourself with running the app -- all the relevant code is contained within these two files -- the Lightbulb app's model file and its Gremlin script file:

https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py
https://github.com/espeed/lightbulb/blob/master/lightbulb/gremlin.groovy

model.py provides an example for building custom Bulbs models and a custom Bulbs Graph class.

gremlin.groovy contains a custom Gremlin script that the custom Entry model executes -- this Gremlin script encapsulates the entire multi-part transaction so that it can be executed as a single request.

Notice in the model.py file above, I customize EntryProxy by overriding the create() and update() methods and instead define a singular save() method to handle creates and updates.

To hook the custom EntryProxy into the Entry model, I simply override the Entry model's get_proxy_class method so that it returns the EntryProxy class instead of the default NodeProxy class.
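The hook can be pictured with stand-in classes. This is only a minimal sketch, not the Bulbs source: `NodeProxy`, `Node`, and `build_proxy` below are simplified placeholders written for illustration, and treating `get_proxy_class` as a classmethod is an assumption to check against your Bulbs version.

```python
class NodeProxy(object):
    """Placeholder for the default proxy (not the real Bulbs class)."""
    def __init__(self, element_class):
        self.element_class = element_class


class EntryProxy(NodeProxy):
    """Custom proxy exposing a single save() for creates and updates."""
    def save(self, _data=None, **kwds):
        entry = self.element_class()
        # Record the merged data on the instance so the effect is visible.
        entry.saved = dict(_data or {}, **kwds)
        return entry


class Node(object):
    """Placeholder model base class."""
    @classmethod
    def get_proxy_class(cls):
        return NodeProxy


class Entry(Node):
    @classmethod
    def get_proxy_class(cls):
        # Return the custom proxy instead of the default NodeProxy.
        return EntryProxy


def build_proxy(element_class):
    """Hypothetical helper: ask the model which proxy class to use."""
    return element_class.get_proxy_class()(element_class)


entries = build_proxy(Entry)
entry = entries.save({"title": "Test"}, docid="42")
```

Because the model itself names its proxy class, code that builds proxies does not need to know that Entry is special.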

Everything else in the Entry model is designed around building up the data for the save_blog_entry Gremlin script (defined in the gremlin.groovy file above).

Notice in gremlin.groovy that the save_blog_entry() method is long and contains several closures. You could define each closure as an independent method and execute them with multiple Python calls, but then you'd have the overhead of making multiple server requests and since the requests are separate, there would be no way to wrap them all in a transaction.

By using a single Gremlin script, you combine everything into a single transactional request. This is much faster, and it's transactional.
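To make the request overhead concrete, here is a toy model that only counts round trips. `CountingClient` and the script names `create_entry`, `link_author`, and `tag_entry` are hypothetical; nothing here talks to Neo4j or uses the Bulbs API.

```python
class CountingClient(object):
    """Hypothetical stand-in for a graph client; it only counts requests."""
    def __init__(self):
        self.requests = 0

    def execute(self, script_name, params):
        self.requests += 1  # one server round trip per call
        return {"script": script_name, "params": params}


def save_in_pieces(client, entry, author_id, topics):
    # One request per step; separate requests cannot share a transaction.
    client.execute("create_entry", {"entry": entry})
    client.execute("link_author", {"author_id": author_id})
    for topic in topics:
        client.execute("tag_entry", {"topic": topic})


def save_in_one_script(client, entry, author_id, topics):
    # Everything travels in one request, so the server-side script
    # can wrap all of it in a single transaction.
    client.execute("save_blog_entry",
                   {"entry": entry, "author_id": author_id, "topics": topics})


piecewise, single = CountingClient(), CountingClient()
save_in_pieces(piecewise, {"title": "Test"}, 1, ["gremlin", "bulbs"])
save_in_one_script(single, {"title": "Test"}, 1, ["gremlin", "bulbs"])
print(piecewise.requests, single.requests)  # prints: 4 1
```

The piecewise count grows with the number of tags, while the single-script count stays at one.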

You can see how the entire script is executed in the final line of the Gremlin method:

return transaction(save_blog_entry);

Here I'm simply wrapping a transaction closure around all the commands in the internal save_blog_entry closure. Making a transaction closure keeps code isolated and is much cleaner than embedding the transaction logic into the other closures.
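The same wrapping pattern can be sketched in Python. This is an analog for illustration, not the Groovy code from gremlin.groovy: `FakeGraph` and its `add`, `commit`, and `rollback` methods are made up, and the `transaction` helper here takes the graph as an explicit argument.

```python
class FakeGraph(object):
    """In-memory stand-in for a transactional graph (illustration only)."""
    def __init__(self):
        self.committed = []
        self._pending = []

    def add(self, item):
        self._pending.append(item)

    def commit(self):
        self.committed.extend(self._pending)
        self._pending = []

    def rollback(self):
        self._pending = []


def transaction(graph, work):
    """Run work(); commit if it succeeds, discard everything if it fails."""
    try:
        result = work()
    except Exception:
        graph.rollback()
        raise
    graph.commit()
    return result


g = FakeGraph()


def save_blog_entry():
    # The inner closure holds only the work, with no transaction logic.
    g.add("entry")
    g.add("author-edge")
    return "entry"


def broken_save():
    g.add("half-written entry")
    raise ValueError("simulated failure")


transaction(g, save_blog_entry)
try:
    transaction(g, broken_save)
except ValueError:
    pass  # the failed attempt left nothing behind
```

Keeping commit and rollback in one wrapper means the inner closures never need to know whether they run inside a transaction.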

Then if you look at the code in the internal save_blog_entry closure, it's just calling the other closures I defined above, using the params I passed in from Python when I called the script in the Entry model:

def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one() 

The params I pass in are built up in the model's custom _get_params() method:

def _get_params(self, _data, kwds):
    params = dict()

    # Get the property data, regardless of how it was entered
    data = build_data(_data, kwds)

    # Author
    author = data.pop('author')
    params['author_id'] = cache.get("username:%s" % author)

    # Topic Tags
    tags = (tag.strip() for tag in data.pop('tags').split(','))
    topic_bundles = []
    for topic_name in tags:
        #slug = slugify(topic_name)
        bundle = Topic(self._client).get_bundle(name=topic_name)
        topic_bundles.append(bundle)
    params['topic_bundles'] = topic_bundles


    # Entry
    # clean off any extra kwds that aren't defined as an Entry Property
    desired_keys = self.get_property_keys()
    data = extract(desired_keys, data)
    params['entry_bundle'] = self.get_bundle(data)

    return params

Here's what _get_params() is doing...

build_data(_data, kwds) is a function defined in bulbs.element: https://github.com/espeed/bulbs/blob/master/bulbs/element.py#L959

It simply merges the args in case the user entered some as positional args and some as keyword args.
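A merge of that kind can be sketched as follows. This is not the Bulbs implementation: it is deliberately named `merge_data` rather than `build_data`, and letting keyword values win on a key collision is an assumption.

```python
def merge_data(_data=None, kwds=None):
    """Merge a positional dict with keyword arguments into one dict."""
    data = {}
    if _data:
        data.update(_data)
    if kwds:
        data.update(kwds)  # assumption: keyword values win on a collision
    return data


def save(_data=None, **kwds):
    # Callers may pass a dict, keyword arguments, or both.
    return merge_data(_data, kwds)


merged = save({"title": "Test"}, author="espeed", tags="gremlin, bulbs")
```

After the merge, the rest of the method can read from one dict without caring how the caller supplied each value.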

The first value I pull out of the data is author, which is the author's username. I don't pass the username to the Gremlin script; I pass the author_id instead. The author_id is cached, so I use the username to look it up and set it as a param, which I will later pass to the Gremlin save_blog_entry script.

Then I create Topic Model objects for each blog tag that was set, and I call get_bundle() on each and save them as a list of topic_bundles in params.
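The author lookup and the tag handling can be tried without a database. In this sketch the cache is a plain dict, the id 7 is made up, and `make_topic_bundle` is a hypothetical stand-in for `Topic(self._client).get_bundle(name=...)`, with an assumed index name and keys.

```python
def parse_tags(raw_tags):
    """Split a comma-separated tag string and trim whitespace."""
    return [tag.strip() for tag in raw_tags.split(",")]


def make_topic_bundle(name):
    """Hypothetical stand-in for Topic(client).get_bundle(name=...)."""
    data = {"name": name}
    index_name = "topic"  # assumed index name
    keys = ["name"]       # assumed index keys
    return data, index_name, keys


cache = {"username:espeed": 7}  # pretend cache; 7 is a made-up id

params = {
    "author_id": cache.get("username:%s" % "espeed"),
    "topic_bundles": [make_topic_bundle(name)
                      for name in parse_tags("gremlin, bulbs")],
}
```

Note that `tags` is expected to be a single comma-separated string, since the code calls `split(',')` on it.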

The get_bundle() method is defined in bulbs.model: https://github.com/espeed/bulbs/blob/master/bulbs/model.py#L363

It simply returns a tuple containing the data, index_name, and index keys for the model instance:

def get_bundle(self, _data=None, **kwds):
    """
    Returns a tuple containing the property data, index name, and index keys.

    :param _data: Data that was passed in via a dict.
    :type _data: dict

    :param kwds: Data that was passed in via name/value pairs.
    :type kwds: dict

    :rtype: tuple

    """
    self._set_property_defaults()   
    self._set_keyword_attributes(_data, kwds)
    data = self._get_property_data()
    index_name = self.get_index_name(self._client.config)
    keys = self.get_index_keys()
    return data, index_name, keys

I added the get_bundle() method to Bulbs to provide a nice and tidy way of bundling params together so your Gremlin script doesn't get overrun with a ton of args in its signature.

Finally, for Entry, I simply create an entry_bundle and store it as a param.

Notice that _get_params() returns a dict of three params: author_id, topic_bundles, and entry_bundle.

This params dict is passed directly to the Gremlin script:

def _save(self, _data, kwds):
    script = self._client.scripts.get('save_blog_entry')
    params = self._get_params(_data, kwds)
    result = self._client.gremlin(script, params).one()        
    self._initialize(result)

And the Gremlin script has the same arg names as those passed in by params:

def save_blog_entry(entry_bundle, author_id, topic_bundles) {

   // Gremlin code omitted for brevity 

}

The params are then simply used in the Gremlin script as needed -- nothing special going on.
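The name matching works much like keyword arguments in Python. The following is a Python analog, not the Groovy script: the function body, the bundle contents, and the id 7 are invented for illustration.

```python
def save_blog_entry(entry_bundle, author_id, topic_bundles):
    """Python analog of the script signature; the body is illustrative."""
    # Each bundle unpacks into (data, index_name, keys).
    entry_data, _entry_index, _entry_keys = entry_bundle
    topic_names = [data["name"] for data, _index, _keys in topic_bundles]
    return {"title": entry_data["title"],
            "author_id": author_id,
            "topics": topic_names}


params = {
    "author_id": 7,  # made-up id
    "topic_bundles": [({"name": "gremlin"}, "topic", ["name"]),
                      ({"name": "bulbs"}, "topic", ["name"])],
    "entry_bundle": ({"title": "Test", "docid": "42"}, "entry", ["docid"]),
}

# The dict keys bind to the parameters of the same name.
result = save_blog_entry(**params)
```

In this Python analog a misspelled key such as `topic_bundle` raises a TypeError, which is a reminder to keep the dict keys and the script's argument names identical. Three bundles also keep the signature short even though each carries data, an index name, and index keys.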

So now that I've created my custom model and Gremlin script, I build a custom Graph object that encapsulates all the proxies and the respective models:

class Graph(Neo4jGraph):

    def __init__(self, config=None):
        super(Graph, self).__init__(config)

        # Node Proxies
        self.people = self.build_proxy(Person)
        self.entries = self.build_proxy(Entry)
        self.topics = self.build_proxy(Topic)

        # Relationship Proxies
        self.tagged = self.build_proxy(Tagged)
        self.author = self.build_proxy(Author)

        # Add our custom Gremlin-Groovy scripts
        scripts_file = get_file_path(__file__, "gremlin.groovy")
        self.scripts.update(scripts_file)

You can now import Graph directly from your app's model.py and instantiate the Graph object like normal.

>>> from lightbulb.model import Graph
>>> g = Graph()
>>> # keys match what _get_params() reads: 'author' and a comma-separated 'tags' string
>>> data = dict(author='espeed', tags='gremlin, bulbs', docid='42', title="Test")
>>> g.entries.save(data)         # execute transaction via Gremlin script

Does that help?
