Django data sharding


Question

I have successfully got my application running across several databases using model-based routing, i.e. model A lives on DB A and model B lives on DB B. I now need to shard my data. I have been reading the docs but can't work out how to do it, since the same model needs to exist on multiple database servers. I want a flag that says the DB for NEW members is now database X, that members X-Y live on database N, and so on.

How do I do that? Do I use **hints? That seems inadequately documented to me.

Solution

The hints parameter is designed to help your database router decide where it should read or write its data. It may evolve in future versions of Django, but for now there is just one kind of hint the framework may pass, and that is the instance it is working on.

I wrote this very simple database router to see what Django does:

# routers.py
import logging
logger = logging.getLogger("my_project")

class DebugRouter(object):
    """A debugging router"""

    def db_for_read(self, model, **hints):
        logger.debug("db_for_read %s" % repr((model, hints)))
        return None

    def db_for_write(self, model, **hints):
        logger.debug("db_for_write %s" % repr((model, hints)))
        return None

    def allow_relation(self, obj1, obj2, **hints):
        logger.debug("allow_relation %s" % repr((obj1, obj2, hints)))
        return None

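    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Django 1.7+ replacement for the old allow_syncdb(self, db, model)
        logger.debug("allow_migrate %s" % repr((db, app_label, model_name, hints)))
        return None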

You declare this in settings.py:

DATABASE_ROUTERS = ["my_project.routers.DebugRouter"]
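Sharding one model across servers also means declaring one database alias per shard in the same settings.py. A minimal sketch, assuming SQLite files and the hypothetical alias names shard_0 and shard_1; on real shards each entry would point at its own server with its own ENGINE, NAME, HOST and credentials.

```python
# settings.py (fragment): alias names and file names are hypothetical
DATABASES = {
    # Django requires a "default" alias; here it holds the non-sharded models
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "main.sqlite3",
    },
    # One entry per shard of the sharded model
    "shard_0": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "shard_0.sqlite3",
    },
    "shard_1": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "shard_1.sqlite3",
    },
}
```

The alias strings are what .using() and a router's return values refer to.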

Make sure logging is configured to emit debug messages (for example to stderr):

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        # [...some other handlers...]
        'stderr': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler'
        }
    },
    'loggers': {
        # [...some other loggers...]
        'my_project': {
            'handlers': ['stderr'],
            'level': 'DEBUG',
            'propagate': True,
        },
    }
}
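Django passes the LOGGING setting to the standard library's logging.config.dictConfig, so one way to check this configuration outside a project is to feed it the same dict. A minimal sketch with the elided extra handlers and loggers dropped; the logger name my_project is the one used by the router above.

```python
import logging
import logging.config

# Same structure as the LOGGING setting, minus the elided extra entries
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "stderr": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",  # writes to sys.stderr by default
        },
    },
    "loggers": {
        "my_project": {
            "handlers": ["stderr"],
            "level": "DEBUG",
            "propagate": True,
        },
    },
}

logging.config.dictConfig(LOGGING)
logger = logging.getLogger("my_project")
# Same call style as DebugRouter; the message appears on stderr
logger.debug("db_for_read %s" % repr(("User", {})))
```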

Then you can open a Django shell and try a few queries to see what data your router is being given:

$ ./manage.py shell
[...]
>>> from my_project.my_app.models import User
>>> User.objects.get(pk = 1234)
db_for_read (<class 'my_project.my_app.models.User'>, {})
<User: User object>
>>> user = User.objects.create(name = "Arthur", title = "King")
db_for_write (<class 'my_project.my_app.models.User'>, {})
>>> user.name = "Kong"
>>> user.save()
db_for_write (<class 'my_project.my_app.models.User'>, {'instance':
              <User: User object>})
>>>

As you can see, the hints dict is always empty when no instance is available in memory yet. So you cannot use routers if you need query parameters (the object's id, for example) to determine which database to query. That might become possible in the future if Django passes the query or queryset object in the hints dict.
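Since the only hint Django passes is the instance, a router can make a pk-based decision only when it receives one. A minimal sketch of that partial approach, assuming a hypothetical shard_for_pk() helper with two shards of one million ids each; it is written as a plain class (Django routers are plain classes) so it runs without Django, and the hypothetical FakeUser class stands in for a model instance.

```python
# Hypothetical layout: ids below 1_000_000 on shard_0, the rest on shard_1
def shard_for_pk(pk):
    return "shard_0" if pk < 1_000_000 else "shard_1"


class InstanceHintRouter:
    """Routes by primary key, but only when an instance hint is present."""

    def _shard_from_hints(self, hints):
        instance = hints.get("instance")
        if instance is not None and instance.pk is not None:
            return shard_for_pk(instance.pk)
        # No instance (e.g. User.objects.get(pk=1234)) or no pk yet:
        # returning None lets the next router or Django's fallback decide
        return None

    def db_for_read(self, model, **hints):
        return self._shard_from_hints(hints)

    def db_for_write(self, model, **hints):
        return self._shard_from_hints(hints)


class FakeUser:
    """Stand-in for a model instance; only .pk is used."""

    def __init__(self, pk):
        self.pk = pk


router = InstanceHintRouter()
print(router.db_for_read(FakeUser))                                # None
print(router.db_for_write(FakeUser, instance=FakeUser(1234)))      # shard_0
print(router.db_for_write(FakeUser, instance=FakeUser(2_500_000)))  # shard_1
```

This covers user.save() on an object already in memory, but not lookups by pk, which is why a custom manager is still needed for reads.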

So to answer your question, I would say that for now you must create a custom Manager, as suggested by Aaron Merriam. But overriding just the create method is not enough, since you also need to be able to fetch an object from the appropriate database. Something like this might work (not tested yet):

from django.db import models


class CustomManager(models.Manager):
    def find_database_alias(self, pk):
        # ... implement the logic to determine the shard from the pk
        raise NotImplementedError

    def new_object_database_alias(self):
        # ... database alias for a new object
        raise NotImplementedError

    def get(self, *args, **kwargs):
        pk = kwargs.get("pk")
        if pk is None:
            raise ValueError("Sharded table: you must provide the primary key")
        db_alias = self.find_database_alias(pk)
        # get_queryset() is the Django 1.6+ name of get_query_set()
        qs = self.get_queryset().using(db_alias)
        return qs.get(*args, **kwargs)

    def create(self, **kwargs):
        db_alias = self.new_object_database_alias()
        # Manager.using() returns a QuerySet bound to that database
        qs = super().using(db_alias)
        return qs.create(**kwargs)


class ModelA(models.Model):
    objects = CustomManager()
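find_database_alias() and new_object_database_alias() are left unimplemented above. One way to express the question's rule ("members X-Y live on database N, new members go to database X") is a sorted table of id ranges plus a single setting for new objects. A minimal sketch with hypothetical names (SHARD_RANGES, NEW_OBJECT_SHARD) and made-up boundaries; the manager methods could simply delegate to these functions.

```python
import bisect

# Hypothetical range table: (first id in range, database alias), sorted by id
SHARD_RANGES = [
    (1, "shard_0"),          # ids 1 .. 999_999
    (1_000_000, "shard_1"),  # ids 1_000_000 .. 1_999_999
    (2_000_000, "shard_2"),  # ids 2_000_000 and up
]
# The "flag" from the question: where newly created members are written
NEW_OBJECT_SHARD = "shard_2"

_STARTS = [start for start, _ in SHARD_RANGES]


def find_database_alias(pk):
    """Return the alias of the shard whose id range contains pk."""
    index = bisect.bisect_right(_STARTS, pk) - 1
    if index < 0:
        raise ValueError(f"no shard holds primary key {pk!r}")
    return SHARD_RANGES[index][1]


def new_object_database_alias():
    """Return the alias that new objects should be written to."""
    return NEW_OBJECT_SHARD


print(find_database_alias(1))          # shard_0
print(find_database_alias(1_500_000))  # shard_1
print(find_database_alias(9_999_999))  # shard_2
print(new_object_database_alias())     # shard_2
```

This assumes primary keys are unique across all shards, for example because each shard's id sequence starts at its range's first id; with a plain per-database auto-increment, every shard would start handing out ids at 1 and the range lookup would send queries to the wrong shard.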

Cheers
