Multilingual data modeling on MongoDB


Problem Description

I am trying to model my objects on MongoDB and am not sure how to proceed. I am building a product catalog that will have:

  • No frequent changes to the product catalog; a bulk operation may be done weekly or fortnightly.
  • Product information in multiple languages (English, Spanish, French), and a new language may be added at any time.

Here is what I am trying to do: I need to model my product catalog to capture the multilingual functionality. Assume I have:

 product : {
   _id: xxx,
   sku: "23456",
   name: "Name",
   description: "Product details",
   tags: ["x1", "x2"],
   ...
 }
 

Of course the name, description, tags, and possibly images will change according to language. So, how do I model it?

  1. I can have a separate collection for each language, e.g. enProducts, esProducts, etc.
  2. Have a JSON representation in the product itself with the individual languages, like:

     product : {
       id: xxx,
       en: {
         name: "Name",
         description: "product details.."
       },
       es: {
         name: "Name",
         description: "product details.."
       },
       ...
     }


Or is there any other solution? I need the help of MongoDB modeling experts here :)

Solution

Both solutions are standard approaches for this; the first is also the standard in RDBMS technologies (file-based translations being another method, which is not possible here).

As for which is best here, I am leaning towards the second, considering your usage.

Some of the reasons are:

  • One single document load fetches all translations and product data, with no joins
  • That makes for a single contiguous read from disk
  • It allows atomic updates to a single product, including adding new languages and other changes (a quick sketch follows this list)
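
For example, adding a new language to a single product is one atomic $set on that document (a minimal mongo-shell sketch; the products collection name and the French fields are illustrative, not from the original post):

    // Add a French translation in place; because the change touches a single
    // document, readers never observe a half-written product.
    db.products.update(
        { sku: "23456" },
        { $set: { fr: { name: "Nom", description: "détails du produit..." } } }
    )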

But it also creates some downsides:

  • Updating could (and probably will) create fragmentation, which can be remedied to some extent (though not completely) by powerof2sizes (see the command sketched after this list)
  • All of your operations now go to a single part of the disk, which could in principle create a bottleneck; however, in your scenario you rarely update, if at all, so this should not be a problem
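
If in-place growth does become an issue, the powerof2sizes allocation mentioned above can be switched on per collection (a sketch, assuming the collection is named products; on MongoDB 2.6 and later it is already the default):

    // Tell the storage engine to allocate record space in powers of two,
    // leaving slack for documents to grow and reducing fragmentation on update.
    db.runCommand({ collMod: "products", usePowerOf2Sizes: true })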

As a side note: I judge that fragmentation might not be too much of a problem for you, because you only really bulk import products, probably from a CSV, so your documents will rarely grow past their power-of-two allocation after insertion. As such, this point may be moot.

So overall, if planned right, the second option is a good one; however, there are some considerations to take into account:

  • Could the multiple descriptions/fields push a document past the 16 MB limit? (The sketch after this list shows how to check.)
  • How do you manually pad documents to use space efficiently and prevent fragmentation?
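
The first concern is easy to check; the mongo shell can report the BSON size of any document (a sketch, again assuming a products collection):

    // Object.bsonsize() returns a document's size in bytes; compare it
    // against the 16 MB (16,777,216 byte) per-document limit.
    var doc = db.products.findOne({ sku: "23456" })
    print(Object.bsonsize(doc) + " bytes")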

Those are your biggest concerns if you go with the second option.

Considering that you can fit all of the works of Shakespeare into 4 MB with room to spare, I am actually not sure you will reach the 16 MB limit; if you do, it would have to involve a considerable amount of text, and perhaps images stored as binary inside the document.

Coming back to the first option, your largest concern will be duplication of certain data, e.g. the price (France and Spain both use the Euro), unless you use two kinds of document: one to house the common data and one per translation (with three languages that actually makes four documents per product, but only two queries per read).
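
A minimal mongo-shell sketch of that split, with illustrative names (the productTranslations collection, the price field, and the _id values are assumptions, not part of the original question):

    // One language-neutral document per product holds the common data
    db.products.insert({
        _id: "p-23456",
        sku: "23456",
        price: { currency: "EUR", amount: 19.99 },
        tags: ["x1", "x2"]
    })

    // Plus one small document per product per language for the translated text
    db.productTranslations.insert({
        productId: "p-23456",
        lang: "es",
        name: "Nombre",
        description: "detalles del producto..."
    })

    // Rendering the Spanish page then costs two queries instead of one
    var common      = db.products.findOne({ sku: "23456" })
    var translation = db.productTranslations.findOne({ productId: common._id, lang: "es" })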

Considering that this catalogue will only ever be updated in bulk, the duplicated data will not matter too much (though for future reference, I would be cautious about this if the catalogue expands), so:

  • You can have one document per translation and not worry about updating prices atomically across all regions
  • You get a single disk read without the fragmentation
  • There is no need to manually pad your documents

So both options are workable, but I am leaning towards the second.

