How to design a database for a translation dictionary?

Question

I have a database of words and phrases translated from, for example, English into 15 other languages, and likewise from every language in that list into the other 15. For now, each pair is stored in one table like this (en -> de):

  • id_pair
  • word_en
  • word_de

What is the best way to create a database for that huge list of words and phrases? I know that I must separate every primary language from the others, and was thinking of something like this:

ENGLISH
ID | WORD
1  | 'dictionary'

GERMAN
ID | WORD
1  | 'lexikon'
2  | 'wörterbuch'

TRANSLATION_EN_DE
ID_EN | ID_DE
1     | 1
1     | 2

Is this the best way to normalize the DB? And what about phrases? If someone enters the word "dictionary", I also need to return "This dictionary is good" and its translation. (I know this can be found in the first table with an SQL query; is that the best way?)

I also need the entries in alphabetical order at all times, even though I will have a lot of new entries daily, so that I can print a couple of words before and after the word or phrase someone is looking up.

I'm stuck and can't decide what the best way to optimize it is. Together these databases hold more than 15 GB of text-only translations and serve around 100k requests a day, so every millisecond counts. :) Any help will be appreciated, thanks!

Recommended answer

With a separate table for each language, you'd need a large number of junction tables to cover all the possible translation combinations. On top of that, adding a new language would require adding more tables and rewriting the queries, client code, etc.

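It's better to do it in a more generalized way: one WORD table for all languages, with each word identified by LANGUAGE_ID and WORD_TEXT, and one TRANSLATION table that pairs WORD_ID1 with WORD_ID2.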
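A minimal sketch of such a generalized schema, using Python's built-in sqlite3 module so that it runs as is. The table and column names WORD, TRANSLATION, WORD_ID1, WORD_ID2, LANGUAGE_ID and WORD_TEXT come from the answer; the LANGUAGE table, its LANGUAGE_CODE column, the helper `translate` and the sample rows are assumptions made for illustration. This version only follows rows from WORD_ID1 to WORD_ID2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE LANGUAGE (
    LANGUAGE_ID   INTEGER PRIMARY KEY,
    LANGUAGE_CODE TEXT NOT NULL UNIQUE   -- assumed column, e.g. 'en', 'de'
);
CREATE TABLE WORD (
    WORD_ID     INTEGER PRIMARY KEY,
    LANGUAGE_ID INTEGER NOT NULL REFERENCES LANGUAGE (LANGUAGE_ID),
    WORD_TEXT   TEXT NOT NULL,
    UNIQUE (LANGUAGE_ID, WORD_TEXT)
);
CREATE TABLE TRANSLATION (
    WORD_ID1 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    WORD_ID2 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    PRIMARY KEY (WORD_ID1, WORD_ID2)
);
""")

# Sample rows taken from the question's example (en -> de).
conn.executemany("INSERT INTO LANGUAGE VALUES (?, ?)", [(1, "en"), (2, "de")])
conn.executemany(
    "INSERT INTO WORD VALUES (?, ?, ?)",
    [(1, 1, "dictionary"), (2, 2, "lexikon"), (3, 2, "wörterbuch")],
)
conn.executemany("INSERT INTO TRANSLATION VALUES (?, ?)", [(1, 2), (1, 3)])


def translate(conn, text, src, dst):
    """Return translations of `text` from language code `src` to `dst`."""
    rows = conn.execute(
        """
        SELECT w2.WORD_TEXT
        FROM WORD w1
        JOIN LANGUAGE l1 ON l1.LANGUAGE_ID = w1.LANGUAGE_ID
        JOIN TRANSLATION t ON t.WORD_ID1 = w1.WORD_ID
        JOIN WORD w2 ON w2.WORD_ID = t.WORD_ID2
        JOIN LANGUAGE l2 ON l2.LANGUAGE_ID = w2.LANGUAGE_ID
        WHERE w1.WORD_TEXT = ? AND l1.LANGUAGE_CODE = ? AND l2.LANGUAGE_CODE = ?
        ORDER BY w2.WORD_TEXT
        """,
        (text, src, dst),
    ).fetchall()
    return [r[0] for r in rows]


german = translate(conn, "dictionary", "en", "de")
print(german)
```

With this layout, adding a sixteenth language means inserting one LANGUAGE row and its WORD rows; no new tables or queries are needed.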

Regarding the TRANSLATION table, I propose also creating a CHECK (WORD_ID1 < WORD_ID2) and an index on {WORD_ID2, WORD_ID1} (the opposite "direction" from the PK), and representing both directions of a translation with only one row.
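The CHECK and the reverse index could look roughly like this in SQLite, again driven from Python's sqlite3 module. The index name TRANSLATION_REVERSE, the helpers `add_translation` and `translations`, and the sample word ids are hypothetical; only the constraint and the index columns are taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WORD (
    WORD_ID     INTEGER PRIMARY KEY,
    LANGUAGE_ID INTEGER NOT NULL,
    WORD_TEXT   TEXT NOT NULL,
    UNIQUE (LANGUAGE_ID, WORD_TEXT)
);
CREATE TABLE TRANSLATION (
    WORD_ID1 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    WORD_ID2 INTEGER NOT NULL REFERENCES WORD (WORD_ID),
    PRIMARY KEY (WORD_ID1, WORD_ID2),
    CHECK (WORD_ID1 < WORD_ID2)          -- one row per pair, smaller id first
);
-- Opposite "direction" from the primary key.
CREATE INDEX TRANSLATION_REVERSE ON TRANSLATION (WORD_ID2, WORD_ID1);
""")

# Assumed sample data: language 1 = English, language 2 = German.
conn.executemany(
    "INSERT INTO WORD VALUES (?, ?, ?)",
    [(1, 1, "dictionary"), (2, 2, "lexikon"), (3, 2, "wörterbuch")],
)


def add_translation(conn, a, b):
    """Store the pair once, ordered so that the CHECK is satisfied."""
    lo, hi = sorted((a, b))
    conn.execute("INSERT OR IGNORE INTO TRANSLATION VALUES (?, ?)", (lo, hi))


def translations(conn, word_id):
    """Look the word up on both sides of the pair."""
    rows = conn.execute(
        """
        SELECT WORD_ID2 FROM TRANSLATION WHERE WORD_ID1 = ?   -- served by the PK
        UNION ALL
        SELECT WORD_ID1 FROM TRANSLATION WHERE WORD_ID2 = ?   -- served by the reverse index
        """,
        (word_id, word_id),
    ).fetchall()
    return sorted(r[0] for r in rows)


add_translation(conn, 3, 1)   # stored as (1, 3)
add_translation(conn, 1, 2)

# Inserting the pair in the "wrong" order directly is rejected by the CHECK.
try:
    conn.execute("INSERT INTO TRANSLATION VALUES (3, 1)")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True

print(translations(conn, 1), translations(conn, 3), check_rejected)
```

The design choice here is that the application normalizes the pair order before inserting, so the same translation can never be stored twice in opposite directions.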

Consider clustering the TRANSLATION table if your DBMS supports that.

Regarding "Also need it alphabetically all time":

The query...

SELECT * FROM WORD WHERE LANGUAGE_ID = :lid ORDER BY WORD_TEXT

...can use the index underneath the UNIQUE constraint {LANGUAGE_ID, WORD_TEXT}.
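The same index can also serve the "couple of words before and after" requirement from the question. The sketch below is one possible way to do that with two range queries; the helper `neighbours`, the sample words and the language id 1 are assumptions. It relies on SQLite's default binary comparison, so true alphabetical order for a given language would need a suitable collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE WORD (
    WORD_ID     INTEGER PRIMARY KEY,
    LANGUAGE_ID INTEGER NOT NULL,
    WORD_TEXT   TEXT NOT NULL,
    UNIQUE (LANGUAGE_ID, WORD_TEXT)
)
""")

# Assumed sample words, inserted out of order on purpose.
words = ["diction", "dictionary", "dictum", "dice", "didactic"]
conn.executemany(
    "INSERT INTO WORD (LANGUAGE_ID, WORD_TEXT) VALUES (1, ?)",
    [(w,) for w in words],
)


def neighbours(conn, language_id, text, n=2):
    """Return up to n words before and n words after `text`, both ascending."""
    before = conn.execute(
        """
        SELECT WORD_TEXT FROM WORD
        WHERE LANGUAGE_ID = ? AND WORD_TEXT < ?
        ORDER BY WORD_TEXT DESC LIMIT ?
        """,
        (language_id, text, n),
    ).fetchall()
    after = conn.execute(
        """
        SELECT WORD_TEXT FROM WORD
        WHERE LANGUAGE_ID = ? AND WORD_TEXT > ?
        ORDER BY WORD_TEXT ASC LIMIT ?
        """,
        (language_id, text, n),
    ).fetchall()
    # `before` was read backwards from the target, so flip it to ascending order.
    return [r[0] for r in reversed(before)], [r[0] for r in after]


before, after = neighbours(conn, 1, "dictionary")
print(before, after)
```

New entries need no re-sorting: each insert updates the index, and every lookup reads its neighbours in index order.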
