Open Source Machine Translation Engines?

Problem description

We're looking for an open source Machine Translation Engine that could be incorporated into our localization workflow. We're looking at the options below:

  1. Moses (C++)
  2. Joshua (Java)
  3. Phrasal (Java)

Among these, Moses has the widest community support and has been tried out by many localization companies and researchers. We are actually leaning towards a Java-based engine, since our applications are all in Java. Have any of you used either Joshua or Phrasal as part of your workflow? Could you please share your experiences with them? Or is Moses far ahead of these in terms of the features it provides and ease of integration?

And, we require that the engine supports:

  1. Domain-specific training (i.e., it should maintain a separate phrase table for each domain that the input data belongs to).
  2. Incremental training (i.e., avoiding having to re-train the model from scratch every time we want to use some new training data).
  3. Parallelizing the translation process.

Answer

A lot has moved forward since then, so I thought I'd give an update on this topic and leave the previous answer in place to document the progress.

Domain-specific training: domain adaptation techniques can be useful if your data is taken from various sources and you need to optimise towards a sub-domain. From our experience, there is no single solution that consistently performs best, so you need to try out as many approaches as possible and compare results. There is a mail on the Moses mailing list that lists various possible methods: http://thread.gmane.org/gmane.comp.nlp.moses.user/9742/focus=9799. The following page also gives an overview of the current research: http://www.statmt.org/survey/Topic/DomainAdaptation
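
One technique from that list that is easy to picture is keeping a separate phrase table per domain and offering them to the decoder as alternate decoding paths. The fragment below is a minimal, hypothetical moses.ini sketch in Moses 2.x-style syntax; all paths, feature names, and weights are placeholders, so check the configuration documentation of your release before copying anything.

    # Hypothetical moses.ini fragment: one phrase table per domain,
    # exposed as two alternate decoding paths via the [mapping] section.
    [input-factors]
    0

    [mapping]
    0 T 0
    1 T 1

    [feature]
    PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/models/general/phrase-table.gz input-factor=0 output-factor=0
    PhraseDictionaryMemory name=TranslationModel1 num-features=4 path=/models/legal/phrase-table.gz input-factor=0 output-factor=0
    KENLM name=LM0 factor=0 path=/models/lm/target.binlm order=5
    Distortion
    PhrasePenalty
    WordPenalty
    UnknownWordPenalty

    [weight]
    TranslationModel0= 0.2 0.2 0.2 0.2
    TranslationModel1= 0.2 0.2 0.2 0.2
    LM0= 0.5
    Distortion0= 0.3
    PhrasePenalty0= 0.2
    WordPenalty0= -1
    UnknownWordPenalty0= 1

Tuning (e.g. with MERT) would then decide how much to trust each table; other adaptation methods such as phrase-table interpolation or fill-up need a different setup, which is exactly why comparing several of them is worthwhile.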

Incremental training: there was an interesting talk at IWSLT 2013: http://www.iwslt2013.org/downloads/Assessing_Quick_Update_Methods_of_Statistical_Translation_Models.pdf. It demonstrated that current incremental methods (1) take your system offline, so you have no real "live update" of your models, and (2) are outperformed by full re-training. It seems that the problem has not been solved yet.

Parallelizing the translation process: the Moses server lags behind the moses-cmd binary, so if you want to use the latest features, it is better to start from moses-cmd. Also, the community has not kept its promise of never releasing a 1.0 version :-). In fact, you can find the latest release (2.1) here: http://www.statmt.org/moses/?n=Moses.Releases
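
If you do end up using the server anyway, calling it from Java is straightforward, since mosesserver exposes an XML-RPC interface. Below is a minimal sketch using the Apache XML-RPC client library; the host, port, and input sentence are placeholders, and it assumes a server started along the lines of "mosesserver -f moses.ini --server-port 8080".

    // Minimal sketch of a Java client for mosesserver's XML-RPC interface.
    // Assumes (hypothetically) a server started with:
    //   mosesserver -f moses.ini --server-port 8080
    import java.net.URL;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.xmlrpc.client.XmlRpcClient;
    import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

    public class MosesClient {
        public static void main(String[] args) throws Exception {
            XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
            config.setServerURL(new URL("http://localhost:8080/RPC2"));
            XmlRpcClient client = new XmlRpcClient();
            client.setConfig(config);

            // mosesserver's "translate" method takes a struct holding the
            // source text and returns a struct holding the translation.
            Map<String, String> params = new HashMap<String, String>();
            params.put("text", "un exemple de phrase");

            Map<?, ?> result = (Map<?, ?>) client.execute("translate", new Object[] { params });
            System.out.println(result.get("text"));
        }
    }

For pure batch throughput, moses-cmd can also decode with multiple threads (the -threads option), which may be simpler than running several server instances.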
