Is there a semi-automated way to perform string extraction for i18n?

Question

We have a Java project which contains a large number of English-language strings for user prompts, error messages and so forth. We want to extract all the translatable strings into a properties file so that they can be translated later.

For example, we would like to replace:

Foo.java

String msg = "Hello, " + name + "! Today is " + dayOfWeek;

with:

Foo.java

String msg = Language.getString("foo.hello", name, dayOfWeek);

language.properties

foo.hello = Hello, {0}! Today is {1}
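
For reference, here is a minimal sketch of the kind of `Language.getString` helper this replacement assumes. Only the call shape `Language.getString(key, args...)` and the `foo.hello` entry come from the example above; the class body, the inlined properties text, and the `!key!` fallback are assumptions made so the sketch runs on its own. It relies on the standard `java.util.Properties` and `java.text.MessageFormat` classes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Properties;

// Hypothetical helper: the question only shows the call Language.getString(key, args...).
final class Language {
    private static final Properties MESSAGES = new Properties();

    static {
        // Assumption: a real project would load language.properties from the
        // classpath; the single entry is inlined here so the sketch is self-contained.
        try {
            MESSAGES.load(new StringReader("foo.hello = Hello, {0}! Today is {1}\n"));
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private Language() {}

    // Looks up the template stored under the key and fills in the positional arguments.
    static String getString(String key, Object... args) {
        String template = MESSAGES.getProperty(key);
        if (template == null) {
            return "!" + key + "!"; // assumed fallback: make missing keys visible
        }
        return MessageFormat.format(template, args);
    }

    public static void main(String[] argv) {
        System.out.println(getString("foo.hello", "Ada", "Monday"));
    }
}
```

Calling `Language.getString("foo.hello", "Ada", "Monday")` returns `Hello, Ada! Today is Monday`. One caveat with `MessageFormat`: an apostrophe inside a template has to be doubled (`''`), otherwise it opens a quoted section and the placeholders after it are not substituted.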

I understand that doing this in a completely automated way is pretty much impossible, as not every string should be translated. However, we were wondering if there is a semi-automated way that removes some of the laboriousness.

Recommended answer

What you want is a tool that replaces every expression involving string concatenations with a library call, with the obvious special case of expressions involving just a single literal string.

A program transformation system in which you can express your desired patterns can do this. Such a system accepts rules in the form of:

         lhs_pattern -> rhs_pattern  if condition ;

where patterns are code fragments with syntax-category constraints on the pattern variables. This causes the tool to look for syntax matching the lhs_pattern and, if found, replace it by the rhs_pattern. The pattern matching is over language structures rather than text, so it works regardless of code formatting, indentation, comments, etc.

Sketching a few rules (and oversimplifying to keep this short) following the style of your example:

  domain Java;

  nationalize_literal(s1:literal_string):
    " \s1 " -> "Language.getString1(\s1 )";

  nationalize_single_concatenation(s1:literal_string,s2:term):
    " \s1 + \s2 " -> "Language.getString1(\s1) + \s2"; 

  nationalize_double_concatenation(s1:literal_string,s2:term,s3:literal_string):
      " \s1 + \s2 + \s3 " ->
      "Language.getString3(\generate_template1\(\s1 + "{1}" + \s3\), \s2)"
   if IsNotLiteral(s2);

The patterns are themselves enclosed in "..."; these aren't Java string literals, but rather a way of telling the multi-computer-lingual pattern matching engine that the stuff inside the "..." is (domain) Java code. Meta-elements are marked with \, e.g., the metavariables \s1, \s2, \s3 and the embedded pattern call \generate_template1, with \( and \) denoting its meta-parameter list :-}

Note the use of the syntax-category constraints on the metavariables s1 and s3 to ensure that only string literals are matched. Whatever the metavariables match in the left-hand-side pattern is substituted into the right-hand side.

The sub-pattern generate_template1 is a procedure that, at transformation time (e.g., when the rule fires), evaluates its known-to-be-constant first argument into the template string you suggested, inserts it into your library, and returns a library string index. Note that the first argument to generate_template1 in this example is composed entirely of concatenated string literals.
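
To make the template-generation step concrete, here is a rough Java sketch of what such a procedure might do outside any transformation tool. Everything in it is an assumption for illustration: the `TemplateLibrary` class, its method names, and the `prefix.N` key scheme are not part of DMS, and the placeholders are numbered from zero to match the `{0}`/`{1}` style of the question's properties file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the template-generation step; none of these names come from DMS.
final class TemplateLibrary {
    private final Map<String, String> keyByTemplate = new LinkedHashMap<>();
    private final Map<String, String> templateByKey = new LinkedHashMap<>();
    private final String prefix;

    TemplateLibrary(String prefix) {
        this.prefix = prefix;
    }

    // Joins the literal fragments with numbered placeholders, stores the resulting
    // template, and returns the key under which it is stored.
    String generateTemplate(String... literals) {
        StringBuilder template = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                template.append('{').append(i - 1).append('}');
            }
            template.append(literals[i]);
        }
        String text = template.toString();
        // Reuse the existing key when the same template was generated before.
        String key = keyByTemplate.get(text);
        if (key == null) {
            key = prefix + "." + keyByTemplate.size();
            keyByTemplate.put(text, key);
            templateByKey.put(key, text);
        }
        return key;
    }

    // Returns the stored template for a key, or null if the key is unknown.
    String templateFor(String key) {
        return templateByKey.get(key);
    }
}
```

With `new TemplateLibrary("foo")`, the call `generateTemplate("Hello, ", "! Today is ", "")` stores the template `Hello, {0}! Today is {1}` and returns the key `foo.0`. The sketch does not escape apostrophes or braces in the literals, which a real `MessageFormat`-based library would need.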

Obviously, somebody will have to hand-process the templated strings that end up in the library to produce the foreign-language equivalents.

You're right that this may over-templatize the code, because some strings shouldn't be placed in the nationalized string library. To the extent that you can write programmatic checks for those cases, they can be included as conditions in the rules to prevent them from triggering. (With a little bit of effort, you could place the untransformed text into a comment, making individual transformations easier to undo later.)
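
As an illustration of such a programmatic check, here is a small Java heuristic that guesses whether a string literal is user-facing text. The class name, the method name, and the specific filters (no letters, dotted or underscored identifiers, URLs) are assumptions chosen for the sketch, not rules from any particular tool; a real project would tune them to its own code base.

```java
// Hypothetical filter that a rule condition could call; the heuristics are
// illustrative assumptions, not an exhaustive or authoritative list.
final class TranslatableCheck {
    private TranslatableCheck() {}

    // Returns true when the literal looks like text a user would read.
    static boolean looksTranslatable(String literal) {
        String text = literal.trim();
        if (text.isEmpty()) {
            return false; // whitespace-only separators
        }
        if (text.chars().noneMatch(Character::isLetter)) {
            return false; // pure punctuation or numbers
        }
        if (text.matches("[A-Za-z0-9]+([._][A-Za-z0-9]+)+")) {
            return false; // dotted or underscored tokens such as keys and file names
        }
        if (text.matches("[A-Za-z][A-Za-z0-9+.-]*://.*")) {
            return false; // URLs
        }
        return true;
    }
}
```

Under these heuristics `"Hello, "` and `"Error."` count as translatable, while `" "`, `"123"`, `"foo.hello"`, and `"http://example.com/x"` do not. Literals the check rejects would simply be left untouched by the rules.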

Realistically, I'd guess you would have to code ~100 rules like this to cover the combinatorics and special cases of interest. The payoff is that your code gets automatically enhanced. If done right, you could apply this transformation to your code repeatedly as it goes through multiple releases; it would leave previously nationalized expressions alone and just revise the new ones inserted by the happy-go-lucky programmers.

A system which can do this is the DMS Software Reengineering Toolkit. DMS can parse/pattern match/transform/prettyprint many languages, including Java and C#.
