Book translation data format

Views: 146  Published: 2020/5/18 0:50:48  Tags: vim, nlp, translation, file-format

Question

I'm thinking of translating a book from English to my native language. I can translate just fine, and I'm happy with vim as a text editor. My problem is that I'd like to somehow preserve the semantics, i.e. which parts of my translation correspond to the original.

I could basically create a simple XML-based markup language that would look something like this:

<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>

Now, that would probably have its benefits, but I don't think editing it would be very fun.
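
One of those benefits is that the sentence pairs can be pulled out mechanically. A minimal Python sketch using the standard library's xml.etree.ElementTree, assuming the element names from the example above (sentence, original, and translation with a lang attribute); the function name sentence_pairs is hypothetical:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<book>
  <chapter>
    <paragraph>
      <sentence>
        <original>This is an example sentence.</original>
        <translation lang="fi">Tämä on esimerkkilause.</translation>
      </sentence>
    </paragraph>
  </chapter>
</book>"""


def sentence_pairs(xml_text: str, lang: str) -> list[tuple[str, str]]:
    """Return (original, translation) pairs for one language code.

    Hypothetical helper; a sentence with no translation in `lang`
    yields an empty string so untranslated parts are easy to spot.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    # iter() visits every <sentence>, whatever chapter/paragraph it sits in.
    for sentence in root.iter("sentence"):
        original = sentence.findtext("original", default="").strip()
        translation = ""
        for t in sentence.findall("translation"):
            if t.get("lang") == lang:
                translation = (t.text or "").strip()
        pairs.append((original, translation))
    return pairs


if __name__ == "__main__":
    for orig, trans in sentence_pairs(SAMPLE, "fi"):
        print(f"{orig} -> {trans}")
```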

Another possibility that I can think of would be to keep the original and translation in separate files. If I add a newline after each translation chunk and keep line numbering consistent, editing would be easy and I'd be able to programmatically match the original and translation.

original.txt:
  This is an example sentence.
  In this format editing is easy.

translation-fi.txt:
  Tämä on esimerkkilause.
  Tässä muodossa muokkaaminen on helppoa.
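
The line-number matching could be sketched in Python like this. It is an illustrative sketch, not an existing tool: aligned_pairs and load_pairs are hypothetical names, and the files are assumed to be UTF-8 with one chunk per line. It only catches a mismatched line count; a stray inserted line cancelled out by a deleted line elsewhere would still pair the wrong chunks, which is the kind of fragility that makes this format risky.

```python
from pathlib import Path


def aligned_pairs(original_lines: list[str],
                  translation_lines: list[str]) -> list[tuple[str, str]]:
    """Pair chunks by line number; refuse to guess if the counts differ."""
    if len(original_lines) != len(translation_lines):
        raise ValueError(
            f"line count mismatch: {len(original_lines)} original vs "
            f"{len(translation_lines)} translated"
        )
    # strip() drops incidental indentation and trailing whitespace.
    return [(o.strip(), t.strip()) for o, t in zip(original_lines, translation_lines)]


def load_pairs(original_path: str, translation_path: str) -> list[tuple[str, str]]:
    """Read both files (assumed UTF-8) and pair them line by line."""
    original = Path(original_path).read_text(encoding="utf-8").splitlines()
    translation = Path(translation_path).read_text(encoding="utf-8").splitlines()
    return aligned_pairs(original, translation)
```

Usage would be along the lines of load_pairs("original.txt", "translation-fi.txt").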

However, this doesn't seem very robust. It would be easy to mess up. Probably someone has better ideas. Thus the question:

What is the best data format for translating a book with a text editor?

Added the vim tag, since I'd prefer to do this with vim and believe some vim guru might have ideas.

Started a bounty on this. I'm currently leaning toward the second idea I described, but I hope to get something about as easy to edit (and fairly easy to implement) but more robust.

Recommended answer

One thought: if you keep each translatable chunk (one or more sentences) on its own line, vim's scrollbind and cursorbind options plus a simple vertical split would help you keep the chunks "synchronized". It looks very much like what vimdiff does by default. The files should then have the same number of lines, and you don't even need to switch windows!

But this isn't quite perfect, because wrapped lines tend to mess things up a little. If your translation wraps over two or three more virtual lines than the original text, the visual correlation fades, as the lines are no longer one-to-one. I couldn't find a solution or a script that fixes that behavior.

Another suggestion I would propose is to interlace the translation with the original. This approaches the diff method of Benoit's suggestion. After the original is split up into chunks (one chunk per line), I would prepend >> or a similar marker to every line. Translating a chunk would then begin with o (open a new line below it). The file would look like this:

  >> This is an example sentence.
  Tämä on esimerkkilause.
  >> In this format editing is easy.
  Tässä muodossa muokkaaminen on helppoa.
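
Reading this format back is also straightforward. A minimal Python sketch, assuming >> as the marker, one original chunk per line, and translations that may be hard-wrapped over several lines (joined with spaces); parse_interlaced is a hypothetical name. An original with no translation yet comes back with an empty string, which makes it easy to list what is left to do:

```python
MARKER = ">>"


def parse_interlaced(text: str) -> list[tuple[str, str]]:
    """Split an interlaced file into (original, translation) pairs.

    Assumptions (not from the answer): each original chunk is a single
    line starting with MARKER; its translation is every following line up
    to the next MARKER line, joined with spaces; blank lines are ignored.
    """
    pairs: list[tuple[str, list[str]]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(MARKER):
            # Start a new chunk with an empty list of translation lines.
            pairs.append((line[len(MARKER):].strip(), []))
        elif not pairs:
            raise ValueError(f"line {lineno}: translation before any '{MARKER}' line")
        else:
            pairs[-1][1].append(line)
    return [(orig, " ".join(parts)) for orig, parts in pairs]


if __name__ == "__main__":
    sample = """>> This is an example sentence.
Tämä on esimerkkilause.
>> In this format editing is easy.
Tässä muodossa muokkaaminen on helppoa."""
    for orig, trans in parse_interlaced(sample):
        print(f"{orig} -> {trans or '(untranslated)'}")
```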

I would also improve readability with :match Comment /^>>.*$/ or similar, whatever looks nice with your colorscheme. It would probably be worthwhile to write a :syn region that disables spell checking for the original text. Finally, as a detail, I'd map <C-j> to 2j and <C-k> to 2k to allow easy jumping between the parts that matter.

Pros of this latter approach also include being able to wrap things at 80 columns, if you feel like it as I do :) It would still be trivial to write <C-j/k> mappings that jump between translations.

Cons: buffer completion suffers, since it now completes both original and translated words. Hopefully English words don't occur in the translations that often! :) But this is as robust as it gets. A simple grep will peel the original text off after you are done.
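
For readers without grep at hand, the same peeling step might look like this in Python: a sketch roughly equivalent to grep -v '^>>', except that it also tolerates leading whitespace before the marker. The function name strip_originals and the script usage are hypothetical:

```python
import sys

MARKER = ">>"


def strip_originals(lines):
    """Yield only the lines that are not '>>' originals.

    Unlike grep -v '^>>', leading whitespace before the marker is ignored,
    so indented originals are dropped too. Blank lines are kept.
    """
    for line in lines:
        if not line.lstrip().startswith(MARKER):
            yield line


if __name__ == "__main__":
    # Usage: python strip_originals.py < interlaced.txt > translation.txt
    sys.stdout.writelines(strip_originals(sys.stdin))
```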
