Translate PDF file using Google Translate API


Question

I want to use Google Translate in my project. I have completed all the formalities with Google and I have my API key. With this key I can easily translate any word from JavaScript. But how can I translate a PDF file, as the Google Translate site does? I found something like this:

http://translate.google.com/translate?hl=fr&sl=auto&tl=en&u=http://www.example.com/PDF.pdf

But here I cannot use my key, and as a result the translation takes a long time. So I want to use my key to translate a PDF file. My approach is like this:

1. I have one HTML page.
2. It has a browse button for the PDF.
3. Upload the file.
4. Translate the PDF with the Google API and show the result in the HTML page.

I searched for a way to translate a PDF with the API but did not find anything. Please help me out.

Solution

TL;DR: Use a headless browser to render a PDF from Google's PDF translation service.

PDF is a complex format, and text can appear in many of its components. I will describe the solutions from the easiest to the most advanced.

Translate raw text

If you only need the translation without the visual output, you can extract the text and give it to Google Translate.

Since you did not provide information on your project (language, environment, ...), I will redirect you to this thread on how to extract text.
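
Once the text is extracted, sending it to the API could look roughly like the sketch below. It assumes the Cloud Translation v2 REST endpoint authenticated with an API key, and an environment with a global fetch (a browser or Node 18+). The helper names chunkText, buildRequest and translateText, and the 4000-character chunk size, are illustrative choices and not part of the original answer.

```javascript
// Assumption: Cloud Translation v2 REST endpoint, authenticated with an API key.
const ENDPOINT = 'https://translation.googleapis.com/language/translate/v2';

// Split text into chunks of at most maxChars, keeping paragraphs together
// where possible and hard-splitting paragraphs that are too long on their own.
function chunkText(text, maxChars) {
  const chunks = [];
  let current = '';
  for (const paragraph of text.split(/\n{2,}/)) {
    for (let i = 0; i < paragraph.length; i += maxChars) {
      const piece = paragraph.slice(i, i + maxChars);
      const joined = current ? current + '\n\n' + piece : piece;
      if (joined.length <= maxChars) {
        current = joined;
      } else {
        chunks.push(current);
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Build the URL and fetch options for translating one chunk.
function buildRequest(chunk, targetLang, apiKey) {
  return {
    url: ENDPOINT + '?key=' + encodeURIComponent(apiKey),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ q: chunk, target: targetLang, format: 'text' }),
    },
  };
}

// Translate the chunks one after another and join the results.
async function translateText(text, targetLang, apiKey) {
  const translated = [];
  for (const chunk of chunkText(text, 4000)) {
    const { url, options } = buildRequest(chunk, targetLang, apiKey);
    const response = await fetch(url, options);
    if (!response.ok) {
      throw new Error('Translation request failed: ' + response.status);
    }
    const json = await response.json();
    translated.push(json.data.translations[0].translatedText);
  }
  return translated.join('\n\n');
}
```

Chunking is there because extracted PDF text can be long; splitting on paragraph boundaries keeps sentences intact for the translator.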

Translate all text

If you need to get text from everything in your PDF, well, that's pretty hard. To (partially) avoid the headache, you can convert the PDF to an image (using ImageMagick tools or similar), and then you have three options:

  • OCR the text from the image, then give it to Google; again, you lose the original layout.
  • OCR the text, but save the positions (some libraries can do that; again, since you did not specify your project information, see these links: #1, #2, #3, #4).

    Then translate it with the Google API and write the result onto the image. For good results you need to take the text font, color and background color into account. Pretty difficult, but feasible.

  • Translate the image using Google Translate's image service. Unfortunately this feature is not available in the public API, so short of some reverse engineering, this is not possible.
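
For the second option (OCR with positions), one possible first step is to merge the recognized words back into lines, so that each line can be translated as a whole and drawn back into its original box. The sketch below assumes a hypothetical OCR output shape, one object per word with text, x, y, width and height in image pixels; real OCR libraries use their own formats, and groupWordsIntoLines is an illustrative name.

```javascript
// Hypothetical input: [{ text, x, y, width, height }, ...], one entry per OCR word.
function groupWordsIntoLines(words) {
  // Process words from top to bottom.
  const sorted = [...words].sort((a, b) => a.y - b.y);
  const lines = [];
  for (const word of sorted) {
    const last = lines[lines.length - 1];
    const centreY = word.y + word.height / 2;
    // A word belongs to the current line if its vertical centre
    // falls inside that line's vertical extent.
    if (last && centreY >= last.top && centreY <= last.bottom) {
      last.words.push(word);
      last.top = Math.min(last.top, word.y);
      last.bottom = Math.max(last.bottom, word.y + word.height);
    } else {
      lines.push({ words: [word], top: word.y, bottom: word.y + word.height });
    }
  }
  // Order each line left to right and compute its bounding box.
  return lines.map((line) => {
    const ordered = line.words.sort((a, b) => a.x - b.x);
    const left = ordered[0].x;
    const right = Math.max(...ordered.map((w) => w.x + w.width));
    return {
      text: ordered.map((w) => w.text).join(' '),
      x: left,
      y: line.top,
      width: right - left,
      height: line.bottom - line.top,
    };
  });
}
```

Each returned line carries the text to send to the translation API and the rectangle where the translated text would be written back onto the image.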

Translate using Google's PDF translation service

The solution you found, using the translate site, can be automated quite easily. It takes a long time because it is a heavy process, and you probably won't beat Google.

Using a headless browser, you can load the translation page for your PDF, observe that the translated content sits in an iframe, open that iframe, and finally print it to PDF.
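
The address of the translation page can be built with a small helper, so that the PDF address is properly escaped. This is only a sketch: the hl, sl, tl and u parameters are taken from the URL in the question, the translate.google.com host over https is an assumption, and buildTranslateUrl is an illustrative name. It is written in ES5 so that it can also run inside SlimerJS or PhantomJS.

```javascript
// Build the address of the translation page for a given PDF.
// Parameter names (hl, sl, tl, u) come from the URL shown in the question.
function buildTranslateUrl(pdfUrl, sourceLang, targetLang, uiLang) {
    var params = [
        ['hl', uiLang],       // interface language
        ['sl', sourceLang],   // source language, e.g. 'auto'
        ['tl', targetLang],   // target language
        ['u', pdfUrl]         // address of the PDF to translate
    ];
    var query = params.map(function(pair) {
        return pair[0] + '=' + encodeURIComponent(pair[1]);
    }).join('&');
    return 'https://translate.google.com/translate?' + query;
}

console.log(buildTranslateUrl('http://www.example.com/PDF.pdf', 'auto', 'en', 'fr'));
```

Escaping matters when the PDF address itself contains a query string, since an unescaped & would otherwise be read as a separator between the translate page's own parameters.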

Here is a short example using SlimerJS (it should also be compatible with PhantomJS):

var page = require("webpage").create();

// here you may want to set up the page size and options

// open the translation page
page.open('https://translate.google.fr/translate?hl=fr&sl=en&u=http://example.com/pdf-sample.pdf', function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
        phantom.exit(1);
        return;
    }

    // find the iframe holding the translated content
    var iframe_src = page.evaluate(function() {
        var iframe = document.querySelector('#contentframe iframe');
        return iframe ? iframe.src : null;
    });

    if (!iframe_src) {
        console.log('Translation iframe not found');
        phantom.exit(1);
        return;
    }

    console.log('Found iframe: ' + iframe_src);

    // open the iframe on its own so it can be rendered
    page.open(iframe_src, function(status) {
        if (status !== 'success') {
            console.log('Unable to load the translated content');
            phantom.exit(1);
            return;
        }

        // wait a bit for the JavaScript to translate
        // this can be optimized to be triggered in JavaScript when the translation is done
        setTimeout(function() {
            // print the page into a PDF
            page.render('/tmp/test.pdf', { format: 'pdf' });

            phantom.exit(0);
        }, 2000);
    });
});

Given this file: http://www.cbu.edu.zm/downloads/pdf-sample.pdf
it produces this result (translated into French): (I posted a screenshot since I cannot embed a PDF ;) )
