Scraping Google Translate


Problem description

I would like to scrape Google Translate with Node.js and the cheerio library:

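var request = require('request');
var cheerio = require('cheerio');

request("http://translate.google.de/#de/en/hallo%20welt", function(err, resp, body) {
    if (err) throw err;

    // Parse the HTML string returned by the server
    var $ = cheerio.load(body);
    console.log($('#result_box').find('span').length);
});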

But it can't find the span elements I need inside the translation box (result_box). In the source code of the website it looks like this:

<span id="result_box">
    <span class="hps">hello</span>
    <span class="hps">world</span>
</span>

So I thought I could wait 5-10 seconds until Google had created all the span elements, but that doesn't seem to work either:

setTimeout(function() {
    var $ = cheerio.load(body);
    console.log($('#result_box').find('span').length);
}, 15000);

Can you help me? :)

Solution:

Instead of cheerio I now use http.get:

var http = require('http');

// prepareURL is the poster's own helper; its definition is not shown
http.get(
  this.prepareURL("http://translate.google.de/translate_a/t?client=t&sl=de&tl=en&hl=de&ie=UTF-8&oe=UTF-8&oc=2&otf=1&ssel=5&tsel=5&pc=1&q=Hallo"),
  function(result) {
    result.setEncoding('utf8');
    result.on("data", function(chunk) {
        console.log(chunk);
    });
  }
);

This gives me a result string containing the translation. The URL used here is the request that the page itself sends to the server.
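The snippet above logs each chunk as it arrives, so a longer response could be printed in pieces. A minimal sketch of how the URL could be built for arbitrary text and how the chunks could be joined is shown here. The names `buildTranslateUrl` and `collectBody` are hypothetical (they are not the poster's `prepareURL`), the endpoint and its parameters are copied from the post as-is, and there is no guarantee the endpoint still responds. To stay runnable offline, the demo feeds `collectBody` a fake response instead of calling `http.get`.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: rebuilds the URL from the post for arbitrary text.
// All parameters except sl, tl and q are copied verbatim from the post.
function buildTranslateUrl(text, from, to) {
  const params = new URLSearchParams({
    client: 't', sl: from, tl: to, hl: 'de',
    ie: 'UTF-8', oe: 'UTF-8', oc: '2', otf: '1',
    ssel: '5', tsel: '5', pc: '1',
    q: text, // URLSearchParams percent-encodes the text for us
  });
  return 'http://translate.google.de/translate_a/t?' + params.toString();
}

// Hypothetical helper: joins all 'data' chunks and reports once on 'end'.
function collectBody(stream, callback) {
  let body = '';
  stream.on('data', function (chunk) { body += chunk; });
  stream.once('end', function () { callback(null, body); });
  stream.once('error', function (err) { callback(err); });
}

console.log(buildTranslateUrl('Hallo Welt', 'de', 'en'));

// Stand-in for an http response, so the sketch makes no network request.
const fakeResponse = new EventEmitter();
collectBody(fakeResponse, function (err, body) {
  if (err) throw err;
  console.log(body);
});
fakeResponse.emit('data', 'hello ');
fakeResponse.emit('data', 'world');
fakeResponse.emit('end');
```

With a real request, `collectBody(result, callback)` would be called inside the `http.get` callback after `result.setEncoding('utf8')`.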

Recommended answer

You can't scrape Google Translate with cheerio in Node because Google does not render the translation on the server side. The server replies to your request with a script, and that script then makes an API request that includes your string. The script then runs again on the client side and builds the content you see. None of that happens in cheerio, which only parses the HTML it is given and does not execute scripts.
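A related detail, added here as a general fact about URLs and not something the answer states: the text to translate in the question's URL sits in the fragment (the part after `#`), and a fragment is never sent to the server. The sketch below uses Node's built-in `URL` class to split the address. `splitRequestTarget` is a hypothetical name chosen for illustration.

```javascript
const { URL } = require('url');

// Hypothetical helper: separates what an HTTP client sends to the server
// from the fragment, which stays on the client side.
function splitRequestTarget(address) {
  const url = new URL(address);
  return {
    sentToServer: url.pathname + url.search, // goes into the request line
    keptInBrowser: url.hash,                 // fragment: never transmitted
  };
}

const parts = splitRequestTarget('http://translate.google.de/#de/en/hallo%20welt');
console.log(parts.sentToServer);  // "/"
console.log(parts.keptInBrowser); // "#de/en/hallo%20welt"
```

So the server only ever sees a request for `/`, and waiting longer before parsing the same `body` string cannot make the translation appear.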

So you need to send your request to the API directly. But this is Google: they can detect scraping and will block you after a few attempts.

You can still fake a user's behavior, but it takes a long time and they may block you at any time.
