Google News XML API: use country/language parameters

Question

I would like to subscribe to an RSS/XML feed from Google News that captures the following query:

Articles mentioning "studie" (German for "study"), written in German, emanating from any country.

I'm using https://news.google.com/rss/search, but for this example, it's easier to see the UI output at https://news.google.com/search, so I'll use the latter URL base in this example.

Now, in the XML API reference, Google mentions four different parameters that influence either language or country:

  • hl (host language): the language the end user is assumed to be typing in. For example, an English speaker types "study," and Google assumes that term is in English and then machine-translates the results back to English. For me, navigating to Google News redirects to a URL with hl=en-US (full URL is https://news.google.com/?hl=en-US&gl=US&ceid=US:en).

  • gl: boosts search results whose country of origin matches the parameter value. The default in my web browser is gl=US.

  • lr (language restrict): restricts search results to documents written in a particular language.

  • cr (country restrict): restricts search results to documents originating in a particular country.

Based on all of the above, that would imply a URL of*:

https://news.google.com/search?q=study&hl=en-US&lr=lang_de

That attempt, however, fails miserably; it shows English-language results from the U.S., and it 302 redirects to:

https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en
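
To see exactly what the redirect changed, the two query strings above can be compared. This is only a small sketch using the standard library; the helper names query_params and added_params are made up for illustration and are not part of any Google API.

```python
from urllib.parse import urlsplit, parse_qs

# The URL that was requested and the URL Google redirected to (both from the post).
requested = "https://news.google.com/search?q=study&hl=en-US&lr=lang_de"
redirected = "https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en"


def query_params(url):
    """Return the query string of `url` as a dict of single values."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def added_params(before, after):
    """Return the parameters present in `after` but missing from `before`."""
    before_params = query_params(before)
    return {k: v for k, v in query_params(after).items() if k not in before_params}


added = added_params(requested, redirected)
print(added)  # {'gl': 'US', 'ceid': 'US:en'}
```

The redirect leaves q, hl and lr as they were and only appends gl and ceid.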

So, to that end:

  • How can I properly structure URL parameters to capture 'Articles mentioning "studie" (German for "study"), written in German, from any country.'?
  • What the heck is ceid and why is it documented absolutely nowhere by Google?

* I.e.:

>>> import urllib.parse
>>> urllib.parse.parse_qs('q=study&hl=en-US&lr=lang_de')                                                                                                     
{'q': ['study'], 'hl': ['en-US'], 'lr': ['lang_de']}

Related but not resolving any of this:

Recommended answer

I know nothing about the RSS interface but as for the standard news UI maybe this can be of use:

ceid (country:language) is Google's news filter, so lr (which Google News seems to ignore) and cr are restricted even further, because they only sift through the news defined by that filter. For US news in English it's ceid=US:en, and for news in Great Britain it's ceid=GB:en. Source: https://rapidapi.com/apigeek/api/google-search3/details
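
As a rough illustration of the country:language shape described here, a ceid value can be split into its two parts. The function name split_ceid is hypothetical, and the format itself is inferred from the examples above rather than from any Google documentation.

```python
def split_ceid(ceid):
    """Split a ceid such as 'US:en' into a (country, language) tuple.

    Assumes the 'COUNTRY:language' shape seen in the examples above.
    """
    country, sep, language = ceid.partition(":")
    if not sep or not country or not language:
        raise ValueError(f"expected 'COUNTRY:language', got {ceid!r}")
    return country, language


print(split_ceid("US:en"))  # ('US', 'en')
print(split_ceid("GB:en"))  # ('GB', 'en')
```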

NOTE: If you don't specify a ceid, one will be applied based on your current location. Also, Google News doesn't seem to care at all about the lr parameter: it sticks to the language of ceid and that's it. Based on your query, articles mentioning "studie" (German for "study"), written in German, emanating from any country, I would suggest a value of DE:de. You may find the ceid parameter somewhat constricting with regard to "emanating from any country", but there's nothing you can do about that.

Google News is based on the concept that every place has its own news feed, and "emanating from any country" sounds an awful lot like "all the news from all places on Earth"; there's no such Google News. "World" news is, as you know, not quite the same thing. If you need no restrictions at all on the country of production/publication, you'll be better off looking for another outlet. In the Google universe, a regular advanced Google search with a restriction on when the document was published, for freshness, is probably impossible to beat.

The four other parameters involved in your search are:

hl, host(interface) language: hl=de
gl, boost country of origin: gl=DE
lr, restrict results to language: lr=de
cr, restrict results to country: none

There are two mistakes in the suggested search string:

https://news.google.com/search?q=study&hl=en-US&lr=lang_de

q=studie, not study, and
lr=de, not lang_de.

However, Google News doesn't care about the lr parameter: it sticks to the language of ceid. Also, hl is always set to the language part of ceid and gl to the country part, so I recommend a ceid of DE:de for your query.

So the search string for DE:de becomes:

https://news.google.com/search?q=studie&hl=de&gl=DE&ceid=DE:de
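
The same string can be assembled programmatically by deriving hl and gl from the ceid, as described above. This is a minimal sketch: the helper name news_url is made up, and whether the RSS endpoint from the question treats these parameters the same way as the UI is an assumption, not something confirmed here.

```python
from urllib.parse import urlencode

# Both base URLs are taken from the question.
UI_BASE = "https://news.google.com/search"
RSS_BASE = "https://news.google.com/rss/search"


def news_url(query, ceid, base=UI_BASE):
    """Build a Google News search URL, deriving gl and hl from `ceid`."""
    country, _, language = ceid.partition(":")
    params = {"q": query, "hl": language, "gl": country, "ceid": ceid}
    # safe=":" keeps the colon in ceid unescaped, matching the URL above.
    return f"{base}?{urlencode(params, safe=':')}"


print(news_url("studie", "DE:de"))
# https://news.google.com/search?q=studie&hl=de&gl=DE&ceid=DE:de

# Assumption: the RSS endpoint accepts the same parameters.
print(news_url("studie", "DE:de", base=RSS_BASE))
# https://news.google.com/rss/search?q=studie&hl=de&gl=DE&ceid=DE:de
```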

Also, to add to the Library of Congress link given by Sreeram Nair: there are no country codes given there. You can find country codes here:

• the ISO 3166-1 alpha-2 (2-letter country) standard, https://en.m.wikipedia.org/wiki/ISO_3166-1_alpha-2

You may also find this document with language codes easier to read on a mobile:

• List of ISO 639-1 (language) codes https://en.m.wikipedia.org/wiki/List_of_ISO_639-1_codes

Sources: Wikipedia articles

• the software term Locale, https://en.m.wikipedia.org/wiki/Locale_(computer_software)

• the ISO 639 (language) standard, https://en.m.wikipedia.org/wiki/ISO_639
