Validate language code for Wikimedia languages

Problem description

I have a shell script that uses the Wikidata Query Service (WDQS) to get the data it needs. The SPARQL query that runs against WDQS takes a language code as an input parameter.

Is there a way to check, in the shell script, whether the input language code is a valid Wikimedia language code, that is, one that appears in the first column of the list at https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all

Recommended answer

These codes are the values of wdt:P424. From the property proposal:

— Is there a big difference to ISO 639-1?
— Many of them are the same as ISO, but it is not done in a consistent way. Some language codes have two letters, some three, and a few even more. And there are also a few cases where it is completely different (als: ISO: tosk Albanian, Wikimedia: Alemannic).

You could retrieve all these codes using the following simple SPARQL query:

SELECT DISTINCT ?code { [] wdt:P424 ?code } ORDER BY ?code

In fact, the list you have linked to is periodically generated by a bot. The full query is:

SELECT ?item ?c
(CONCAT("{","{#language:",?c,"}","}") as ?display)
(CONCAT("{","{#language:",?c,"|","en}","}") as ?displayEN)
(CONCAT("{","{#language:",?c,"|","fr}","}") as ?displayFR)
{
  ?item wdt:P424 ?c .
  MINUS{?item wdt:P31/wdt:P279* wd:Q14827288} #--exclude Wikimedia projects
  MINUS{?item wdt:P31/wdt:P279* wd:Q17442446} #--exclude Wikimedia internal stuff
}

You could:

  • paste the list of valid codes into your script, or
  • preload the list at your script startup, or
  • execute an ASK SPARQL query at every user input.
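
For comparison, the second option (preloading the list) might look roughly like the sketch below. It is not part of the original answer: the names `fetch_codes`, `is_valid_code` and `CODES_FILE` are hypothetical, and it assumes the endpoint returns CSV with a header row when sent `Accept: text/csv`.

```shell
#!/bin/sh
# Hypothetical names throughout; CODES_FILE is an assumed cache location.
CODES_FILE="${CODES_FILE:-${TMPDIR:-/tmp}/wikimedia_codes.txt}"

# Download every P424 value once and store one code per line.
# Assumes WDQS honours "Accept: text/csv" and emits a header line first.
fetch_codes() {
    curl -s -G https://query.wikidata.org/sparql \
        -H 'Accept: text/csv' \
        --data-urlencode 'query=SELECT DISTINCT ?code { [] wdt:P424 ?code } ORDER BY ?code' |
        tail -n +2 | tr -d '\r' > "$CODES_FILE"
}

# Succeed only if $1 is non-empty and equals a whole line of the cached list.
is_valid_code() {
    [ -n "$1" ] && grep -qxF -e "$1" "$CODES_FILE"
}
```

Calling `fetch_codes` once at startup and then `is_valid_code "$code"` for each input keeps the number of requests to one per run, at the cost of working from a snapshot of the list.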

I prefer the third option:

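#!/bin/sh
# Ask WDQS whether any item has the entered value as its Wikimedia language code (P424).
echo "Enter language code:"
read -r code

# -G with --data-urlencode makes curl URL-encode the query; -s hides progress output.
if curl -s -G https://query.wikidata.org/sparql \
        --data-urlencode "query=ASK { ?lang wdt:P424 \"$code\" }" | grep -q "true"; then
    echo "Valid code"
else
    echo "Invalid code"
fi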
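
Because the script places user input inside a SPARQL string literal, it may be worth rejecting unexpected characters before sending anything. The following is a minimal sketch under a stated assumption: that language codes consist only of ASCII letters, digits and hyphens. That alphabet is a guess made for illustration, not something the source confirms, and `build_ask_query` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print an ASK query for $1, or return non-zero if $1 is
# empty or contains a character outside the ASSUMED alphabet [A-Za-z0-9-].
build_ask_query() {
    case $1 in
        ''|*[!A-Za-z0-9-]*) return 1 ;;
    esac
    printf 'ASK { ?lang wdt:P424 "%s" }\n' "$1"
}
```

A script could then run `query=$(build_ask_query "$code") || { echo "Invalid code"; exit 1; }` and pass the result to curl with `--data-urlencode "query=$query"`, so that input containing quotes or braces never reaches the endpoint.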
