Automatically determine the natural language of a website page given its URL

Views: 22 · Published: 2021/12/15 15:28:16 · Tags: python, url, web, nlp

Problem description

I'm looking for a way to automatically determine the natural language used by a website page, given its URL.

In Python, a function like:

def LanguageUsed(url):
    # stuff: fetch the page at `url` and work out its language
    raise NotImplementedError

which returns a language specifier (e.g. 'en' for English, 'jp' for Japanese, etc.).

Summary of results: I have a reasonable solution working in Python using the oice.langdet package from PyPI. It does a decent job of discriminating English from non-English pages, which is all I require at the moment. Note that you have to fetch the HTML using Python's urllib. Also, oice.langdet is GPL-licensed.
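The fetching step mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the asker's actual code: the names `fetch_html`, `html_to_text`, and `language_used` are hypothetical, the `User-Agent` header and UTF-8 fallback are assumptions, and `detect` stands in for whatever detector is used (the oice.langdet API is deliberately not shown here).

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined with single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url, timeout=10):
    """Download a page and decode it (assumption: fall back to UTF-8)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def language_used(url, detect):
    """`detect` is a hypothetical callable mapping plain text to a language code."""
    return detect(html_to_text(fetch_html(url)))
```

Stripping markup before detection matters because tag names and script bodies are mostly English-looking tokens and could bias a detector on short pages.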

For a more general solution using trigrams in Python, as others have suggested, see this Python Cookbook recipe from ActiveState.
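To give a rough idea of what a trigram approach involves, here is a minimal sketch. It is not the ActiveState recipe: the function names, the cosine-similarity scoring, and the tiny sample texts are all assumptions made for illustration, and a usable detector would need far larger training text per language.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two trigram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Tiny illustrative samples (assumption); real profiles need much more text.
SAMPLES = {
    "en": ("the quick brown fox jumps over the lazy dog and this is the "
           "house that we built with the help of our friends"),
    "de": ("der schnelle braune fuchs springt über den faulen hund und das "
           "ist das haus das wir mit der hilfe unserer freunde gebaut haben"),
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the code of the sample language whose profile is most similar."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine_similarity(profile, PROFILES[lang]))
```

Under these assumptions, `guess_language` could be passed as the detector once the page text has been extracted from the HTML. Note that it always returns one of the known languages, so an English versus non-English decision would also need a similarity threshold.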

The Google Natural Language Detection API works very well (perhaps the best I've seen). However, it is a JavaScript API, and its TOS forbids automated use.
