Mass-upload many text files to MediaWiki

Question

I have many text files that I want to upload to a wiki running MediaWiki. I don't even know if this is really possible, but I want to give it a shot.

Each text file's name will be the title of the wiki page.

One wiki page for one file.

I want to upload all text files from the same folder as the program is in.

Perhaps asking you to code it all is asking too much, so could you tell me at least which language I should look for to give it a shot?

Recommended answer

What you probably want is a bot to create the articles for you using the MediaWiki API. Probably the best known bot framework is pywikipedia for Python, but there are API libraries and bot frameworks for many other languages too.

In fact, pywikipedia comes with a script called pagefromfile.py that does something pretty close to what you want. By default, it creates multiple pages from a single file, but if you know some Python, it shouldn't be too hard to change that.
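If you would rather not modify pagefromfile.py, another route is to merge your text files into the single-file layout it reads. The sketch below does that in Python. It assumes the script's default markers are `{{-start-}}` and `{{-stop-}}` and that the page title is taken from the first text wrapped in `'''`; those defaults are from memory, so check the script's help output for your version. The function name `build_pagefromfile_input` is made up for this example.

```python
from pathlib import Path

# Assumed pagefromfile.py defaults; verify them against your copy of the script.
START_MARKER = "{{-start-}}"
END_MARKER = "{{-stop-}}"
TITLE_QUOTE = "'''"


def build_pagefromfile_input(directory, pattern="*.txt", encoding="utf-8"):
    """Concatenate text files into one string, one marked-up entry per file."""
    chunks = []
    for path in sorted(Path(directory).glob(pattern)):
        title = path.stem  # file name without directory or extension
        body = path.read_text(encoding=encoding).rstrip("\n")
        chunks.append(
            f"{START_MARKER}\n{TITLE_QUOTE}{title}{TITLE_QUOTE}\n{body}\n{END_MARKER}\n"
        )
    return "".join(chunks)
```

You would write the returned string to a file and pass that file to pagefromfile.py. Note that under these assumed defaults the bold title line stays in the page text unless the script is told to drop it.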

Actually, if the files are on the same server your wiki runs on (or you can upload them there), then you don't even need a bot at all: there's a MediaWiki maintenance script called importTextFile.php that can do it for you. You can run it for all files in a given directory with a simple shell script, e.g.:

for file in directory/*.txt; do
   php /path/to/your/mediawiki/maintenance/importTextFile.php "$file";
done

(Obviously, replace directory with the directory containing the text files and /path/to/your/mediawiki with the actual path of your MediaWiki installation.)

By default, importTextFile.php will base the name of the created page on the filename, stripping any directory prefixes and extensions. Also, per standard MediaWiki page naming rules, underscores will be replaced by spaces and the first letter will be capitalized (unless you've turned that off in your LocalSettings.php); thus, for example, the file directory/foo_bar.txt would be imported as the page "Foo bar". If you want finer control over the page naming, importTextFile.php also supports an explicit --title parameter. Or you could always copy the script and modify it yourself to change the page naming rules.
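To preview which page titles your files would end up with before importing anything, the naming rule described above can be approximated in a few lines of Python. This is only a rough sketch: the helper name `title_from_filename` is borrowed from the PHP script for readability, cutting at the first dot is an assumption, and real MediaWiki title normalization does more than this (namespaces, forbidden characters and so on).

```python
import os


def title_from_filename(filename, capitalize_first=True):
    """Approximate the default page title derived from a file path."""
    base = os.path.basename(filename)   # strip any directory prefix
    stem = base.split(".")[0]           # assumption: cut at the first dot
    title = stem.replace("_", " ")      # underscores become spaces
    if capitalize_first and title:
        title = title[0].upper() + title[1:]
    return title


print(title_from_filename("directory/foo_bar.txt"))
```

Pass `capitalize_first=False` if first-letter capitalization is turned off on your wiki.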

P.S. There's also another MediaWiki maintenance script called edit.php that does pretty much the same thing as importTextFile.php, except that it reads the page text from standard input and doesn't have the convenient default page naming rules of importTextFile.php. It can be quite handy for automated edits using Unix pipelines, though.

Addendum: The importTextFile.php script expects the file names and contents to be in the UTF-8 encoding. If your files are in some other encoding, you'll have to either fix them first or modify the script to do the conversion, e.g. using mb_convert_encoding().

In particular, the following modifications to the script ought to do it:

  1. To convert the file names to UTF-8, edit the titleFromFilename() function, near the bottom of the script, and replace its last line:

return $parts[0];

with:

return mb_convert_encoding( $parts[0], "UTF-8", "your-encoding" );

where your-encoding should be the character encoding used for your file names (or auto to attempt auto-detection).

  2. To also convert the contents of the files, make a similar change higher up, inside the main code of the script, replacing the line:

$text = file_get_contents( $filename );

with:

$text = file_get_contents( $filename );
$text = mb_convert_encoding( $text, "UTF-8", "your-encoding" );
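If you would rather fix the files first than patch the script, a small Python helper along these lines could write UTF-8 copies into a separate directory, which you then feed to the import loop. The function name `convert_tree_to_utf8` and its parameters are made up for this sketch, and you have to supply the source encoding yourself. It converts file contents only; file names are copied unchanged.

```python
from pathlib import Path


def convert_tree_to_utf8(src_dir, dst_dir, source_encoding, pattern="*.txt"):
    """Write UTF-8 copies of the matching files in src_dir into dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).glob(pattern)):
        # Decoding raises UnicodeDecodeError if the bytes do not match
        # source_encoding, which is safer than silently importing garbage.
        text = path.read_bytes().decode(source_encoding)
        target = dst / path.name
        target.write_bytes(text.encode("utf-8"))
        written.append(target)
    return written
```

For example, `convert_tree_to_utf8("directory", "directory_utf8", "latin-1")` would leave the originals untouched and put the converted copies in `directory_utf8`.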
