Looking for a terminal command to parse MacOSX dictionary data file

Question

Problem

MacOSX comes with dictionaries stored in /Library/Dictionaries. I would like to parse them to obtain dictionary results programmatically (via Terminal, AppleScript, or Automator). The dictionaries are MacOSX packages, each with a Contents folder that contains a file called Body.data. I would like to search that file for a UTF-8 string (possibly multibyte Chinese characters) and return the lines where the string is found.

I've tried the following, which is not returning any results:

find . -name 'Body.data' -exec grep -li '我' {} \;

When I search through the dictionary using the app interface I can find the appropriate text. My objective is to create a workflow service to translate selected Chinese text into the pinyin equivalents which are stored in the system/user dictionaries.

Update

The following worked for me based on the accepted answer:

Created and Archived a command line utility called rdef using Xcode with this code:

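#import <Foundation/Foundation.h>
#import <CoreServices/CoreServices.h> // Dictionary Services: DCSCopyTextDefinition

int main(int argc, const char * argv[])
{
    @autoreleasepool {

        if (argc < 2)
        {
            fprintf(stderr, "Usage: rdef <word to define>\n");
            return 1;
        }

        // Interpret the first argument as a UTF-8 string.
        NSString *search =
        [NSString stringWithCString: argv[1] encoding: NSUTF8StringEncoding];
        if (search == nil)
        {
            fprintf(stderr, "rdef: argument is not valid UTF-8\n");
            return 1;
        }

        // Look up the term in the active dictionaries; the result is NULL when nothing is found.
        // CFBridgingRelease hands ownership of the copied string to ARC.
        NSString *def = CFBridgingRelease(
            DCSCopyTextDefinition(NULL,
                                  (__bridge CFStringRef)search,
                                  CFRangeMake(0, [search length])));

        NSString *output =
        [NSString stringWithFormat: @"Definition of <%@>: %@", search, def];

        printf("%s\n", [output UTF8String]);
    }
    return 0;
}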

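Added the following to my project frameworks (the list did not survive in this copy; DCSCopyTextDefinition itself is declared in CoreServices.framework):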

Performed a Build and then deployed manually using the steps below.

To deploy:

Right-clicked the Archived package and chose Show in Finder. Then chose Show Package Contents, drilled down into the Products folder, and copied the executable to /usr/local/bin. Now from a command prompt I can run the utility like so:

rdef 我|awk -F '\|' '{ gsub(/^ +| +$/, "", $2); print $2 }'

Please see the accepted answer below for extended references.

NB: The github for the utility can be found at https://github.com/mingsai/rdef.git

Next I will just create a Service to call the utility from Automator against selected text.

Service Solution

To pay it forward for the folks who've helped, especially @mklement0: here is the Solution for taking the command utility and converting it to a MacOSX service that can be used to translate Chinese characters to pinyin.

Create a new Automator Service file and make sure to select "Output replaces selected text".

Automator Script details

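# Make sure the rdef utility in /usr/local/bin can be found.
PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin
export PATH
# Use a UTF-8 locale so that ${x:i:1} extracts whole characters rather than single bytes.
export LC_CTYPE=UTF-8
x=$1

# Look up each character in turn and print its pinyin (the second |-separated field), separated by spaces.
for ((i=0;i<${#x};i++)); do rdef "${x:i:1}" | awk -F '\|' 'BEGIN {ORS=" "}{ gsub(/^ +| +$/, "", $2); if (length($2) > 0) print $2 ; exit}'; done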
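The lookup-and-join step of that script can be tried without a dictionary or Automator. The sketch below is only an illustration: fake_rdef and to_pinyin are hypothetical names, the canned lines are made-up sample data in the 'Definition of <X>: | pinyin | gloss' shape rather than real dictionary output, and the characters are passed as separate arguments so the example does not depend on the shell's locale-aware substring handling.

```shell
#!/bin/sh
# Hypothetical stand-in for rdef: prints canned sample lines, not real dictionary output.
fake_rdef() {
    case "$1" in
        我) echo 'Definition of <我>: | wǒ | I me my' ;;
        爱) echo 'Definition of <爱>: | ài | love' ;;
        你) echo 'Definition of <你>: | nǐ | you' ;;
        *)  echo "Definition of <$1>: (null)" ;;
    esac
}

# Look up each argument (one character per argument) and join the pinyin with spaces.
# Lines without a second |-separated field (no definition found) print nothing.
to_pinyin() {
    for ch in "$@"; do
        fake_rdef "$ch" |
            awk -F ' *[|] *' 'BEGIN { ORS = " " } length($2) > 0 { print $2; exit }'
    done
}

result=$(to_pinyin 我 爱 你)
printf '%s\n' "$result"
```

Because ORS is a space, every syllable is followed by one, so the joined result ends with a trailing space. In the real service, rdef takes the place of fake_rdef and the characters come from the selected text.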

To make the Service "live", delete the "Ask for Text" action and save the service under a name of your choice (e.g. Convert to Pinyin).

To use the revised service, highlight any Chinese characters, right-click to open the context menu, and under the Services menu at the bottom select "Convert to Pinyin" (as indicated below).

Usage

Produces this output

Hope that helps anyone with this problem.

Solution

grep operates on text files, but the Body.data files are unfortunately binary data rather than plain text, so the search string does not appear in them as a literal run of UTF-8 bytes.
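As a rough illustration of the effect (not of the actual Body.data layout, which is not described here), the sketch below stores a sample line both as plain text and in gzip-compressed form. gzip is only a stand-in for "some binary encoding", and the file names are hypothetical.

```shell
#!/bin/sh
# Assumption: gzip stands in for a binary encoding; this is not the real Body.data format.
tmpdir=$(mktemp -d)
printf '%s\n' '我 | wǒ | I me my' > "$tmpdir/entry.txt"
gzip -c "$tmpdir/entry.txt" > "$tmpdir/entry.gz"

# Plain text: grep sees the literal bytes of the character.
plain_hits=$(grep -c '我' "$tmpdir/entry.txt")

# Encoded file searched directly: the literal bytes are normally no longer present.
raw_hits=$(grep -c '我' "$tmpdir/entry.gz" || true)

# Encoded file decoded first: the text is searchable again.
decoded_hits=$(gzip -dc "$tmpdir/entry.gz" | grep -c '我')

printf 'plain: %s, raw: %s, decoded: %s\n' "$plain_hits" "$raw_hits" "$decoded_hits"
rm -rf "$tmpdir"
```

The point is only that a search has to go through something that understands the storage format, which is what the dictionary API does.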

Your best bet is probably to create your own command-line utility in Xcode, as suggested here (sample code): https://discussions.apple.com/thread/2679911

Here's Apple's dictionary API documentation: https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/DictionaryServicesProgGuide/access/access.html#//apple_ref/doc/uid/TP40006152-CH5-SW1

Update:

Assuming you've created a utility named rdef that returns something like 'Definition of <我>: | wǒ | I me my', use the following awk command to parse out the pinyin:

rdef "我" | awk -F ' *[|] *' '{ print $2 }'
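To see what the field separator does without having rdef installed, the same awk call can be run against the sample line quoted above. This is a minimal sketch: the sample variable simply holds that quoted line, and the names pinyin and gloss are chosen here for illustration.

```shell
#!/bin/sh
# Sample line in the format quoted above; a real run would pipe from: rdef "我"
sample='Definition of <我>: | wǒ | I me my'

# The separator is a pipe plus any surrounding spaces, so fields arrive already trimmed.
pinyin=$(printf '%s\n' "$sample" | awk -F ' *[|] *' '{ print $2 }')
gloss=$(printf '%s\n' "$sample" | awk -F ' *[|] *' '{ print $3 }')

printf 'pinyin: %s\n' "$pinyin"
printf 'gloss:  %s\n' "$gloss"
```

Folding the spaces into the separator is what makes a separate gsub trimming step unnecessary.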


Alternatively, if an online-based solution is an option, you could try a Google Translate-based solution.

At least in interactive use you get a pinyin transcription below the input field.

For instance, your example symbol is transcribed as "Wǒ":

http://translate.google.com/?text=%E6%88%91#zh-CN/en/%E6%88%91
