Making tsearch2 dictionaries


Question

I'm trying to make myself a dictionary for tsearch2 that converts
numbers to their English word equivalents. This seems to be working
great, except that I can't figure out how to make my lexize function
return multiple lexemes. For instance, I'd like "100" to get converted
to {one,hundred}, not {"one hundred"} as is currently happening.

How do I specify the output of the lexize function so that this will
happen?
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Recommended answer

Okay, so I was actually able to answer this question on my own, in a
manner of speaking. It seems the way to do this is to merely return a
larger char** array, with one element for each word. But I was having
trouble with postgres crashing, because (I think) it tries to free each
element independently before using all of them. I had set each element
to a different null-terminated chunk of the same palloc'd memory
segment. Having never written C stored procs before, I take it that's
bad practice?

Anyway, now that this is working, my next question is: can I take the
lexemes from one dictionary lookup and pipe them into another
dictionary? I see that I can have redundant dictionaries, such that if
lexemes aren't found in one it'll try another, but that's not quite the
same.

For instance, the en_stem dictionary converts "hundred" into "hundr".
Right now, my dictionary converts "100" into "one" and "hundred", but
I'd like it to filter both one and hundred through the en_stem
dictionary to arrive at "one" and "hundr".

It also occurs to me I could pipe things through an ispell dictionary
and be able to handle misspellings....
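
To illustrate the point about element ownership, here is a minimal standalone sketch that splits a phrase into a NULL-terminated char** array in which every word is its own allocation, so each element can be freed independently. It is not tsearch2 code: `split_words` and `free_words` are hypothetical names, and `malloc` stands in for `palloc` so the sketch builds outside PostgreSQL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free a NULL-terminated array of separately
 * allocated strings, then the array itself. */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}

/* Hypothetical helper: split a space-separated phrase into a
 * NULL-terminated array where every word is its own allocation.
 * malloc stands in for palloc (assumption for this sketch). */
char **split_words(const char *phrase)
{
    size_t count = 0, cap = 4;
    char **out = malloc(cap * sizeof *out);
    if (out == NULL)
        return NULL;

    const char *p = phrase;
    while (*p != '\0') {
        while (*p == ' ')
            p++;                        /* skip separators */
        if (*p == '\0')
            break;
        const char *start = p;
        while (*p != '\0' && *p != ' ')
            p++;
        size_t len = (size_t)(p - start);

        if (count + 1 >= cap) {         /* keep room for the NULL terminator */
            char **grown = realloc(out, 2 * cap * sizeof *out);
            if (grown == NULL) {
                out[count] = NULL;
                free_words(out);
                return NULL;
            }
            out = grown;
            cap *= 2;
        }

        char *word = malloc(len + 1);   /* one allocation per word */
        if (word == NULL) {
            out[count] = NULL;
            free_words(out);
            return NULL;
        }
        memcpy(word, start, len);
        word[len] = '\0';
        out[count++] = word;
    }
    out[count] = NULL;                  /* last pointer must be NULL */
    return out;
}
```

Because no two elements share a block, a caller that releases `words[0]` leaves `words[1]` intact, which is the property the single shared segment described above did not have.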
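
The thread does not say whether tsearch2 itself can chain dictionaries this way. As a rough sketch of the idea only, the standalone C below feeds each lexeme from one step through a second step inside user code. Everything here is hypothetical: `lexize_fn` is not the tsearch2 signature, `toy_stem` is a stand-in that merely strips a trailing "ed" (it is not en_stem), and `malloc` replaces `palloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for a dictionary step: takes one word and returns
 * a NULL-terminated array of separately malloc'd lexemes, or NULL when the
 * word is unknown. This is not the tsearch2 API. */
typedef char **(*lexize_fn)(const char *word);

/* Copy the first len bytes of s into a fresh NUL-terminated string. */
static char *copy_n(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL)
        abort();                 /* keep the sketch short: no recovery */
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}

/* Toy stand-in for a stemmer: strips a trailing "ed" and nothing else. */
char **toy_stem(const char *word)
{
    size_t len = strlen(word);
    if (len > 2 && strcmp(word + len - 2, "ed") == 0)
        len -= 2;
    char **out = malloc(2 * sizeof *out);
    if (out == NULL)
        abort();
    out[0] = copy_n(word, len);
    out[1] = NULL;
    return out;
}

/* Feed every lexeme in `in` through `next` and collect the results.
 * A NULL result from `next` keeps a copy of the original lexeme;
 * an empty array drops it. */
char **pipe_lexemes(char **in, lexize_fn next)
{
    size_t count = 0, cap = 4;
    char **out = malloc(cap * sizeof *out);
    if (out == NULL)
        abort();

    for (size_t i = 0; in[i] != NULL; i++) {
        char **res = next(in[i]);
        char *fallback[2] = { in[i], NULL };
        char **src = (res != NULL) ? res : fallback;

        for (size_t j = 0; src[j] != NULL; j++) {
            if (count + 1 >= cap) {      /* keep room for the terminator */
                cap *= 2;
                out = realloc(out, cap * sizeof *out);
                if (out == NULL)
                    abort();
            }
            /* Take ownership of res[j], or copy the caller's lexeme. */
            out[count++] = (res != NULL)
                ? src[j]
                : copy_n(src[j], strlen(src[j]));
        }
        free(res);                       /* array only; elements moved to out */
    }
    out[count] = NULL;
    return out;
}
```

With the toy stemmer, piping {"one", "hundred"} through `pipe_lexemes` gives {"one", "hundr"}, matching the result described above; a real stemmer would of course apply its own rules.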

On Sun, 2004-02-15 at 15:35, Ben wrote:
I'm trying to make myself a dictionary for tsearch2 that converts
numbers to their English word equivalents. This seems to be working
great, except that I can't figure out how to make my lexize function
return multiple lexemes. For instance, I'd like "100" to get converted
to {one,hundred}, not {"one hundred"} as is currently happening.

How do I specify the output of the lexize function so that this will
happen?



From http://www.sai.msu.su/~megera/oddmus...ch_V2_in_Brief

Table for storing dictionaries. The dict_init field stores the Oid of the
function that initializes the dictionary. dict_init takes one option, the
text value from dict_initoption, and should return the internal
representation (structure) of the dictionary. The structure must be
malloced or palloced in TopMemoryContext. dict_init is called only once
per process.
The dict_lexize field stores the Oid of the function that lemmatizes a
lexeme. Input values: the dictionary structure, a pointer to the string,
and its length. Output: a pointer to an array of pointers to C strings.
The last pointer in the array must be NULL. Returning NULL means that the
dictionary can't resolve the word, while returning an empty array means
that the dictionary knows the input word but considers it a stop word.
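
The three return conventions described above can be sketched in standalone C as follows. This is an illustration under stated assumptions, not the tsearch2 interface: `toy_lexize` is a hypothetical function with a simplified signature (no dictionary structure argument), its tiny vocabulary is invented, and `malloc` stands in for `palloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of a C string. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);
    if (out == NULL)
        abort();                 /* keep the sketch short: no recovery */
    memcpy(out, s, len + 1);
    return out;
}

/* Hypothetical lexize-like function. The input is a pointer plus a
 * length, so it is not assumed to be NUL-terminated.
 *   - "the"  -> empty array (known word, treated as a stop word)
 *   - "100"  -> {"one", "hundred", NULL}
 *   - others -> NULL (dictionary cannot resolve the word) */
char **toy_lexize(const char *word, size_t len)
{
    if (len == 3 && strncmp(word, "the", 3) == 0) {
        char **out = malloc(sizeof *out);
        if (out == NULL)
            abort();
        out[0] = NULL;           /* array holding only the terminator */
        return out;
    }

    if (len == 3 && strncmp(word, "100", 3) == 0) {
        char **out = malloc(3 * sizeof *out);
        if (out == NULL)
            abort();
        out[0] = dup_string("one");      /* one allocation per lexeme */
        out[1] = dup_string("hundred");
        out[2] = NULL;                   /* last pointer must be NULL */
        return out;
    }

    return NULL;                 /* unknown word */
}
```

A caller can tell the cases apart by checking the returned pointer first and then its first element: NULL means unresolved, a non-NULL array whose first element is NULL means a stop word.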

Ben wrote:
I'm trying to make myself a dictionary for tsearch2 that converts
numbers to their English word equivalents. This seems to be working
great, except that I can't figure out how to make my lexize function
return multiple lexemes. For instance, I'd like "100" to get converted
to {one,hundred}, not {"one hundred"} as is currently happening.

How do I specify the output of the lexize function so that this will
happen?

--
Teodor Sigaev E-mail: te****@sigaev.ru

Ben <be***@silentmedia.com> writes:
Okay, so I was actually able to answer this question on my own, in a
manner of speaking. It seems the way to do this is to merely return a
larger char** array, with one element for each word. But I was having
trouble with postgres crashing, because (I think) it tries to free each
element independently before using all of them. I had set each element
to a different null-terminated chunk of the same palloc'd memory
segment. Having never written C stored procs before, I take it that's
bad practice?




Given Teodor's response, I think the issue is probably that you were
palloc'ing in too short-lived a context. But whatever the problem is,
you'll narrow it down a lot faster if you build with --enable-cassert.
I wouldn't ever recommend trying to debug C functions without that.

regards, tom lane
