Best way to align words?


Question

Hello,

I would like to write a piece of code to help me align some sequences
of words and suggest their ordered common subwords.

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
....
alist = (s0, s1, s2)

The result should be: ('example', 'of', 'i', 'would', 'like', 'to', 'have')

But I do not know how I should start. Do you perhaps have a helpful
suggestion?
One trouble I have is that with many different strings my results tend
to be empty, while I would still like to get one of, or maybe all of,
the best matches.

Best.

Recommended answer

Robert R. wrote:
Hello,

I would like to write a piece of code to help me align some sequences
of words and suggest their ordered common subwords.

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
...
alist = (s0, s1, s2)

The result should be: ('example', 'of', 'i', 'would', 'like', 'to', 'have')

But I do not know how I should start. Do you perhaps have a helpful
suggestion?
One trouble I have is that with many different strings my results tend
to be empty, while I would still like to get one of, or maybe all of,
the best matches.

Best.



As far as I can see, you want the words that all three lists
have in common, right?

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

def findCommons(s0, s1, s2):
    res = []
    # Keep the words of s0, in order, that also occur in s1 and s2.
    for word in s0:
        if word in s1 and word in s2:
            res.append(word)
    return res


>>> print(findCommons(s0, s1, s2))
['example', 'of', 'i', 'would', 'like', 'to', 'have']
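
The same membership test can be extended to any number of word lists. The following is only a sketch: the name find_commons_all is made up for this illustration and does not appear in the thread, and it assumes the order of the first list is the one to keep. It checks membership only, not whether the words occur in the same relative order in the other lists.

```python
def find_commons_all(word_lists):
    """Return the words of the first list that occur in every other list.

    Hypothetical helper for illustration; not part of the original answer.
    """
    if not word_lists:
        return []
    first, rest = word_lists[0], word_lists[1:]
    # Sets make each membership test O(1) instead of scanning a list.
    rest_sets = [set(words) for words in rest]
    return [word for word in first if all(word in s for s in rest_sets)]


s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

print(find_commons_all((s0, s1, s2)))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

With many lists, a strict "in every list" test is what makes the result shrink toward nothing, which is the trouble described in the question.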


Robert R. wrote:

Hello,

I would like to write a piece of code to help me align some sequences
of words and suggest their ordered common subwords.

One trouble I have is that with many different strings my results tend
to be empty, while I would still like to get one of, or maybe all of,
the best matches.




"align"?
Anyway, to find the most common words, you'll be best off
counting how many times each word appears:

lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]

count = {}
for line in lst:
    for word in line.split():
        count[word] = count.get(word, 0) + 1

Now you go for the ones with the highest count:

for word, n in sorted(count.items(), key=lambda x: x[1], reverse=True):
    print(word, 'appears', n, 'times')

Untested. If you want to count the number of lines a word
appears in (as opposed to the number of times it appears at
all), add an extra condition before count[word] = ...
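
One way to read that last remark is to count each word at most once per line and then keep the words that reach a chosen threshold. This is a sketch under that assumption; the helper name words_in_at_least and the threshold parameter min_lines are invented here and are not from the thread.

```python
from collections import Counter

lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]

line_count = Counter()
for line in lst:
    # set() drops repeats, so each word counts at most once per line.
    line_count.update(set(line.split()))


def words_in_at_least(counts, min_lines):
    """Words appearing in at least min_lines lines, most frequent first."""
    return sorted(
        (w for w, n in counts.items() if n >= min_lines),
        key=lambda w: (-counts[w], w),  # ties are broken alphabetically
    )


print(words_in_at_least(line_count, 2))
# ['foo', 'kaka', 'qux']
```

Lowering min_lines below the number of lines gives the "best matches" when no word occurs in every string.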



I would like to write a piece of code to help me align some sequences
of words and suggest their ordered common subwords.




I'm not sure what you want, but in case you are someone who knows how
quicksort and Dijkstra's algorithm work :) and wants to find out more:

There are many algorithms out there, covered in a "Text algorithms"
university course. The first one does not directly solve your problem:
"edit distance" (Levenshtein distance)
http://en.wikipedia.org/wiki/Levenshtein_distance
I mention it here only because it is simple and shows the basic idea of
dynamic programming
http://en.wikipedia.org/wiki/Dynamic_programming

If you scroll down you'll see the "Longest common subsequence problem" with
an implementation in Python for 2 sequences. If you don't understand how it
works, just look into the "edit distance" idea and see that it is exactly the
same algorithm with changed rules.

Oleg
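
The longest common subsequence idea mentioned above can be sketched for lists of words as follows. This is not the Wikipedia code, only a minimal dynamic-programming version written for illustration. Folding it pairwise over several lists with functools.reduce is an assumption made here: it is a simple heuristic and is not guaranteed to return the longest subsequence common to more than two sequences.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of two sequences of words."""
    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

print(reduce(lcs, (s0, s1, s2)))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

Unlike a plain membership test, this respects word order in every list, which matches the "ordered common subwords" asked for in the question.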

