Best way to align words?
Problem description
Hello,

I would like to write a piece of code that helps me align several sequences of words and suggests their ordered common subwords.

    s0 = "this is an example of a thing i would like to have".split()
    s1 = "another example of something else i would like to have".split()
    s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
    ....
    alist = (s0, s1, s2)

The result should be: ('example', 'of', 'i', 'would', 'like', 'to', 'have')

But I do not know how I should start. Do you perhaps have a helpful suggestion?

One problem I have: with many different strings, my results tend to be empty, while I would still like to get one of the best matches, or maybe all of them.

Best.
Recommended answer
As far as I can see, you want the words that all three lists have in common, right?

    s0 = "this is an example of a thing i would like to have".split()
    s1 = "another example of something else i would like to have".split()
    s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

    def findCommons(s0, s1, s2):
        # Keep each word of s0 (in s0's order) that also occurs in s1 and s2.
        res = []
        for word in s0:
            if word in s1 and word in s2:
                res.append(word)
        return res
    >>> print(findCommons(s0, s1, s2))
    ['example', 'of', 'i', 'would', 'like', 'to', 'have']
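
Since the question mentions having many different strings, here is a minimal sketch of the same idea for any number of word lists. The name `find_commons` and its signature are illustrative assumptions, not part of the original answer. Like findCommons, it only tests membership, so it does not check that the words occur in the same order in the other lists.

```python
def find_commons(first, *others):
    """Return the words of `first`, in order, that occur in every other list.

    Hypothetical generalisation of findCommons; membership test only.
    """
    other_sets = [set(words) for words in others]  # sets give fast membership tests
    return [word for word in first if all(word in s for s in other_sets)]


s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

print(find_commons(s0, s1, s2))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

A word that is repeated in the first list is repeated in the result too, which matches the behaviour of the three-list version.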
Robert R. wrote:
Hello,
I would like to write a piece of code that helps me align several sequences of words and suggests their ordered common subwords.
One problem I have: with many different strings, my results tend to be empty, while I would still like to get one of the best matches, or maybe all of them.

"align"?

Anyway, for finding the commonest words, you'll be best off counting how many times each word appears:

    lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]
    count = {}  # word -> number of occurrences
    for line in lst:
        for word in line.split():
            count[word] = count.get(word, 0) + 1

Now you go for the ones with the highest count:

    for word, n in sorted(count.items(), key=lambda x: x[1], reverse=True):
        print(word, 'appears', n, 'times')

Untested. If you want to count the number of lines a word appears in (as opposed to the number of times it appears at all), add an extra condition before count[word] = ...
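
One possible way to get the per-line count mentioned above is to deduplicate each line's words before counting. This is only a sketch: `line_counts`, `common_words` and the `min_lines` threshold are hypothetical names introduced here, and `collections.Counter` is swapped in for the plain dict. The threshold is one way to still get some "best matches" when no word appears in every string.

```python
from collections import Counter


def line_counts(lines):
    """Count, for each word, the number of lines it appears in."""
    counts = Counter()
    for line in lines:
        counts.update(set(line.split()))  # set() so a repeated word counts once per line
    return counts


def common_words(lines, min_lines):
    """Words that appear in at least `min_lines` of the lines, sorted alphabetically."""
    return sorted(w for w, n in line_counts(lines).items() if n >= min_lines)


lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]
print(common_words(lst, 3))  # words present in every line
print(common_words(lst, 2))  # relaxed: words present in at least two lines
```

With the sample data, only "foo" appears in all three lines, while lowering `min_lines` to 2 also admits "kaka" and "qux".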
I would like to write a piece of code that helps me align several sequences of words and suggests their ordered common subwords.
I'm not sure what you want, but in case you are a guy who knows how the quicksort and Dijkstra algorithms work :) and wants to find out more:

There are many algorithms out there, of the kind covered in a "Text algorithms" university course. The first one does not directly solve your problem: "edit distance" (Levenshtein distance)
http://en.wikipedia.org/wiki/Levenshtein_distance

I mention it here only because it is simple and shows the basic idea of Dynamic Programming
http://en.wikipedia.org/wiki/Dynamic_programming

If you scroll down you'll see the "Longest common subsequence problem" with an implementation in Python for 2 sequences. If you don't understand how it works, just look into the "edit distance" idea and you will see it is exactly the same algorithm with changed rules.

Oleg
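
For illustration, here is a minimal sketch of the longest common subsequence idea applied to word lists. It is not the Wikipedia implementation referred to above, and the name `lcs` is an assumption. Folding it over several lists with `functools.reduce` is a simple heuristic: it returns a subsequence common to all the lists, but for three or more lists it is not guaranteed to be the longest one.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one LCS.
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

# Pairwise fold: lcs(lcs(s0, s1), s2)
print(reduce(lcs, (s0, s1, s2)))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

Unlike a plain membership test, this respects word order, which is closer to what "ordered common subwords" asks for.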