Best way to align words?

Views: 161 Posted: 2019/6/6 16:03:20 python

Question

Hello,

I would like to write a piece of code that helps me align some sequences
of words and suggests the ordered common subwords among them.

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
....
alist = (s0, s1, s2)

The result should be: ('example', 'of', 'i', 'would', 'like', 'to', 'have')

But I do not know how I should start. Do you perhaps have a helpful
suggestion?
One problem I have is that with many different strings my results tend
to be empty, while I would still like to have one of the best matches,
or maybe all of them.

Best.

Recommended answer

Robert R. wrote:

Hello,

I would like to write a piece of code that helps me align some sequences
of words and suggests the ordered common subwords among them.

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()
...
alist = (s0, s1, s2)

The result should be: ('example', 'of', 'i', 'would', 'like', 'to', 'have')

But I do not know how I should start. Do you perhaps have a helpful
suggestion?
One problem I have is that with many different strings my results tend
to be empty, while I would still like to have one of the best matches,
or maybe all of them.

Best.

As far as I can see, you want the words that all three lists
have in common, right?

s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

def findCommons(s0, s1, s2):
    # Collect every word of s0 that also occurs in s1 and in s2,
    # keeping the order in which the words appear in s0.
    res = []
    for word in s0:
        if word in s1 and word in s2:
            res.append(word)
    return res

>>> print(findCommons(s0, s1, s2))
['example', 'of', 'i', 'would', 'like', 'to', 'have']
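
The question mentions having many strings rather than exactly three. As a minimal sketch, the same idea can be extended to any number of word lists. The name `find_commons` is hypothetical and not from the thread, and unlike the function above this version skips repeated words. It only preserves the word order of the first list; it does not check that the words occur in the same order in the other lists.

```python
def find_commons(word_lists):
    """Return the words of the first list that occur in every other list,
    in the order they appear in the first list, without repeats."""
    if not word_lists:
        return []
    first, rest = word_lists[0], word_lists[1:]
    # Sets make the membership tests fast when there are many lists.
    other_sets = [set(words) for words in rest]
    seen = set()
    result = []
    for word in first:
        if word not in seen and all(word in s for s in other_sets):
            result.append(word)
            seen.add(word)
    return result


s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

print(find_commons([s0, s1, s2]))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
```

With many lists a strict intersection like this quickly becomes empty, which is the problem the question describes; counting in how many lists each word appears is one way around that.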

Robert R. wrote:

Hello,

I would like to write a piece of code that helps me align some sequences
of words and suggests the ordered common subwords among them.

One problem I have is that with many different strings my results tend
to be empty, while I would still like to have one of the best matches,
or maybe all of them.

"align"?
Anyway, to find the commonest words, you'll be best off
counting how many times each word appears:

lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]

count = {}  # maps each word to its number of occurrences
for line in lst:
    for word in line.split():
        count[word] = count.get(word, 0) + 1

Now you go for the ones with the highest count:

for (word, n) in sorted(count.items(), key=lambda x: x[1], reverse=True):
    print(word, 'appears', n, 'times')

Untested. If you want to count the number of lines a word
appears in (as opposed to the number of times it appears at
all), add an extra condition before count[word] = ...
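
The per-line variant mentioned above can be sketched as follows. This is one possible reading of the "extra condition", not the poster's own code: each line is reduced to a set of words so that a word is counted at most once per line. The name `line_counts` is hypothetical; `collections.Counter` is from the standard library.

```python
from collections import Counter


def line_counts(lines):
    """Count, for each word, the number of lines it appears in."""
    counts = Counter()
    for line in lines:
        # set() drops repeats within one line, so "foo foo" counts once.
        counts.update(set(line.split()))
    return counts


lst = ["foo bar baz", "qux foo foo kaka", "one foo and kaka times qux"]
counts = line_counts(lst)

# Words that appear in every line, then the overall ranking.
print([word for word, n in counts.items() if n == len(lst)])
for word, n in counts.most_common():
    print(word, "appears in", n, "lines")
```

Lowering the threshold from `len(lst)` to, say, half the lines gives "best matches" even when no word is shared by every line.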

I would like to write a piece of code that helps me align some sequences
of words and suggests the ordered common subwords among them.

I'm not sure what you want, but in case you are a guy who knows how
quicksort and Dijkstra's algorithm work :) and wants to find out more:

There are many algorithms out there, which I discovered in a "Text algorithms"
university course. The first one does not directly solve your problem:
"edit distance" (Levenshtein distance)
http://en.wikipedia.org/wiki/Levenshtein_distance
I mention it here only because it is simple and shows the basic idea of
dynamic programming
http://en.wikipedia.org/wiki/Dynamic_programming

If you scroll down you'll see the "Longest common subsequence problem" with
an implementation in Python for 2 sequences. If you don't understand how it
works, just look into the "edit distance" idea and see that it is exactly the
same algorithm with changed rules.

Oleg
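
For reference, the longest common subsequence of two word lists can be sketched with the standard dynamic-programming table. This is a generic textbook version, not the code from the linked page, and the name `lcs` is hypothetical. Applying it pairwise to more than two lists, as in the last line, is only a heuristic and is not guaranteed to find the longest subsequence common to all of them.

```python
def lcs(a, b):
    """Return one longest common subsequence of the sequences a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the words.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


s0 = "this is an example of a thing i would like to have".split()
s1 = "another example of something else i would like to have".split()
s2 = 'and this is another " example " but of something ; now i would still like to have'.split()

print(lcs(s0, s1))
# ['example', 'of', 'i', 'would', 'like', 'to', 'have']
print(lcs(lcs(s0, s1), s2))  # pairwise folding over three lists (heuristic)
```

Unlike a plain membership test, this respects word order in both lists, which is closer to the "ordered common subwords" the question asks for.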
