Inverted Index in Python not returning desired results

Problem description

I'm having trouble returning proper results for an inverted index in Python. I'm trying to load a list of strings into the variable 'strlist' and then have my inverse index loop over the strings to return each word plus where it occurs. Here is what I have so far:

def inverseIndex(strlist):
  d={}
  for x in range(len(strlist)):
    for y in strlist[x].split():
      for index, word in set(enumerate([y])):
        if word in d:
          d=d.update(index)
        else:
          d._setitem_(index,word)
        break
      break
    break
  return d

Now when I run inverseIndex(strlist), all it returns is {0:'This'}, whereas what I want is a dictionary mapping all the words in 'strlist' to the set d.

Is my initial approach wrong? Am I tripping up in the if/else? Any help pointing me in the right direction is greatly appreciated.

Recommended answer

Based on what you're saying, I think you're trying to get some data like this:

strlist = ["hello world", "foo bar", "red cat"]
data_wanted = {
    "foo" : 1,
    "hello" : 0,
    "cat" : 2,
    "world" : 0,
    "red" : 2,
    "bar" : 1
}

So what you should be doing is adding the words as keys to a dictionary, and have their values be the index of the substring in strlist in which they are located.

def locateWords(strlist):
    d = {}
    for i, substr in enumerate(strlist):   # gives you the index and the item itself
        for word in substr.split():
            d[word] = i
    return d
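
As a quick check of the idea above, here is a minimal, self-contained sketch. The name `locate_words` and the sample lists are illustrative assumptions rather than part of the original answer. It shows that the simple version reproduces the wanted mapping, and that a word appearing in several strings only keeps the index of the last one.

```python
def locate_words(strlist):
    """Map each word to the index of the last string that contains it."""
    d = {}
    for i, substr in enumerate(strlist):
        for word in substr.split():
            d[word] = i  # a later occurrence overwrites an earlier one
    return d


sample = ["hello world", "foo bar", "red cat"]
result = locate_words(sample)
print(result)
# {'hello': 0, 'world': 0, 'foo': 1, 'bar': 1, 'red': 2, 'cat': 2}

# "red" occurs in both strings, so only the last index survives.
overlap = locate_words(["red cat", "red dog"])
print(overlap["red"])  # 1
```

The overwrite in the second example is the reason a list-valued variant may be preferable when words can repeat across strings.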

If the word occurs in more than one string in strlist, you should change the code to the following:

def locateWords(strlist):
    d = {}
    for i, substr in enumerate(strlist):
        for word in substr.split():
            if word not in d:
                d[word] = [i]      # first time we see this word
            else:
                d[word].append(i)  # word seen before: record another index
    return d

This changes the values to lists, which contain the indices of the substrings in strlist which contain that word.
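
One possible variation, sketched here under assumptions of my own (the function name `build_inverted_index` and the sample data are hypothetical), uses `collections.defaultdict` to avoid the explicit `if`/`else`. It collects indices in a set first, so a word repeated inside a single string is recorded only once, and then returns sorted lists.

```python
from collections import defaultdict


def build_inverted_index(strlist):
    """Map each word to a sorted list of the indices of strings containing it."""
    index = defaultdict(set)  # missing keys start out as an empty set
    for i, substr in enumerate(strlist):
        for word in substr.split():
            index[word].add(i)  # adding the same index twice has no effect
    # Convert to a plain dict of sorted lists for predictable output.
    return {word: sorted(positions) for word, positions in index.items()}


docs = ["red cat red", "red dog", "blue cat"]
inv = build_inverted_index(docs)
print(inv["red"])  # [0, 1]
print(inv["cat"])  # [0, 2]
print(inv["dog"])  # [1]
```

Whether to deduplicate is a design choice: the list-appending version would record index 0 twice for "red" in this example, which may or may not be what you want.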


  1. {} is not a set, it's a dictionary.
  2. break forces a loop to terminate immediately - you didn't want to end the loop early because you still had data to process.
  3. d.update(index) will give you a TypeError: 'int' object is not iterable. This method takes a mapping or an iterable of key/value pairs and merges it into the dictionary; normally you would pass something like a list of tuples: [("foo",1), ("hello",0)]. It also returns None, so d = d.update(...) would replace your dictionary with None.
  4. You don't normally want to use d.__setitem__ (which you typed wrong anyway). You'd just use d[key] = value.
  5. You can iterate using a "for each" style loop instead, like my code above shows. Looping over the range means you are looping over the indices. (Not exactly a problem, but it could lead to extra bugs if you're not careful to use the indices properly).
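
The points about `update` and plain item assignment can be tried out in a short, self-contained sketch. The keys and values here are just the sample data from earlier, used for illustration.

```python
d = {}

# update() accepts an iterable of key/value pairs (or another mapping).
d.update([("foo", 1), ("hello", 0)])
print(d)  # {'foo': 1, 'hello': 0}

# Passing a bare int fails, because an int cannot be iterated over.
try:
    d.update(0)
except TypeError as exc:
    print("TypeError:", exc)

# update() modifies the dict in place and returns None,
# so its result should not be assigned back to d.
returned = d.update({"cat": 2})
print(returned)  # None

# For a single key, plain item assignment is the usual form.
d["bar"] = 1
print(d)  # {'foo': 1, 'hello': 0, 'cat': 2, 'bar': 1}
```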

It looks like you are coming from another programming language in which braces indicate sets and there is a keyword which ends control blocks (like if, fi). It's easy to confuse syntax when you're first starting - but if you run into trouble running the code, look at the exceptions you get and search them on the web!

P.S. I'm not sure why you wanted a set - if there are duplicates, you probably want to know all of their locations, not just the first or the last one or anything in between. Just my $0.02.
