Creating a simple searching program

Views: 112 | Published: 2017/5/21 22:21:52 | Tags: python, search, dictionary, text-processing


Problem description

 
I decided to delete my earlier question and ask again, as that was just easier. Please do not downvote; I have taken on board what people have been saying.

I have two nested dictionaries:-
wordFrequency = {'bit':{1:3,2:4,3:19,4:0},'red':{1:0,2:0,3:15,4:0},'dog':{1:3,2:0,3:4,4:5}}

search = {1:{'bit':1},2:{'red':1,'dog':1},3:{'bit':2,'red':3}}
The first dictionary links each word to a file number and the number of times the word appears in that file. The second contains searches, each linking a word to the number of times it appears in that search.

I want to extract certain values so that, for each search, I can calculate the scalar product between the number of times words appear in a file and the number of times they appear in the search, divided by the product of their magnitudes, and then see which file is most similar to the current search, i.e. (word 1 appearances in search * word 1 appearances in file) + (word 2 appearances in search * word 2 appearances in file) etc. I then want to return a dictionary mapping each search to a list of file numbers, most similar first, least similar last.
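
To make the scalar product concrete, here is a minimal sketch for one search against one file. It is written for Python 3 (an assumption; the question itself uses Python 2.7), and the helper name `dot_product` is hypothetical, not part of the original code.

```python
# Data copied from the question.
wordFrequency = {
    'bit': {1: 3, 2: 4, 3: 19, 4: 0},
    'red': {1: 0, 2: 0, 3: 15, 4: 0},
    'dog': {1: 3, 2: 0, 3: 4, 4: 5},
}
search = {1: {'bit': 1}, 2: {'red': 1, 'dog': 1}, 3: {'bit': 2, 'red': 3}}


def dot_product(search_words, file_id):
    """Hypothetical helper: sum of (count in search * count in file)."""
    return sum(
        count * wordFrequency.get(word, {}).get(file_id, 0)
        for word, count in search_words.items()
    )


# Search 3 against file 1: bit 2*3 + red 3*0
print(dot_product(search[3], 1))  # 6
```

Words that are missing from `wordFrequency`, or files missing for a word, are treated as zero appearances here; that is a design choice of this sketch.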

Expected output is a dictionary:
{1:[4,3,1,2],2:[1,2,4,3]}
etc.

The key is the search number; the value is a list of file numbers, most relevant first.

(These may not actually be right.)

This is what I have:-
def retrieve():
    results = {}
    for word in search:
        numberOfAppearances = wordFrequency.get(word).values()
        for appearances in numberOfAppearances:
            results[fileNumber] = numberOfAppearances.dot()
return sorted (results.iteritems(), key=lambda (fileNumber, appearances): appearances, reverse=True)
Sorry, no: when I run it, it just prints wdir= followed by the directory the .py file is in.


Edit


The entire Retrieve.py file:
from collections import Counter

def retrieve():

    wordFrequency = {'bit':{1:3,2:4,3:19,4:0},'red':{1:0,2:0,3:15,4:0},'dog':{1:3,2:0,3:4,4:5}}
    search = {1:{'bit':1},2:{'red':1,'dog':1},3:{'bit':2,'red':3}}


    results = {}
    for search_number, words in search.iteritems():
        file_relevancy = Counter()
        for word, num_appearances in words.iteritems():
            for file_id, appear_in_file in wordFrequency.get(word, {}).iteritems():
                file_relevancy[file_id] += num_appearances * appear_in_file

        results[search_number] = [file_id for (file_id, count) in file_relevancy.most_common()]

    return results
I am using the Spyder IDE with Anaconda Python 2.7. I just press the green play button, and the output is:

wdir='/Users/danny/Desktop'


Edit 2


In regards to the magnitude, for example, for search number 3 and file 1 it would be:

sqrt (2^2 + 3^2 + 0^2) * sqrt (3^2 + 0^2 + 3^2)
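
As a rough illustration of that magnitude term, the sketch below computes it for search 3 and file 1. It assumes Python 3 rather than the Python 2.7 used in the question, and the helper names `magnitude` and `file_vector` are hypothetical.

```python
from math import sqrt

# Data copied from the question.
wordFrequency = {
    'bit': {1: 3, 2: 4, 3: 19, 4: 0},
    'red': {1: 0, 2: 0, 3: 15, 4: 0},
    'dog': {1: 3, 2: 0, 3: 4, 4: 5},
}
search = {1: {'bit': 1}, 2: {'red': 1, 'dog': 1}, 3: {'bit': 2, 'red': 3}}


def magnitude(values):
    """Euclidean length of a vector of counts."""
    return sqrt(sum(v * v for v in values))


def file_vector(file_id):
    """Counts of every known word in one file (0 if the file is not listed)."""
    return [counts.get(file_id, 0) for counts in wordFrequency.values()]


# Words absent from a search count as 0, so they do not change its magnitude.
search_magnitude = magnitude(search[3].values())  # sqrt(2^2 + 3^2)
file_magnitude = magnitude(file_vector(1))        # sqrt(3^2 + 0^2 + 3^2)
denominator = search_magnitude * file_magnitude
print(denominator)
```

Dividing the scalar product by this denominator gives what is commonly called cosine similarity.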
Solution
Here is a start:
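from collections import Counter

# Assumes wordFrequency and search (from the question) are defined at module level.
def retrieve():
    results = {}
    for search_number, words in search.iteritems():
        # Accumulate a relevancy score per file for this search
        file_relevancy = Counter()
        for word, num_appearances in words.iteritems():
            for file_id, appear_in_file in wordFrequency.get(word, {}).iteritems():
                # Scalar product term: count in search * count in file
                file_relevancy[file_id] += num_appearances * appear_in_file

        # most_common() yields (file_id, score) pairs, highest score first
        results[search_number] = [file_id for (file_id, count) in file_relevancy.most_common()]

    return results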

print retrieve()
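
The code above ranks files by the raw scalar product only and does not divide by the magnitudes mentioned in the question. One possible extension is sketched below. It is not part of the original answer: it assumes Python 3 (`items()` and `print()` instead of `iteritems()` and the `print` statement), and the function name `rank_files` is hypothetical.

```python
from math import sqrt

# Data copied from the question.
wordFrequency = {
    'bit': {1: 3, 2: 4, 3: 19, 4: 0},
    'red': {1: 0, 2: 0, 3: 15, 4: 0},
    'dog': {1: 3, 2: 0, 3: 4, 4: 5},
}
search = {1: {'bit': 1}, 2: {'red': 1, 'dog': 1}, 3: {'bit': 2, 'red': 3}}


def rank_files(word_frequency, searches):
    """Rank files per search by scalar product divided by both magnitudes."""
    # Every file number mentioned anywhere in word_frequency
    file_ids = sorted({f for counts in word_frequency.values() for f in counts})
    # Magnitude of each file's word-count vector
    file_norm = {
        f: sqrt(sum(counts.get(f, 0) ** 2 for counts in word_frequency.values()))
        for f in file_ids
    }
    results = {}
    for search_number, words in searches.items():
        search_norm = sqrt(sum(n * n for n in words.values()))
        scores = {}
        for f in file_ids:
            dot = sum(n * word_frequency.get(w, {}).get(f, 0)
                      for w, n in words.items())
            denom = search_norm * file_norm[f]
            # Guard against empty searches or files with no words at all
            scores[f] = dot / denom if denom else 0.0
        # Most similar file first
        results[search_number] = sorted(file_ids, key=lambda f: scores[f],
                                        reverse=True)
    return results


print(rank_files(wordFrequency, search))
# {1: [2, 3, 1, 4], 2: [4, 3, 1, 2], 3: [3, 2, 1, 4]}
```

Unlike the `Counter`-based version, this sketch lists every file for every search, including files with a score of zero.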


                        