Python Multiword Index


Problem description

index = {'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
         'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
         'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]]}

# In this dictionary (index), each entry has the form
# 'KEYWORD': [['LINK in which KEYWORD is present',
#              POSITION of KEYWORD in the page specified by LINK], ...]

For example, 'Michael' is present in mj.com, Nine.com, and i.com at positions 1, 9, and 34 of the respective pages.
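
To make the structure concrete, here is a small sketch that reads the entries for one keyword from a trimmed copy of the index. The helper name `pages_for` is hypothetical and used only for illustration.

```python
# Trimmed copy of the index from the question: keyword -> [[url, position], ...]
index = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
}


def pages_for(index, keyword):
    """Hypothetical helper: map each URL to the position of `keyword` on it."""
    return {url: position for url, position in index[keyword]}


print(pages_for(index, 'Michael'))
```

Each inner list is a [link, position] pair, so unpacking it into two names keeps later position comparisons readable.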

Please help me with a Python procedure that takes index and KEYWORDS as input.

When I enter 'MICHAEL', the result should be:

>>['mj.com', 'Nine.com', 'i.com']

When I enter 'MICHAEL JACKSON', the result should be:

>>['mj.com', 'Nine.com']

because 'Michael' and 'Jackson' appear consecutively on 'mj.com' and 'Nine.com', i.e. at positions (1, 2) and (9, 10) respectively. The result should not include 'i.com': it contains both KEYWORDS, but they are not placed consecutively.

When I enter 'MICHAEL JACKSON THRILLER', the result should be:

['mj.com']

because the three words 'MICHAEL', 'JACKSON', and 'THRILLER' are placed consecutively on 'mj.com', i.e. at positions (1, 2, 3) respectively.

If I enter 'THRILLER JACKSON' or 'THRILLER FEDERER', the result should be NONE.

Recommended answer

As a side note, Udacity's Intro to CS course covers precisely this question. The code below makes a number of assumptions about proper inputs (essentially, that it never encounters an incorrect one): for example, every keyword must be a key of index, spelled with the same capitalization.

def lookup(index, KEYWORDS):
    kw = KEYWORDS.split()
    if len(kw) == 1:
        return [site[0] for site in index[kw[0]]]
    # Map (keyword, url) -> position of that keyword on that page.
    positions = {}
    for kword in kw:
        for site in index[kword]:
            positions[(kword, site[0])] = site[1]
    # Start with every URL that contains the first keyword.
    result = [site[0] for site in index[kw[0]]]
    for i in range(len(kw) - 1):
        # Keep a URL only if the next keyword directly follows the current one.
        # Building a new list avoids removing items from a list while
        # iterating over it, which silently skips elements.
        result = [url for url in result
                  if (kw[i + 1], url) in positions
                  and positions[(kw[i + 1], url)] - positions[(kw[i], url)] == 1]
    return result
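
The question's examples use upper-case input and expect NONE for unknown or non-adjacent keywords, which the input assumptions above do not cover. A minimal sketch of a more forgiving variant follows. The name `lookup_phrase`, the case-insensitive matching, and the handling of a keyword that occurs several times on one page are assumptions made for illustration, not part of the original answer.

```python
def lookup_phrase(index, keywords):
    """Return URLs where the keywords appear at consecutive positions, else None.

    Assumptions (not from the original answer): keywords are matched
    case-insensitively, and a keyword may occur more than once on a page.
    """
    # Fold index keys to lower case so 'MICHAEL' finds 'Michael'.
    folded = {}
    for key, entries in index.items():
        folded.setdefault(key.lower(), []).extend(entries)

    words = keywords.lower().split()
    if not words or any(word not in folded for word in words):
        return None

    # url -> set of positions at which the phrase could still start.
    starts = {}
    for url, position in folded[words[0]]:
        starts.setdefault(url, set()).add(position)

    # The keyword at offset k must sit exactly k positions after a start.
    for offset, word in enumerate(words[1:], start=1):
        still_matching = {}
        for url, position in folded[word]:
            if position - offset in starts.get(url, ()):
                still_matching.setdefault(url, set()).add(position - offset)
        starts = still_matching

    # Report URLs in the order they are listed for the first keyword.
    result = []
    for url, _ in folded[words[0]]:
        if url in starts and url not in result:
            result.append(url)
    return result or None


index = {
    'Michael': [['mj.com', 1], ['Nine.com', 9], ['i.com', 34]],
    'Jackson': [['One.com', 4], ['mj.com', 2], ['Nine.com', 10], ['i.com', 45]],
    'Thriller': [['Seven.com', 7], ['Ten.com', 10], ['One.com', 5], ['mj.com', 3]],
}

print(lookup_phrase(index, 'MICHAEL JACKSON'))   # ['mj.com', 'Nine.com']
print(lookup_phrase(index, 'THRILLER FEDERER'))  # None
```

Tracking candidate start positions per URL, rather than a single position per (keyword, URL) pair, keeps the check correct when a word appears more than once on the same page.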
