Vector Space Model: how is the query vector [0, 0.707, 0.707] calculated?

Problem description

I'm reading the book "Introduction to Information Retrieval" (Christopher Manning) and I'm stuck on chapter 6, where it introduces the query "jealous gossip" and says that the associated unit vector is [0, 0.707, 0.707] ( https://nlp.stanford.edu/IR-book/html/htmledition/queries-as-vectors-1.html ), considering the terms affect, jealous and gossip.

I tried to calculate it by computing tf-idf, assuming that:
- tf is equal to 1 for jealous and gossip
- idf is always equal to 0 if we calculate it as log(N/df) with N = 1 (I have only one query, and it is my document) and df = 1 for jealous and gossip, so log(1) = 0

Since the idf is 0, the tf-idf is 0 as well. So I decided to compute every weight of the query vector as the raw tf divided by the Euclidean length. In this case I get a Euclidean length of sqrt(1+1) = 1.

I can't find the formula that gives [0, 0.707, 0.707] as the query vector. Can someone help me?

Recommended answer

I haven't worked through the problem, but I think the issue might be that sqrt(1+1) is sqrt(2), not 1, so when you normalize, each of the 1s becomes 1/sqrt(2) ≈ 0.707.
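The normalization described above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the query weights are the raw term frequencies over the terms (affect, jealous, gossip), as the question proposes, and the helper name `unit_vector` is made up for illustration.

```python
import math

def unit_vector(weights):
    """Scale a vector so that its Euclidean (L2) length is 1."""
    length = math.sqrt(sum(w * w for w in weights))
    if length == 0:
        # An all-zero vector has no direction, so return it unchanged.
        return list(weights)
    return [w / length for w in weights]

# Assumed raw term frequencies of the query "jealous gossip"
# over the terms (affect, jealous, gossip).
query_tf = [0, 1, 1]

# Euclidean length is sqrt(0^2 + 1^2 + 1^2) = sqrt(2), not 1.
query_vec = unit_vector(query_tf)
print([round(w, 3) for w in query_vec])  # [0.0, 0.707, 0.707]
```

Dividing by sqrt(2) rather than by 1 is the whole difference: the two non-zero components each become 1/sqrt(2), and the resulting vector has length 1.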
