Compare each column's contents with all other columns' contents and present a matrix of match counts


Question

Given this table:

I'd like to derive this table:

...sort of like a mileage chart in a map book.

I'm trying to create a cross-table comparison of the words in each column against the words in all of the other columns, to show how many matches there are between them.

For instance, comparing Column 1 against Column 2 might yield 4 matches. The yellow, bold-outlined cells are the matches.

And here's how I count them:

I'm thinking there might be an 'easy' way to accomplish this using Power Query. Is there?

(Oh, and by the way: the solution I'm looking for should not expect a static number of input columns; i.e., it should accommodate more or fewer columns in the input comparison set.)

Thanks.

Recommended answer

No, there is no easy way, but it can be done. However, I get different results. My interpretation of your logic is: for each column combination, the number of occurrences of each common word in one column must be multiplied by the number of occurrences in the other column. These are my results:

This is my query code:

let
    Source = Table1,
    // Column names are read at run time, so any number of input columns works
    ColumnNames = Table.ColumnNames(Source),
    // Build every (Columns, Columns2) pair of column names
    Tabled = Table.FromColumns({ColumnNames}, type table[Columns = text]),
    AddedColumns2 = Table.AddColumn(Tabled, "Columns2", each ColumnNames, type {text}),
    ExpandedColumns2 = Table.ExpandListColumn(AddedColumns2, "Columns2"),
    // Distinct words shared by the two columns; empty when a column is paired with itself
    CommonWords = 
        Table.AddColumn(ExpandedColumns2, 
                        "DistinctIntersect", 
                        each if [Columns] = [Columns2]
                           then {} 
                           else List.Distinct(List.Intersect({Table.Column(Source,[Columns]),
                                                              Table.Column(Source,[Columns2])}))),
    // For each shared word, multiply its occurrence counts in the two columns, then sum;
    // the leading {0} keeps the total at 0 when there are no shared words
    AddedCount = 
        Table.AddColumn(CommonWords,
                        "Count", 
                        (This) => List.Sum({0}&List.Transform(This[DistinctIntersect],
                                                   each List.Count(List.PositionOf(Table.Column(Source,This[Columns]),_,Occurrence.All)) *
                                                        List.Count(List.PositionOf(Table.Column(Source,This[Columns2]),_,Occurrence.All)))),
                       Int64.Type),
    RemovedColumns = Table.RemoveColumns(AddedCount,{"DistinctIntersect"}),
    // Turn the Columns2 values into column headers to get the match-count matrix
    PivotedColumn = Table.Pivot(RemovedColumns, List.Distinct(RemovedColumns[Columns2]), "Columns2", "Count")
in
    PivotedColumn
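
To make the counting rule in this answer concrete, here is a minimal sketch that applies it to a single pair of lists rather than to the whole table. The lists `Left` and `Right` and the helper `CountIn` are hypothetical names with made-up sample words; they are not taken from the asker's data.

```
let
    // Hypothetical sample lists standing in for two columns of Table1
    Left = {"red", "red", "blue"},
    Right = {"red", "green", "red", "red"},
    // Words present in both lists, each listed once
    Common = List.Distinct(List.Intersect({Left, Right})),
    // Occurrence.All makes List.PositionOf return every matching position,
    // so counting those positions gives the number of occurrences
    CountIn = (items as list, word as text) as number =>
        List.Count(List.PositionOf(items, word, Occurrence.All)),
    // Multiply the per-list counts for each common word, then sum;
    // the leading {0} keeps the result at 0 when nothing matches
    Matches = List.Sum({0} & List.Transform(Common,
                  (word) => CountIn(Left, word) * CountIn(Right, word)))
in
    Matches
```

With these sample lists the only common word is "red", which occurs twice in `Left` and three times in `Right`, so under this interpretation the pair should count as 2 × 3 = 6 matches. If your intended count is different (for example, one match per distinct common word), that would explain why the results differ from yours.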
