在 DAX(不是 powerquery)中,根据列删除重复项 [英] In DAX (not powerquery) drop duplicates based on column

查看:106
本文介绍了在 DAX(不是 powerquery)中,根据列删除重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我的 PowerBI 桌面中,我的表是根据其他表计算得出的,其结构如下:

In my PowerBI desktop, I have table that is calculated from over other tables with a structure like this:

输入表:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Ted</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
    <tr>
      <td>Bill</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>

现在,我只想保留每封唯一电子邮件的第一条记录.我使用 DAX 的预期输出表是:

Now, I want to keep only the first record for each unique email. My expected output table using DAX is:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>

我尝试使用 RANKX 和 FILTER,但没有任何成功.

I was trying to use RANKX and FILTER, but not having any success.

推荐答案

遗憾的是,这个问题的答案是 DAX 中无法引用相对于表中其他行的行位置.唯一的选择是使用某些列值进行排序.

Sadly, the answer to this question is that there is no way in DAX to refer to the rows position relative to the other rows in the table. The only option is to use some column value for sorting purpose.

我们可以对现有的两列表做的是获取每封电子邮件的 MAX 或 MIN Firstname.所以我们可以写一个如下的计算表,其中T是输入表,T Unique是生成表.

What we could do with the existing two columns table is to get the MAX or MIN Firstname per each Email. So we can write a calculated table like follows, where T is the input table and T Unique is the generated table.

T Unique = 
ADDCOLUMNS(
    ALL( T[Email] ),
    "Firstname",
        CALCULATE(
            MAX( T[Firstname ] )
        )
)

但这并不能满足要求.

为了获得想要的结果,我们需要在输入表中添加一列,带有索引或时间戳.

To obtain the desired result we need to add a column to the input table, with an index or a timestamp.

对于这个例子,我在 Power Query 中使用以下 M 代码添加了一个索引列,它是通过引用原始表然后单击添加列自动生成的 ->索引栏按钮

For this example I added an Index column using the following M code in Power Query, that is generated automatically by referencing the original table and then clicking on Add column -> Index column button

let
    Source = T,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type)
in
    #"Added Index"

于是我得到了T Index表.

现在我们可以编写以下计算表,使用新列检索每个电子邮件的第一行

Now we can write the following calculated table that uses the new column to retrieve the first row for each Email

T Index Unique = 
ADDCOLUMNS(
    ALL( 'T Index'[Email] ),
    "Firstname",
        VAR MinIndex =
            CALCULATE(
                MIN( 'T Index'[Index] )
            )
        RETURN
            CALCULATE(
                MAX( 'T Index'[Firstname ] ),
                'T Index'[Index] = MinIndex
            )
)

生成请求的表

在真实案例场景中,添加新列的最佳位置是直接在生成输入表的代码中.

In a real case scenario, the best place to add the new column is directly into the code that generates the input table.

这篇关于在 DAX(不是 powerquery)中,根据列删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆