在DAX(不是powerquery)中,根据列删除重复项 [英] In DAX (not powerquery) drop duplicates based on column

查看:132
本文介绍了在DAX(不是powerquery)中,根据列删除重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在PowerBI桌面上,我的表是根据其他表计算得出的,其结构如下:

In my PowerBI desktop, I have table that is calculated from over other tables with a structure like this:

输入表:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Bob</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Ted</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
    <tr>
      <td>Bill</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>

现在,我只想保留每个唯一电子邮件的第一条记录.我期望使用DAX的输出表是:

Now, I want to keep only the first record for each unique email. My expected output table using DAX is:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>Firstname</th>
      <th>Email</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Scott</td>
      <td>ABC@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>EDF@XYZ.com</td>
    </tr>
    <tr>
      <td>Scott</td>
      <td>LMN@QRS.com</td>
    </tr>
  </tbody>
</table>

我试图使用RANKX和FILTER,但没有成功.

I was trying to use RANKX and FILTER, but not having any success.

推荐答案

遗憾的是,此问题的答案是DAX中无法引用相对于表中其他行的行位置.唯一的选择是使用某些列值进行排序.

Sadly, the answer to this question is that there is no way in DAX to refer to the rows position relative to the other rows in the table. The only option is to use some column value for sorting purpose.

我们可以使用现有的两列表格来获取每封电子邮件的MAX或MIN名字.因此,我们可以编写一个如下所示的计算表,其中 T 是输入表,而 T Unique 是生成的表.

What we could do with the existing two columns table is to get the MAX or MIN Firstname per each Email. So we can write a calculated table like follows, where T is the input table and T Unique is the generated table.

T Unique = 
ADDCOLUMNS(
    ALL( T[Email] ),
    "Firstname",
        CALCULATE(
            MAX( T[Firstname ] )
        )
)

但这不能满足要求.

要获得所需的结果,我们需要在输入表中添加带有索引或时间戳的列.

To obtain the desired result we need to add a column to the input table, with an index or a timestamp.

对于此示例,我在Power Query中使用以下M代码添加了一个Index列,该列是通过引用原始表然后单击Add column->自动生成的.索引栏按钮

For this example I added an Index column using the following M code in Power Query, that is generated automatically by referencing the original table and then clicking on Add column -> Index column button

let
    Source = T,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type)
in
    #"Added Index"

所以我获得了 T索引表.

现在,我们可以编写以下计算表,该表使用新列来检索每封电子邮件的第一行

Now we can write the following calculated table that uses the new column to retrieve the first row for each Email

T Index Unique = 
ADDCOLUMNS(
    ALL( 'T Index'[Email] ),
    "Firstname",
        VAR MinIndex =
            CALCULATE(
                MIN( 'T Index'[Index] )
            )
        RETURN
            CALCULATE(
                MAX( 'T Index'[Firstname ] ),
                'T Index'[Index] = MinIndex
            )
)

生成请求的表

在实际情况下,添加新列的最佳位置是直接在生成输入表的代码中.

In a real case scenario, the best place to add the new column is directly into the code that generates the input table.

这篇关于在DAX(不是powerquery)中,根据列删除重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆