How to resolve duplicate column names in an Excel file with Alteryx?

Problem description

I have a wide Excel file with price data that looks like this:

Product | 2015-08-01 | 2015-09-01 | 2015-09-01 | 2015-10-01
ABC     | 13         | 12         | 15         | 14
CDE     | 69         | 70         | 71         | 67
FGH     | 25         | 25         | 26         | 27

The date 2015-09-01 can be found twice, which in the context is valid but obviously messes up my workflow. It can be understood that the first value is the minimum price, the second one the maximum price. If there is only one column, min and max are the same.

Is there a way to resolve this issue?

An idea I had was the following: I also have cells that contain a value like "38 - 42", again indicating min and max. I resolved this by splitting it based on a regular expression. A possible solution would be to join two columns that have the same header and afterwards split the values according to my rules. That, however, would require me to detect dynamically whether the headers are duplicates.
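The regular expression used for the split is not given above. Purely as an illustration, and outside Alteryx, a range cell could be split roughly like this in Python. The pattern and the `split_price` helper are assumptions made for this sketch, not the expression actually used in the workflow:

```python
import re

# Hypothetical pattern: one number, optionally followed by "- <number>".
RANGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*$")

def split_price(cell):
    """Return (low, high) for a cell such as "38 - 42" or a single value."""
    match = RANGE_RE.match(str(cell))
    if match is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    low = float(match.group(1))
    # A single value means min and max are the same.
    high = float(match.group(2)) if match.group(2) is not None else low
    return low, high

print(split_price("38 - 42"))
print(split_price("40"))
```

A cell holding only one number yields the same value for both low and high, which matches the rule described for single columns.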

Is that something that is possible in Alteryx or is there an easier solution for this problem?

And of course, asking the supplier of the file to change it is not really an option, unfortunately.

Thanks

EDIT: Just got another idea: I transpose the table to have the format

Product | Date | Price Low | Price High

So if I could check for duplicates in that table and somehow merge these records into one, that would do the trick as well.

EDIT2: Since I don't seem to have made that clear, my final result should look like the transposed table in EDIT1. If there is only one value, it should go in "Price Low" (and then I will probably copy it to "Price High" anyway). If there are two values, they should go in the corresponding columns. @Poornima's suggestion resolves the duplicate issue in a more sophisticated form than putting a "_2" behind the column name, but doesn't put the values in the required columns.

Solution

If this format works for you:

Product | Date | Price Low | Price High

Then:
- Transpose with Product as a key field.
- Use a Select tool to truncate your Name field to 10 characters. This will remove any _2 suffix that Alteryx has automatically added to a duplicate column name.
- Summarize:
  - Group by Product
  - Group by Name
  - Apply Min and Max operations to Value.

Result is:

Product  |  Name       |  Min_Value  |  Max_Value  
ABC      |  2015-08-01 |  13         |  13
ABC      |  2015-09-01 |  12         |  15
ABC      |  2015-10-01 |  14         |  14
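
To sanity-check the logic outside Alteryx, the three steps (transpose, truncate the name, summarize with min/max) can be sketched in plain Python. This is only an illustration of the idea, not the Alteryx workflow itself. The `summarize` function is a hypothetical helper, and the header assumes the duplicate column arrived with a "_2" suffix, as described in the steps above:

```python
from collections import defaultdict

# Header as it might look after import, assuming the second 2015-09-01
# column was renamed with a "_2" suffix.
header = ["Product", "2015-08-01", "2015-09-01", "2015-09-01_2", "2015-10-01"]
rows = [
    ["ABC", 13, 12, 15, 14],
    ["CDE", 69, 70, 71, 67],
    ["FGH", 25, 25, 26, 27],
]

def summarize(header, rows, name_len=10):
    """Unpivot the wide table, truncate column names, then take min/max."""
    groups = defaultdict(list)
    for row in rows:
        product = row[0]
        for name, value in zip(header[1:], row[1:]):
            # Truncating to 10 characters turns "2015-09-01_2" into "2015-09-01".
            groups[(product, name[:name_len])].append(value)
    # One record per (product, date): (product, date, low, high).
    return [(p, d, min(v), max(v)) for (p, d), v in groups.items()]

for record in summarize(header, rows):
    print(record)
```

Truncating to a fixed length only works because every date header is exactly 10 characters long; headers of varying length would need a different way to strip the suffix.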
