Invoice Anomaly Detection

Problem Description

Hi all,

I am working on a project where I have a database full of invoices, and I'd like to use it to train a machine learning model to flag fraudulent invoices as they come in. From what I've read over the past couple of weeks, the best approach is anomaly detection, and that is what I have tried. The data contains only regular points; that is, there are no examples of fraudulent invoices.

So far I have tried to follow the Anomaly Detection Credit Risk example from the Azure ML Studio Gallery. The input data I used for testing contained only 2 columns, in the hope of making things easier for the algorithm: the vendor number/name and the invoice amount. I labelled all the data as non-fraudulent and fed 80% of it to the model for training. To the other 20% I added several fraudulent rows (with absurdly large amounts, negative amounts, or vendor numbers that did not follow the format of all the others) and used that for testing.
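The setup described above can be sketched roughly as follows. This is only an illustration, not the Azure ML Studio experiment: the vendor IDs, the amounts, the injected rows, and the threshold are all invented, and the detector is a simple per-vendor median/MAD score swapped in as a stand-in for the anomaly detection module.

```python
import random
import statistics
from collections import defaultdict

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical invoices: (vendor_id, amount). Each vendor has its own typical amount.
typical = {"V001": 100.0, "V002": 5000.0, "V003": 40.0}
invoices = [
    (vendor, random.gauss(mean, mean * 0.05))
    for vendor, mean in typical.items()
    for _ in range(200)
]
random.shuffle(invoices)

# 80% of the (all non-fraudulent) data for training, 20% held out for testing.
split = int(len(invoices) * 0.8)
train, test_normal = invoices[:split], invoices[split:]

# Hand-made fraudulent rows, added to the test portion only:
# an absurdly large amount, a negative amount, a malformed vendor number.
injected = [("V001", 1_000_000.0), ("V002", -5000.0), ("X-9", 100.0)]

# Stand-in detector: per-vendor median and median absolute deviation (MAD),
# computed from the training data alone.
by_vendor = defaultdict(list)
for vendor, amount in train:
    by_vendor[vendor].append(amount)

stats = {}
for vendor, amounts in by_vendor.items():
    med = statistics.median(amounts)
    mad = statistics.median([abs(a - med) for a in amounts])
    stats[vendor] = (med, mad)


def anomaly_score(vendor, amount):
    """Distance of the amount from the vendor's median, in MAD units."""
    if vendor not in stats:
        return float("inf")  # vendor never seen in training
    med, mad = stats[vendor]
    return abs(amount - med) / (mad or 1e-9)


THRESHOLD = 10.0  # arbitrary cut-off chosen for this sketch

flagged_injected = [row for row in injected if anomaly_score(*row) > THRESHOLD]
flagged_normal = [row for row in test_normal if anomaly_score(*row) > THRESHOLD]
print(len(flagged_injected), "of", len(injected), "injected rows flagged")
print(len(flagged_normal), "of", len(test_normal), "normal test rows flagged")
```

Scoring each amount relative to its own vendor's history is one way to make the score comparable across vendors without training a separate model per vendor; whether it suits real invoice data is an open assumption.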

Despite some of these being very obvious anomalies, to a human eye at least, no matter how I modified the parameters (I was using Tune Model Hyper-parameters as well), the degree of certainty of the model's prediction was almost always around 49 to 51%.

Out of curiosity I also tried some of the two-class classification algorithms or clustering, but the results were even worse.

So basically, am I doing something wrong? Is there an ML approach I have not tried that can find anomalies in such a diverse dataset (there are a lot of different vendors, each with different patterns in their invoice amounts)?

I am open to suggestions, and any help is greatly appreciated.

P.S. Note that training a model for each individual vendor, while it yields better results (though not definitive ones either), is not feasible due to the large number of vendors.

Solution

Hi,

Sorry for the inconvenience. Anomaly Detection is currently in preview. I will send your issue to the product group to see how we can help you.

Regards,

Yutong

