我有55,000行的租赁数据需要清理,数据不一致,并且包含文本和数字 [英] I have 55,000 rows of rental data to clean, the data is inconsistent and contains text and numbers

查看:57
本文介绍了我有55,000行的租赁数据需要清理,数据不一致,并且包含文本和数字的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有55,000行租金数据,其中一列是每周的租金价格.

I have 55,000 rows of rental data, in one of the columns is the rent prices per week.

此列未经过清理,导致这些字段中包含的数字,文本和电话号码构成噩梦.

This column is not sanitized which has led to a nightmarish combination of numbers, text and phone numbers contained within these fields.

我正在尝试仅提取每周租金数字,或者如果没有提供,则返回一个空白单元格.

I am trying to extract just the weekly rental figure or return a blank cell if none is provided.

下面使用的当前代码,它将列复制到另一列并运行一系列查找并替换此数据.这将导致大约75%/80%的数据是完美的,其余的数据几乎完全无用(数据如下图所示被屠杀,或者包含诸如$ 400 $ 500的项目仍需要清理)

Code current employed below, which copies the column to another column and runs a series of find and replaces against this data. This is leading to about 75%/80% being perfect and the remainder being almost entirely useless (data is either butchered like the below image or contains items like $400$500 which still needs to be cleaned)

以上数据包含以下文本,作为其屠宰方式的示例:

The above data contained the following text as an example of how its butchered:

Sub A_Core_Clean_Level_1()

'Application.ScreenUpdating = False
Rows("1:1").AutoFilter

Sheets("Rental Data").Range("I1").EntireColumn.Copy Range("R1").EntireColumn

Columns("R:R").Replace What:="1oo", Replacement:="100", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="2oo", Replacement:="200", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="3oo", Replacement:="300", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="4oo", Replacement:="400", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="5oo", Replacement:="500", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="6oo", Replacement:="600", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="7oo", Replacement:="700", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="9oo", Replacement:="800", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="9oo", Replacement:="900", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:="1 day", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="2 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="3 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="4 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="5 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="6 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="7 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="8 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="9 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="10 days", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:=".00*", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="-", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="$ ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" $", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:=" 1 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 2 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 3 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 4 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 5 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 6 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 7 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 8 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 9 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 10 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 11 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 12 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 13 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 14 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 15 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 16 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 17 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 18 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 19 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 20 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 21 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 22 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 23 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 24 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" 25 ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:="a", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="b", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="c", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="d", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="e", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="f", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="g", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="h", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="i", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="j", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="k", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="l", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="m", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="n", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="o", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="p", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="q", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="r", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="s", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="t", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="u", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="v", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="w", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="x", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="y", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="z", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:="..", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=".", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Columns("R:R").Replace What:=" ", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="&", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="+", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="/", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="!", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="|", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:="(", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=")", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
'Columns("R:R").Replace What:="'", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
'Columns("R:R").Replace What:=",", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Columns("R:R").Replace What:=" *", Replacement:="", LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False

Rows("1:1").AutoFilter

Application.ScreenUpdating = True

End Sub

我相信理想的结果将是提取找到3个或4个连续数字的情况,然后使用代码查找和替换来清理其余部分.虽然这意味着存在手机/电话号码;如果它们是第一个,则它们仍将包含在捕获的数据中,但是更接近清除.

I believe the ideal outcome would be to extract circumstances where 3 or 4 consecutive numbers are found then work on code to clean the remainder using find and replace(s). While this would mean that where mobile/phone numbers are present; if they are first, they would still be included in the data captured, however it would be closer to cleaned.

有没有人建议使用公式或代码在excel中清除此数据.

Does anyone have any suggestions to clean this data within excel using either formula or code.

推荐答案

这是用户定义函数(UDF)的工作.这是一个原型.

This is a job for a User Defined Function (UDF). Here is a prototype.

Function Rental(Cell As Range) As Variant

    Dim CellVal As String           ' Cell's contents
    Dim p       As Integer          ' position of $ in CellVal
    
    CellVal = Cell.Value
    p = InStr(CellVal, "$") + 1
    If p = 1 Then
        Rental = "No $$$"
    Else
        Rental = Val(Mid(CellVal, p))
    End If
End Function

将该功能安装在工作簿的标准代码模块中.这是您必须插入自己的模块,默认情况下为 Module1 .现有的代码表都无法完成这项工作.

Install the function in a standard code module of your workbook. That's a module you have to insert yourself, by default Module1. None of the existing code sheets can do the job.

使用如下所示的语法从工作表中调用该函数,其中A2是您的宰杀数据所在的单元格.您可以使用与内置Excel公式相同的方式向下复制公式.

Call the function from the worksheet with syntax as shown below, where A2 is the cell in which you have your butchered data. You can copy the formula down in the same way as built-in Excel formulas.

=Rental(A2)

在当前形状下,该函数将查找$符号并返回其后的任何数字,直到下一个非数字字符或字符串的末尾.系统的小数点分隔符将被视为数字,而不是千位分隔符.

In its present shape the function will look for the $ sign and returns whatever numbers follow it, up till the next non-numeric character or the end of the string. Your system's decimal separator will be deemed numeric but not the thousands separator.

如果未找到$符号,则该函数返回"No $$$".UDF的优点就体现在这里.相反,您可以使用另一种寻找所需值的方法.

If no $-sign is found the function returns "No $$$". Here the advantage of a UDF comes in. Instead of this you might hitch on another method of looking for the value you want.

这篇关于我有55,000行的租赁数据需要清理,数据不一致,并且包含文本和数字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆