Reading a CSV file with some missing columns

Question

I am trying to read a CSV file into my VB.NET application using the following code:

While Not EOF(1)
    Input(1, dummy)
    Input(1, phone_number)
    Input(1, username)
    Input(1, product_name)
    Input(1, wholesale_cost)
    Input(1, dummy)
    Input(1, dummy)
End While

My CSV file (as text) looks like this:

Customer Name,Phone Number,Username,Product,Wholesale Cost,Sales Price,Gross Profit, Customer Reference
  ,00000000000,00000000000,Product Name,25.00,35.00,10.00,
  ,00000000000,00000000000,Product Name,1.00,1.40,0.40,

As you can see, not all fields are always filled in, so an error is raised when reading the file because it cannot reach the end of the line.

How can I handle this type of file?

Some fields are present on some lines and missing on others.

UPDATE

I have tried the answer that Zenacity provided, but when I try to read sArray(1) inside the loop, it throws "Index was outside the bounds of the array".

Solution

One thing you should come to grips with is that those legacy FileXxxx-style functions (FileOpen, Input, EOF and the like) are all but officially and formally deprecated. When you use them, IntelliSense pops up with:

...The My feature gives you better productivity and performance in file I/O operations than FileOpen. For more information, see Microsoft.VisualBasic.FileIO.FileSystem.

They are talking about My.Computer.FileSystem, but there are some even more useful .NET methods.

The post doesn't reveal how the data will be stored, but if it is an array of any sort and/or a structure, those are at least suboptimal if not also outdated. This answer stores each row in a class, so that the numeric data can be stored as numbers, and uses a List in place of an array.

I made a quick file similar to yours with some random data: {"CustName", "Phone", "UserName", "Product", "Cost", "Price", "Profit", "SaleDate", "RefCode"}:

  • The CustName is present 70% of the time
  • The username is never present
  • The RefCode is present 30% of the time
  • I added a SaleDate to illustrate that data conversion

Ziggy Aurantium,132-5562,,Cat Food,8.26,9.95,1.69,08/04/2016,
Catrina Caison,899-8599,,Knife Sharpener,4.95,6.68,1.73,10/12/2016,X-873-W3
,784-4182,,Vapor Compressor,11.02,12.53,1.51,09/12/2016,

Code to Parse the CSV

Note: this is a bad way to parse a CSV. Lots of problems can arise doing it this way, and it takes more code. It is presented because it is a simple way to avoid dealing with the missing fields. See The Right Way below.

' form/class level var:
Private SalesItems As List(Of SaleItem)

SaleItem is a simple class to store the elements you care about. SalesItems is a collection which can store only SaleItem objects. The properties in that class allow Price and Cost to be stored as Decimal and the date as a DateTime.
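The answer never shows SaleItem itself. Here is a minimal sketch. It assumes plain auto-implemented properties named after the fields the parsing loop below assigns. The exact property set is an assumption, and the read-only Profit property from the Notes would be added to this same class.

```vbnet
Imports System

' Hypothetical sketch of SaleItem: one instance per CSV data row.
' Only the columns the parsing loop keeps are modeled here.
Public Class SaleItem
    Public Property CustName As String
    Public Property Phone As String
    Public Property Product As String
    ' Money values as Decimal (base-10) to avoid binary floating-point rounding.
    Public Property Cost As Decimal
    Public Property Price As Decimal
    Public Property SaleDate As DateTime
End Class
```

Because the numeric columns are typed, a DataGridView bound to a List(Of SaleItem) can sort them as numbers and dates rather than as text.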

' temp var
Dim item As SaleItem
' create the collection
SalesItems = New List(Of SaleItem)

' load the data....all of it (File is in System.IO: add Imports System.IO)
Dim data = File.ReadAllLines("C:\Temp\custdata.csv")

' parse data lines
' Start at 1 to skip the header row
For n As Int32 = 1 To data.Length - 1
    Dim split = data(n).Split(","c)

    ' check if it is a good line
    If split.Length = 9 Then
        ' create a new item
        item = New SaleItem
        ' store SOME data to it
        item.CustName = split(0)
        item.Phone = split(1)
        ' don't care about user name (2)
        item.Product = split(3)
        ' convert numbers (current culture): field 4 is Cost, field 5 is Price
        item.Cost = Convert.ToDecimal(split(4))
        item.Price = Convert.ToDecimal(split(5))
        ' don't use the PROFIT, calculate it in the class (6)

        ' convert date (Convert.ToDateTime uses the current culture's date format)
        item.SaleDate = Convert.ToDateTime(split(7))

        ' ignore the often-missing RefCode (8)

        ' add new item to collection
        ' a List sizes itself as needed!
        SalesItems.Add(item)
    Else
        ' To Do: make note of a bad line format
    End If
Next

' show in DGV for approval/debugging
dgvMem.DataSource = SalesItems

Notes

It is generally a bad idea to store something which can be simply calculated. So the Profit property is:

Public ReadOnly Property Profit As Decimal
    Get
        ' profit = sale price minus wholesale cost
        Return Price - Cost
    End Get
End Property

It can never be "stale" if the cost or price is updated.

As shown, the resulting collection can be displayed to the user very easily: given a DataSource, the DataGridView will create the columns and populate the rows.
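The parsing loop relies on Convert.ToDecimal and Convert.ToDateTime. Both throw a FormatException when a field is blank or malformed. If the cost, price or date columns can also be missing, one tolerant variant uses the TryParse family and returns Nothing instead.

This is a minimal sketch with two assumptions. The helper names are hypothetical. The month-first MM/dd/yyyy format is also assumed, because sample dates such as 08/04/2016 could be read either way.

```vbnet
Imports System
Imports System.Globalization

Module TolerantConversion
    ' Returns Nothing instead of throwing when the field is blank or not a number.
    ' InvariantCulture means "." is always the decimal separator.
    Public Function ToDecimalOrNothing(field As String) As Decimal?
        Dim value As Decimal
        If Decimal.TryParse(field, NumberStyles.Number, CultureInfo.InvariantCulture, value) Then
            Return value
        End If
        Return Nothing
    End Function

    ' Assumption: dates are month-first (MM/dd/yyyy); change the format string if not.
    Public Function ToDateOrNothing(field As String) As DateTime?
        Dim value As DateTime
        If DateTime.TryParseExact(field, "MM/dd/yyyy", CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, value) Then
            Return value
        End If
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(ToDecimalOrNothing("8.26").Value.ToString(CultureInfo.InvariantCulture)) ' 8.26
        Console.WriteLine(ToDecimalOrNothing("").HasValue)                                         ' False
        Console.WriteLine(ToDateOrNothing("08/04/2016").Value.Month)                               ' 8
    End Sub
End Module
```

A Nothing result could then be routed to the "bad line" To Do branch rather than stopping the whole load.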

The Right Way

String.Split(","c) is a very bad idea because if the product is "Hose, Small Green" (a quoted field containing a comma), Split will chop it up and treat it as two fields. There are a number of tools which will do nearly all the work for you:

  1. Read the file
  2. Parse the lines
  3. Map the CSV data to a class
  4. Convert the text into the proper data type
  5. Create an economical collection

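Aside from the class, all of the above can be done in just a few lines using CsvHelper:

' requires Imports System.IO, Imports System.Linq and Imports CsvHelper
Private CustData As List(Of SaleItem)
...
Using sr As New StreamReader("C:\Temp\custdata.csv", False),
      csv As New CsvReader(sr)
    ' CsvHelper API at the time of writing; newer versions also require a
    ' CultureInfo argument to CsvReader and set HasHeaderRecord through a
    ' CsvConfiguration object instead.
    csv.Configuration.HasHeaderRecord = True

    CustData = csv.GetRecords(Of SaleItem)().ToList()
End Using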

Two or three lines of code to read, parse, and create a collection of 250 items.

Even if you want to do it manually for some reason, CsvHelper can help. Rather than having it create a List(Of SaleItem) for you, you can use it just to read and parse the data:

... like above
csv.Configuration.HasHeaderRecord = True
Do While csv.Read()
    For n As Int32 = 0 To csv.Parser.FieldCount - 1
        DoSomethingWith(csv.GetField(n))
    Next
Loop

This returns the fields to you one by one. It won't convert any dates or prices, but it won't choke on missing data elements either.
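The question's "Index was outside the bounds of the array" error comes from reading an index past the end of a short row. This answer does not use the following approach, but another route is Microsoft.VisualBasic.FileIO.TextFieldParser, which is part of the Microsoft.VisualBasic library that ships with .NET. It handles quoted fields such as "Hose, Small Green" and simply returns fewer fields for a short line.

This is a minimal sketch; the module, helper names and sample text are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module TextFieldParserSketch
    ' Returns the field at index, or "" when the row is shorter than expected,
    ' instead of throwing IndexOutOfRangeException.
    Public Function FieldOrEmpty(fields As String(), index As Integer) As String
        If fields Is Nothing OrElse index < 0 OrElse index >= fields.Length Then
            Return String.Empty
        End If
        Return fields(index)
    End Function

    ' Reads all data rows (header skipped) from any TextReader.
    Public Function ReadRows(reader As TextReader) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(reader)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True ' keeps "Hose, Small Green" as one field
            If Not parser.EndOfData Then parser.ReadFields() ' skip the header row
            While Not parser.EndOfData
                rows.Add(parser.ReadFields()) ' each row may have a different length
            End While
        End Using
        Return rows
    End Function

    Sub Main()
        ' Hypothetical sample: one quoted product name, one truncated row.
        Dim csvText = "CustName,Phone,UserName,Product,Cost" & Environment.NewLine &
                      ",784-4182,,""Hose, Small Green"",11.02" & Environment.NewLine &
                      ",899-8599,,Cat Food"
        For Each fields In ReadRows(New StringReader(csvText))
            Console.WriteLine($"{FieldOrEmpty(fields, 3)} | cost='{FieldOrEmpty(fields, 4)}'")
        Next
    End Sub
End Module
```

FieldOrEmpty turns a missing trailing field into an empty string, so the caller decides whether that row is still usable. Passing a StreamReader over the real file works the same way as the StringReader used here.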
