Reading a CSV file line by line and parsing it


Problem description

I have a CSV file that I need to read line by line with the help of a Scanner, storing only the country names in an array of strings. Here is my CSV file:

World Development Indicators
Number of countries,4
Country Name,2005,2006,2007
Bangladesh,6.28776238,13.20573922,23.46762823
"Bahamas,The",69.21279415,75.37855087,109.340767
Brazil,46.31418452,53.11025849,63.67475185
Germany,94.55486999,102.2828888,115.1403608

This is what I have so far:

public String[] getCountryNames() throws IOException, FileNotFoundException{
    String[] countryNames = new String[3];
    int index = 0;
    BufferedReader br = new BufferedReader(new FileReader(fileName));
    br.readLine();
    br.readLine();
    br.readLine();
    String line = br.readLine();
    while((br.readLine() != null) && !line.isEmpty()){
        String[] countries = line.split(",");
        countryNames[index] = countries[0];
        index++;
        line = br.readLine();
    }
    System.out.println(Arrays.toString(countryNames));
    return countryNames;
}

Output:

[Bangladesh, Brazil, null]

For some reason it skips "Bahamas, The" and never reaches Germany. Please help; I have been stuck on this method for hours. Thanks for your time and effort. The method should return an array of Strings (the country names).

Recommended answer

There are two issues with your code for parsing this CSV file. As a few people have pointed out, you call readLine on your reader too many times and discard the result. Every call to readLine advances the reader, and a line that has been read cannot be read again. The condition br.readLine() != null, for example, reads a new line from the stream, checks that it isn't null, and then throws it away because it is never stored in a variable. In your loop that means every second data line is consumed by the condition and lost, which is the main reason you're losing data while reading.
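The effect of the discarded read can be reproduced in isolation. The following is a minimal sketch, assuming a four-line in-memory input instead of the real file; the class and method names (ReadLoopDemo, readDiscarding, readEvery) are made up for illustration. It contrasts the loop shape from the question with a loop that stores every line it reads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLoopDemo {
    // Same shape as the loop in the question: the readLine() call in the
    // condition consumes a line that is never stored anywhere.
    static List<String> readDiscarding(String text) {
        List<String> seen = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line = br.readLine();
            while (br.readLine() != null && line != null) {
                seen.add(line);
                line = br.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    // One readLine() per iteration, and its result is always kept in 'line'.
    static List<String> readEvery(String text) {
        List<String> seen = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                seen.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String text = "a\nb\nc\nd\n";
        System.out.println(readDiscarding(text)); // [a, c]
        System.out.println(readEvery(text));      // [a, b, c, d]
    }
}
```

The first method keeps only every other line, which mirrors the [Bangladesh, Brazil, null] output in the question.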

The second issue is your split condition. You're splitting on commas, which makes sense since this is a CSV file, but your data contains commas too (for example, "Bahamas, The"). You'll need a more specific split condition, one that ignores commas inside double quotes, as described in this post.
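To make the split condition concrete, here is a small sketch using the Bahamas line from the sample data. The regular expression matches a comma only when an even number of double quotes follows it, so commas inside a quoted field are left alone. The class name and the unquote helper are hypothetical, and this is not a full CSV parser: it assumes one record per line and does not unescape doubled quotes.

```java
class QuotedSplitDemo {
    // A comma followed by an even number of double quotes up to the end of
    // the line, i.e. a comma that sits outside any quoted field.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] splitCsvLine(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    // Hypothetical helper: removes one pair of surrounding double quotes, if present.
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        String line = "\"Bahamas,The\",69.21279415,75.37855087,109.340767";
        System.out.println(line.split(",").length);   // 5: the name is cut in two
        String[] fields = splitCsvLine(line);
        System.out.println(fields.length);            // 4
        System.out.println(unquote(fields[0]));       // Bahamas,The
    }
}
```

Note that the split itself leaves the surrounding quotes on the first field, which is why the sketch strips them in a separate step.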

Here's an example of what this might look like. It uses a list for countryNames instead of an array, because a list is much easier to work with and grows as needed, whereas the fixed array of length 3 in your code cannot hold all four countries:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// In-memory copy of the sample file, so the example runs without a file on disk.
private static final String csv = "World Development Indicators\n"
    + "Number of countries,4\n"
    + "Country Name,2005,2006,2007\n"
    + "Bangladesh,6.28776238,13.20573922,23.46762823\n"
    + "\"Bahamas,The\",69.21279415,75.37855087,109.340767\n"
    + "Brazil,46.31418452,53.11025849,63.67475185\n"
    + "Germany,94.55486999,102.2828888,115.1403608\n";

public static String[] getCountryNames() throws IOException {
    List<String> countryNames = new ArrayList<>();

    // For a real file, use: new BufferedReader(new FileReader(fileName))
    try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
        // Skip the three header lines.
        br.readLine();
        br.readLine();
        br.readLine();

        // readLine() is called once per iteration and its result is always stored.
        String line = br.readLine();
        while (line != null && !line.isEmpty()) {
            // Split only on commas that are outside double quotes.
            String[] fields = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
            countryNames.add(fields[0]); // a quoted name keeps its surrounding quotes
            line = br.readLine();
        }
    }

    System.out.println(countryNames);
    return countryNames.toArray(new String[0]);
}
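Since the question asks for a Scanner, the same approach can be sketched with Scanner in place of BufferedReader. This is an illustration under assumptions, not part of the original answer: the class name, the headerLines parameter, and the quote-stripping step are all introduced here, and the input is again an in-memory string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

class ScannerCountryNames {
    // Comma outside double quotes (same split condition as in the answer).
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // headerLines is an assumed parameter: how many leading lines to skip.
    static String[] getCountryNames(Scanner in, int headerLines) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < headerLines && in.hasNextLine(); i++) {
            in.nextLine();
        }
        while (in.hasNextLine()) {
            String line = in.nextLine(); // each line is read exactly once
            if (line.isEmpty()) {
                break;
            }
            String name = line.split(COMMA_OUTSIDE_QUOTES, -1)[0];
            // Strip one pair of surrounding quotes, e.g. "Bahamas,The".
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            names.add(name);
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String csv = "World Development Indicators\n"
            + "Number of countries,4\n"
            + "Country Name,2005,2006,2007\n"
            + "Bangladesh,6.28776238,13.20573922,23.46762823\n"
            + "\"Bahamas,The\",69.21279415,75.37855087,109.340767\n"
            + "Brazil,46.31418452,53.11025849,63.67475185\n"
            + "Germany,94.55486999,102.2828888,115.1403608\n";
        try (Scanner in = new Scanner(csv)) {
            // prints [Bangladesh, Bahamas,The, Brazil, Germany]
            System.out.println(Arrays.toString(getCountryNames(in, 3)));
        }
    }
}
```

To read from disk instead, the Scanner could be constructed from a java.io.File, in which case the caller has to handle FileNotFoundException.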
