Text file to JSON, last few lines are not included in the result

Question

I'm reading a text file and converting it to JSON format using a regex in my React project. It mostly works, but the last 20-30 lines of the text file are missing from the result. Something goes wrong during the conversion to JSON, but I can't work out what.

Here is my code:

    readTextFile = file => {
        let rawFile = new XMLHttpRequest();
        rawFile.open("GET", file, false);
        rawFile.onreadystatechange = () => {
            if (rawFile.readyState === 4) {
                if (rawFile.status === 200 || rawFile.status === 0) {
                    let allText = rawFile.responseText;
                    // console.log(allText)

                    let reg = /\d\d\d\d-(0?[1-9]|1[0-2])-(0?[1-9]|[12][0-9]|3[01]) (00|[0-9]|1[0-9]|2[0-3]):([0-9]|[0-5][0-9]):([0-9]|[0-5][0-9])/g;

                    let arr = [];
                    let start = null;
                    let line, lastSpacePos;
                    let match;
                    while ((match = reg.exec(allText)) != null) {
                        if(start) {
                            line = allText.slice(start, match.index).trim();
                            lastSpacePos = line.lastIndexOf(' ');
                            arr.push({
                                date: line.slice(0, 19),
                                text: line.slice(20, lastSpacePos).trim(),
                                user_id: line.slice(lastSpacePos).trim()
                            });
                        }

                        start = match.index
                    }
                    console.log(arr);

                    this.setState({
                        // text: JSON.stringify(arr)
                        text: allText
                    });
                }
            }
        };
        rawFile.send(null);
    };

Recommended answer

I'm not certain what the issue is with the existing code in the question.

To get the expected result described in the question with an alternative approach, you can use three regular expressions: /\s{2,}|\n+/g to replace runs of two or more whitespace characters, and any newline characters, with a single space; /[\d-]+\s[\d:]+/g to get the dates; and /.+(?=\s\w+\s$|\s\w+$)|\w+\s$|\w+$/g to match first the text that precedes the final word (which may be followed by one trailing whitespace character before the end of the string) and then that final word itself, which is the user id. Finally, .map() returns an object with a property set for each element of the resulting array.

let allText = `2014-06-01 23:07:58 President Resigns in Georgia’s Breakaway Region of 
Abkhazia t.co/DAploRvCvV                                                    nytimes 
2014-06-01 23:48:06 The NYT FlipBoard guide to understanding climate 
change and its consequences t.co/uPGTuYiSmQ                                 nytimes 
2014-06-01 23:59:06 For all the struggles that young college grads 
face, a four-year degree has probably never been more valuable 
t.co/Gjf6wrwMsS         nytimes 
2014-06-01 23:35:09 It's better to be a community-college graduate than 
a college dropout t.co/k3CO7ClmIG                                           nytimes 
2014-06-01 22:47:04 Share your experience with Veterans Affairs health 
care t.co/PrDhLC20Bt                                                        nytimes 
2014-06-01 22:03:27 Abandon Hope, Almost All Ye Who Enter the N.B.A. 
Playoffs t.co/IQAJ5XNddR                                                    nytimes`;

// Collapse runs of two or more whitespace characters, and any line breaks, into a single space
allText = allText.replace(/\s{2,}|\n+/g, " ");
// Collect every date-time stamp, in order of appearance
let dates = allText.match(/[\d-]+\s[\d:]+/g);
// Split on the date-time stamps to get the remainder of each record,
// pair each remainder with its date, separate the text from the trailing user id,
// and return one object per record
let res = allText
  .split(/[\d-]+\s[\d:]+\s/)
  .filter(Boolean)
  .map((text, index) =>
    [dates[index], ...text.match(/.+(?=\s\w+\s$|\s\w+$)|\w+\s$|\w+$/g)])
  .map(([date, text, user_id]) => ({date, text, user_id}));

console.log(res);
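
Separately from the alternative above, one plausible contributor to the missing tail, which the answer does not confirm, is visible in the question's loop: a record is pushed only when the next date is found, so the final record is never pushed, and `if(start)` treats a first match at index 0 as "no start yet", which skips the first record as well. The sketch below keeps the question's regex and slicing but compares against null and flushes the last record after the loop. The function name `parseRecords` is hypothetical, and whether this accounts for all 20-30 missing lines depends on the actual file.

```javascript
// Hypothetical helper (not from the original post): the question's loop,
// with an explicit null check and a final flush after the loop.
function parseRecords(allText) {
  // Same date-time pattern as in the question; only match.index is used
  const reg = /\d\d\d\d-(0?[1-9]|1[0-2])-(0?[1-9]|[12][0-9]|3[01]) (00|[0-9]|1[0-9]|2[0-3]):([0-9]|[0-5][0-9]):([0-9]|[0-5][0-9])/g;
  const records = [];
  let start = null;
  let match;

  // Build one record from the slice between two date positions
  const pushRecord = (from, to) => {
    const line = allText.slice(from, to).trim();
    const lastSpacePos = line.lastIndexOf(" ");
    records.push({
      date: line.slice(0, 19),
      text: line.slice(20, lastSpacePos).trim(),
      user_id: line.slice(lastSpacePos).trim()
    });
  };

  while ((match = reg.exec(allText)) !== null) {
    // Compare with null explicitly: an index of 0 is falsy but valid
    if (start !== null) pushRecord(start, match.index);
    start = match.index;
  }
  // Flush the final record, which has no following date to trigger a push
  if (start !== null) pushRecord(start, allText.length);
  return records;
}
```

Calling `parseRecords(allText)` on the sample text above should return one object per dated entry, including the last one. Unlike the answer's approach, this version does not collapse line breaks inside `text`, so wrapped entries keep their internal whitespace.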
