Win32/C: Convert line endings to DOS/Windows format

Question

I have the following C function in a Windows API project. It reads a file's contents and, depending on the line endings it finds (UNIX, MAC, DOS), replaces them with the correct line endings for Windows (\r\n):

// Standard C header needed for string functions
#include <string.h>

// Defines for line-ending conversion function
#define LESTATUS INT 
#define LE_NO_CHANGES_NEEDED (0)
#define LE_CHANGES_SUCCEEDED (1)
#define LE_CHANGES_FAILED   (-1)

/// <summary>
/// If the line endings in a block of data loaded from a file contain UNIX (\n) or MAC (\r) line endings, this function replaces it with DOS (\r\n) endings.
/// </summary>
/// <param name="inData">An array of bytes of input data.</param>
/// <param name="inLen">The size, in bytes, of inData.</param>
/// <param name="outData">An array of bytes to be populated with output data.  This array must already be allocated</param>
/// <param name="outLen">The maximum number of bytes that can be stored in outData.</param>
/// <param name="bytesWritten">A pointer to an integer that receives the number of bytes written into outData.</param>
/// <returns>
/// If no changes were necessary (the file already contains \r\n line endings), then the return value is LE_NO_CHANGES_NEEDED.<br/>
/// If changes were necessary, and it was possible to store the entire output buffer, the return value is LE_CHANGES_SUCCEEDED.<br/>
/// If changes were necessary but the output buffer was too small, the return value is LE_CHANGES_FAILED.<br/>
/// </returns>
LESTATUS ConvertLineEndings(BYTE* inData, INT inLen, BYTE* outData, INT outLen, INT* bytesWritten)
{
    char *posR = strstr(inData, "\r");
    char *posN = strstr(inData, "\n");
    // Case 1: the file already contains DOS/Windows line endings.
    // So, copy the input array into the output array as-is (if we can)
    // Report an error if the output array is too small to hold the input array; report success otherwise.
    if (posN != NULL && posR != NULL)
    {
        if (outLen >= inLen)
        {
            strcpy(outData, inData);
            return LE_NO_CHANGES_NEEDED;
        }
        return LE_CHANGES_FAILED;
    }
    // Case 2: the file contains UNIX line endings.
    else if (posN != NULL && posR == NULL)
    {
        int i = 0;
        int track = 0;
        for (i = 0; i < inLen; i++)
        {
            if (inData[i] != '\n')
            {
                outData[track] = inData[i];
                track++;
                if (track>outLen) return LE_CHANGES_FAILED;
            }
            else
            {
                outData[track] = '\r';
                track++;
                if (track > outLen) return LE_CHANGES_FAILED;
                outData[track] = '\n';
                track++;
                if (track > outLen) return LE_CHANGES_FAILED;
            }
            *bytesWritten = track;
        }
    }
    // Case 3: the file contains Mac-style line endings.
    else if (posN == NULL && posR != NULL)
    {
        int i = 0;
        int track = 0;
        for (i = 0; i < inLen; i++)
        {
            if (inData[i] != '\r')
            {
                outData[track] = inData[i];
                track++;
                if (track>outLen) return LE_CHANGES_FAILED;
            }
            else
            {
                outData[track] = '\r';
                track++;
                if (track > outLen) return LE_CHANGES_FAILED;
                outData[track] = '\n';
                track++;
                if (track > outLen) return LE_CHANGES_FAILED;
            }
            *bytesWritten = track;
        }
    }
    return LE_CHANGES_SUCCEEDED;
}

However, I feel like this function is very long (almost 70 lines) and could be reduced somehow. I've searched on Google but couldn't find anything useful; is there any function in either the C library or the Windows API that will allow me to perform a string-replace rather than manually searching the string byte-by-byte in O(n) time?

Recommended answer

Every character needs looking at precisely one time, not more and not less. The very first line of your code already makes repeated comparisons, as both strstr calls start at the same position. You could have used something like

char *posR = strstr(inData, "\r");
if (posR && posR[1] == '\n')
   // Case 1: the file already contains DOS/Windows line endings.

and if this fails, continue from where you stopped if you did find a \r or, if posR == NULL, start again from the top. But in that second case strstr has already "looked at" every character up to the end!

Two additional notes:

  1. there was no need for strstr because you are looking for a single character; use strchr next time;
  2. the strXXX functions all assume your input is a properly formed C string: it should end with a terminating 0. However, you already provide the length in inLen, so you don't have to check for zeroes. If there may or may not be a 0 in your input before inLen bytes, you need to take appropriate action. Based on the purpose of this function, I'm assuming you don't need to check for zeroes at all.
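
To illustrate both notes, here is a minimal sketch that searches a length-delimited buffer with memchr instead of strstr/strchr. FindFirstLineBreak is a hypothetical helper name, and it uses plain C types (unsigned char, size_t) rather than BYTE and INT so that it compiles without windows.h. Bytes before a '\r' can still be scanned twice here, so it is not the single pass proposed in this answer; it only shows the length-aware replacement for the strXXX calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first '\r' or '\n' in a
   length-delimited buffer, or len if there is none. It never reads past
   len, and embedded zero bytes are treated as ordinary data. */
size_t FindFirstLineBreak(const unsigned char *data, size_t len)
{
    const unsigned char *cr = memchr(data, '\r', len);
    /* If a '\r' was found, a '\n' only matters when it comes earlier,
       so the second search can stop at the '\r'. */
    size_t limit = cr ? (size_t)(cr - data) : len;
    const unsigned char *lf = memchr(data, '\n', limit);

    return lf ? (size_t)(lf - data) : limit;
}
```

Because the length is passed explicitly, a zero byte in the middle of the data does not stop the search the way it would with strchr.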

My proposal: look at every character from the start once, and only take action when it is either an \r or an \n. If the first of these you encounter is an \r and the next one is an \n, you're done. (This assumes the line endings are not "mixed".)
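
As a rough sketch of that first pass on its own, the function below classifies a buffer by the first line ending it meets and stops there. The enum and function names are made up for illustration, the types are plain C rather than BYTE/INT, and, as stated above, it assumes the line endings are not mixed.

```c
#include <stddef.h>

/* Hypothetical names, not part of the original code. */
enum LineEnding_e { LINE_END_NONE, LINE_END_DOS, LINE_END_UNIX, LINE_END_MAC };

/* Looks at each byte at most once and stops at the first '\r' or '\n'.
   The result describes only that first line ending. */
enum LineEnding_e DetectLineEnding(const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
    {
        if (data[i] == '\n')
            return LINE_END_UNIX;
        if (data[i] == '\r')
        {
            /* Peek at the next byte only if it is inside the buffer */
            if (i + 1 < len && data[i + 1] == '\n')
                return LINE_END_DOS;
            return LINE_END_MAC;
        }
    }
    return LINE_END_NONE;
}
```

A caller could use the result to decide whether a conversion pass is needed at all.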

If you do not return in this first loop, the input contains something other than \r\n, and you can continue from that point on. But you still only have to act on either a \r or a \n! So I propose this shorter code (with an enum instead of your defines):

#include <string.h>   /* memcpy */

enum LEStatus_e { LE_CHANGES_FAILED=-1, LE_NO_CHANGES_NEEDED, LE_CHANGES_SUCCEEDED };

enum LEStatus_e ConvertLineEndings(BYTE *inData, INT inLen, BYTE *outData, INT outLen, INT *bytesWritten)
{
    INT sourceIndex = 0, destIndex;

    if (outLen < inLen)
        return LE_CHANGES_FAILED;

    /*  Find first occurrence of either \r or \n
        This will return immediately for No Change Needed */
    while (sourceIndex < inLen)
    {
        if (inData[sourceIndex] == '\r')
        {
            if (sourceIndex < inLen-1 && inData[sourceIndex+1] == '\n')
            {
                memcpy (outData, inData, inLen);
                *bytesWritten = inLen;
                return LE_NO_CHANGES_NEEDED;
            }
            break;
        }
        if (inData[sourceIndex] == '\n')
            break;
        sourceIndex++;
    }
    /* We processed this far already: */
    memcpy (outData, inData, sourceIndex);
    if (sourceIndex == inLen)
    {
        /* No \r or \n anywhere: the copy above is already the full output */
        *bytesWritten = inLen;
        return LE_NO_CHANGES_NEEDED;
    }
    destIndex = sourceIndex;

    while (sourceIndex < inLen)
    {
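        switch (inData[sourceIndex])
        {
            case '\n':
            case '\r':
                sourceIndex++;
                /* Two bytes are about to be written; both must fit */
                if (destIndex+2 > outLen)
                    return LE_CHANGES_FAILED;
                outData[destIndex++] = '\r';
                outData[destIndex++] = '\n';
                break;
            default:
                /* Output grows faster than input, so check plain bytes too */
                if (destIndex >= outLen)
                    return LE_CHANGES_FAILED;
                outData[destIndex++] = inData[sourceIndex++];
        }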
    }
    *bytesWritten = destIndex;
    return LE_CHANGES_SUCCEEDED;
}
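
The function can only succeed if outData is large enough, and the converted text may be up to twice the size of the input (a file consisting only of \n characters). One way to size the buffer is a counting pass beforehand. The sketch below is an assumption on my part, not part of the code above: CrlfOutputSize is a hypothetical name, it uses plain C types, and it counts an existing \r\n pair as one unit that needs no extra byte.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes the data occupies once every
   lone '\r' or lone '\n' has become "\r\n". */
size_t CrlfOutputSize(const unsigned char *data, size_t len)
{
    size_t i, needed = len;

    for (i = 0; i < len; i++)
    {
        if (data[i] == '\r')
        {
            if (i + 1 < len && data[i + 1] == '\n')
                i++;        /* already a \r\n pair: no growth */
            else
                needed++;   /* lone \r gains a \n */
        }
        else if (data[i] == '\n')
            needed++;       /* lone \n gains a \r */
    }
    return needed;
}
```

This costs a second pass over the input; allocating 2 * inLen bytes up front avoids that pass at the price of some memory.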

There are a few old and rare 'plain text' formats that use other constructions; from memory, something like \r\n\n. If you want to be able to sanitize anything, you can add a skip for all \rs after a single \n, and the same for the opposite case. This will also clean up any "mixed" line endings, as it will correctly treat \r\n as well.
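
A minimal sketch of a converter that tolerates mixed line endings could look like the following. Note that it uses a slightly simpler rule than the skip described above: a \r directly followed by \n counts as one line break, and any other \r or \n counts as one line break by itself, so a sequence such as \r\n\n comes out as two line breaks. NormalizeToCrlf and its parameters are hypothetical names, and it uses plain C types instead of BYTE/INT.

```c
#include <stddef.h>

/* Hypothetical helper. Returns 0 on success, or -1 if out is too small.
   *written is only set on success. */
int NormalizeToCrlf(const unsigned char *in, size_t inLen,
                    unsigned char *out, size_t outLen, size_t *written)
{
    size_t src = 0, dst = 0;

    while (src < inLen)
    {
        unsigned char c = in[src++];

        if (c == '\r' || c == '\n')
        {
            /* A \r directly followed by \n is one line break, not two */
            if (c == '\r' && src < inLen && in[src] == '\n')
                src++;
            /* dst never exceeds outLen, so this subtraction is safe */
            if (outLen - dst < 2)
                return -1;
            out[dst++] = '\r';
            out[dst++] = '\n';
        }
        else
        {
            if (dst >= outLen)
                return -1;
            out[dst++] = c;
        }
    }
    *written = dst;
    return 0;
}
```

Unlike the function above, this version always writes the full output and has no "no changes needed" shortcut; that keeps the loop short at the cost of copying data that was already correct.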
