Fast HexStringToByteArray and ByteArrayToHexString conversion

No Comments June 1, 2011

Recently I needed to efficiently convert hex strings to byte arrays. .NET Framework provides BitConverter.ToString() for byte array to hex, but there’s no hex to array equivalent. BitConverter itself is not efficient at handling arrays to hex strings. For symmetry I implemented both hex to array and array to hex methods with the following requirements:

  • Optimized for memory usage, no rediculously large lookup tables.
  • No intermediate garbage generation such as temp strings, etc.
  • Minimal use of IF statements to maximize efficiency.
  • Ability to handle lower and upper case letters with no loss in speed.
  • Error handling, invalid characters must be detected.

Most implementations I stumbled upon use way too many IF statements for range checking, additional IFs for error checking or large lookup tables.

ByteArrayToHexString is relatively straight forward and uses HexAlphabet array to lookup ASCII lowercase letters and digits.

private static byte[] HexAlphabet = new byte[16] {
    0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
    0x38, 0x39, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66
};

public static string ByteArrayToHexString(byte[] value)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }

    int length = value.Length;

    if (length == 0)
    {
        return string.Empty;
    }

    char[] output = new char[length << 1];

    for (int i = 0; i < value.Length; i++)
    {
       output[i << 1] = (char)HexAlphabet[value[i] >> 4];
       output[(i << 1) + 1] = (char)HexAlphabet[value[i] & 0x0F];
    }

    return new string(output);
}

HexStringToByteArray is challenging to implement efficiently due to the hex alphabet “holes” in the ASCII table. Using HexMap table allows the fastest and simplest handling of mixed case hex strings and invalid characters. The table is stored in a byte[] instead of char[] to save space and make it more cache friendly since each Unicode character in .NET and Win32 occupies 2 bytes.

The lookup table itself need not be bigger than the lowercase letter “f”. That’s a 103 character table occupying only 103 bytes of memory. Further, all the characters in the table that are not part of the hex alphabet can be replaced by this same number 103 or 0×67 in hex. The advantage is we can use the same number to check for the table index out of bounds and if the index is valid, check the indexed character if it’s also valid.

private static readonly byte[] EmptyByteArray = new byte[0];

private static byte[] HexMap = new byte[0x67] {
    0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67,
    0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67,
    0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67,
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67,
    0x67, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67,
    0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67, 0x67,
    0x67, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
};

public static byte[] HexStringToByteArray(string value)
{
    if (value == null)
    {
        throw new ArgumentNullException("value");
    }

    int length = value.Length;

    if (length == 0)
    {
        return EmptyByteArray;
    }

    if ((length & 1) != 0)
    {
        throw new ArgumentException("Hex string must contain an even number of characters.", "value");
    }

    byte[] output = new byte[length >> 1];

    for (int i = 0; i < value.Length; i += 2)
    {
        uint c1 = value[i];
        uint c2 = value[i + 1];

        if ((c1 >= 0x67) || ((c1 = HexMap[c1]) >= 0x67) || (c2 >= 0x67) || ((c2 = HexMap[c2]) >= 0x67))
        {
            throw new ArgumentException("Hex string contains unrecognized character.", "value");
        }

        output[i >> 1] = (byte)((c1 << 4) | c2);
    }

    return output;
}

Here it is, the two IF statements per character efficient hex string to byte array converter. Note the usage of 0×67 (103).

Happy converting ;-}


No Comments