Escape Sequences in C#

“Escape sequences huh… everyone knows that: \n, \r, \t, \\ to name a few.” Well, if you truly think you know all about escape sequences, here are a few challenges for you.

Challenge 1: List all the character escape sequences

This may not be as easy as you think. If you can come up with the following list without consulting the language spec, congratulations!

Solution:

Escape Sequence    Escaped Character
\'                 single quote
\"                 double quote
\\                 backslash
\a                 alert
\b                 backspace
\f                 form feed
\n                 new line
\r                 carriage return
\t                 horizontal tab
\v                 vertical tab
\0                 Unicode character 0 (zero), or null character
\uNNNN             Unicode character
\UNNNNNNNN         Unicode character (for generating surrogates)
\xN                Unicode character (variable-length version of \uNNNN)
\xNN
\xNNN
\xNNNN

A few things to note:

  • Each N represents a valid hex digit, i.e. [0-9a-fA-F]
  • \x is greedy and consumes up to 4 hex digits. For example, if you want to print the Unicode character \xFF followed by two “F” characters, do NOT write "\xFFFF", which means the single Unicode character \xFFFF; instead, write "\x00FFFF" or "\xFF" + "FF"
  • \UNNNNNNNN is used for code points beyond U+FFFF, which .NET stores as a pair of surrogates (a high surrogate ranging from U+D800 to U+DBFF and a low surrogate ranging from U+DC00 to U+DFFF). The largest code point such a pair can represent is U+10FFFF, so the maximum you can specify using \U is \U0010FFFF. Any number larger than that results in a compile error.
  • A \UNNNNNNNN above U+FFFF occupies two chars, so the string length is one longer than you’d expect. Of course, you cannot assign such a value to a char.
  • Unlike C/C++ or Java, C# does NOT support octal escapes.
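The notes above are easy to check with a few literals. The following is only a small sketch; the class and constant names (EscapeNotesDemo, Greedy, Padded and so on) are made up for illustration and are not part of the original post.

```csharp
using System;

// Hypothetical demo class; the names are illustrative only.
public static class EscapeNotesDemo
{
    public const string Greedy = "\xFFFF";      // \x takes all four hex digits: one char, U+FFFF
    public const string Padded = "\x00FFFF";    // U+00FF followed by the letters "FF"
    public const string Split  = "\xFF" + "FF"; // same content as Padded
    public const string Pair   = "\U0001F600";  // above U+FFFF: stored as a surrogate pair
    public const string Bmp    = "\U00000041";  // within U+FFFF: a single char ('A')
    public const string NotOctal = "\012";      // \0 followed by '1' and '2', not an octal escape

    public static void Main()
    {
        Console.WriteLine(Greedy.Length);                  // 1
        Console.WriteLine(Padded.Length);                  // 3
        Console.WriteLine(Padded == Split);                // True
        Console.WriteLine(Pair.Length);                    // 2
        Console.WriteLine(char.IsSurrogatePair(Pair, 0));  // True
        Console.WriteLine(Bmp.Length);                     // 1
        Console.WriteLine(NotOctal.Length);                // 3
    }
}
```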

Challenge 2: Convert an escaped character to its escape sequence

When you write Console.WriteLine("Title:\r\n\tHello World!"), it prints out:

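[Screenshot: console output with "Title:" on the first line and a tab-indented "Hello World!" on the second]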

The challenge is to write code so that the program outputs "Title:\r\n\tHello World!" in the console at runtime, i.e.

[Screenshot: console output showing the literal text Title:\r\n\tHello World! on a single line]

Solution:

The most obvious way is to use string.Replace(), e.g. string.Replace("\r", @"\r").

public static string Escape(string input)
{
    string result = input;

    result = result.Replace("\\", @"\\");    // This needs to be done first!
    result = result.Replace("\"", @"\""");
    result = result.Replace("\a", @"\a");
    result = result.Replace("\b", @"\b");
    result = result.Replace("\f", @"\f");
    result = result.Replace("\n", @"\n");
    result = result.Replace("\r", @"\r");
    result = result.Replace("\t", @"\t");
    result = result.Replace("\v", @"\v");
    result = result.Replace("\0", @"\0");

    return result;
}
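Each Replace() call above scans the whole string, and the result depends on the backslash being handled first. As a rough alternative sketch, the same mapping can be applied in a single pass, where ordering no longer matters because every input character is looked at exactly once. The names SinglePassEscaper and EscapeSinglePass are hypothetical, not from the original post.

```csharp
using System;
using System.Text;

// Hypothetical helper; a single-pass variant of the Replace()-based Escape method.
public static class SinglePassEscaper
{
    public static string EscapeSinglePass(string input)
    {
        var builder = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case '\\': builder.Append(@"\\"); break;
                case '"':  builder.Append("\\\""); break; // emits backslash + double quote
                case '\a': builder.Append(@"\a"); break;
                case '\b': builder.Append(@"\b"); break;
                case '\f': builder.Append(@"\f"); break;
                case '\n': builder.Append(@"\n"); break;
                case '\r': builder.Append(@"\r"); break;
                case '\t': builder.Append(@"\t"); break;
                case '\v': builder.Append(@"\v"); break;
                case '\0': builder.Append(@"\0"); break;
                default:   builder.Append(c); break;      // everything else is copied as-is
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // Prints the escape sequences literally: Title:\r\n\tHello World!
        Console.WriteLine(EscapeSinglePass("Title:\r\n\tHello World!"));
    }
}
```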

Or, if you prefer regular expressions, here is another way with slightly more code:

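using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Keys are regex patterns; values are the escape sequences to emit.
    private static readonly Dictionary<string, string> _escapeMapping = new Dictionary<string, string>()
    {
        {"\\\\", @"\\"},    // the regex pattern \\ matches a single backslash
        {"\"", @"\"""},
        {"\a", @"\a"},
        {"\b", @"\b"},
        {"\f", @"\f"},
        {"\n", @"\n"},
        {"\r", @"\r"},
        {"\t", @"\t"},
        {"\v", @"\v"},
        {"\0", @"\0"},
    };

    private static readonly Regex _escapeRegex = new Regex(string.Join("|", _escapeMapping.Keys.ToArray()));

    public static string Escape(this string input)
    {
        return _escapeRegex.Replace(input, EscapeMatchEval);
    }

    private static string EscapeMatchEval(Match match)
    {
        if (_escapeMapping.ContainsKey(match.Value))
        {
            return _escapeMapping[match.Value];
        }
        // A matched backslash is stored under its regex-escaped key.
        return _escapeMapping[Regex.Escape(match.Value)];
    }
}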

There is also a third way: use CodeDom and let .NET handle those replacements for us.

public static string Escape(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = new Microsoft.CSharp.CSharpCodeProvider())
        {
            provider.GenerateCodeFromExpression(new System.CodeDom.CodePrimitiveExpression(input), writer, null);
        }

        return writer.ToString();
    }
}

Although the third way seems pretty cool, it has a few catches. Since CodeDom is really meant for generating code, its engine does a few optimizations; for example, it breaks up a long string into multiple shorter ones joined by the string concatenation operator, i.e. + in C#. Also, if you use Microsoft.VisualBasic.VBCodeProvider instead of the C# one, you will end up with output such as Global.Microsoft.VisualBasic.ChrW(13) for \r. Finally, due to a limitation in CodeDom, the third way cannot handle the escape sequences \a, \b, \f, and \v.

At this point, you may notice that none of the three methods deals with Unicode. But you can easily extend them or create a separate one to deal with escaping Unicode characters. See below as an example:

public static string EscapeUnicode(string input)
{
    var builder = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsSurrogatePair(input, i))
        {
            builder.Append("\\U" + char.ConvertToUtf32(input, i).ToString("X8"));
            i++;  //skip the next char     
        }
        else
        {
            int charVal = input[i];  // plain cast; char.ConvertToUtf32 would throw on a lone surrogate
            if (charVal > 127)
            {
                builder.Append("\\u" + charVal.ToString("X4"));
            }
            else
            {
                //an ASCII character 
                builder.Append(input[i]);
            }
        }
    }

    return builder.ToString();
}
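To make the two branches of that loop concrete, here is a small sketch of what the formatting calls produce for one supplementary character and one non-ASCII BMP character. The helper names (UnicodeFormatDemo, FormatPair, FormatChar) are invented for this illustration.

```csharp
using System;

// Hypothetical helpers that isolate the two formatting steps used when escaping Unicode.
public static class UnicodeFormatDemo
{
    // Formats the surrogate pair starting at 'index' as \UNNNNNNNN.
    public static string FormatPair(string s, int index)
    {
        return "\\U" + char.ConvertToUtf32(s, index).ToString("X8");
    }

    // Formats a single UTF-16 code unit as \uNNNN.
    public static string FormatChar(char c)
    {
        return "\\u" + ((int)c).ToString("X4");
    }

    public static void Main()
    {
        string smiley = char.ConvertFromUtf32(0x1F600);  // one code point, two chars
        Console.WriteLine(smiley.Length);                // 2
        Console.WriteLine(FormatChar(smiley[0]));        // \uD83D (high surrogate)
        Console.WriteLine(FormatChar(smiley[1]));        // \uDE00 (low surrogate)
        Console.WriteLine(FormatPair(smiley, 0));        // \U0001F600
        Console.WriteLine(FormatChar('\u00E9'));         // \u00E9
    }
}
```

The pair is reported as one \U sequence rather than two \u sequences, which is why the loop skips the low surrogate after handling the high one.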

Challenge 3: Convert an escape sequence to the escaped character

It’s the reverse of Challenge 2, and it’s a lot more interesting! …well, only if you can come up with more than one way.

Say you are reading a file that contains escape sequences. When the content is read into a string, each escape sequence arrives as its individual characters: \n in the file is a backslash followed by the letter n in memory, which is what you would write as "\\n" in C# source. What you need to do is convert that "\\n" back to "\n" at runtime so that it represents a new line character.

Solution:

You may still resort to string.Replace() and regular expressions (needed to handle Unicode), but there is a much more elegant way of doing it: let the C# compiler parse the string for you.

public static string ParseString(string input)
{
    var provider = new Microsoft.CSharp.CSharpCodeProvider();
    var parameters = new System.CodeDom.Compiler.CompilerParameters()
    {
        GenerateExecutable = false,
        GenerateInMemory = true,
    };

    var code = @"
        namespace Tmp
        {
            public class TmpClass
            {
                public static string GetValue()
                {
                    return """ + input + @""";
                }
            }
        }";

    var compileResult = provider.CompileAssemblyFromSource(parameters, code);

    if (compileResult.Errors.HasErrors)
    {
        throw new ArgumentException(compileResult.Errors.Cast<System.CodeDom.Compiler.CompilerError>().First(e => !e.IsWarning).ErrorText);
    }

    var asmb = compileResult.CompiledAssembly;
    var method = asmb.GetType("Tmp.TmpClass").GetMethod("GetValue");

    return method.Invoke(null, null) as string;
}
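
For comparison, the regular-expression route mentioned before the CodeDom solution might look roughly like the sketch below. It is only one possible shape, under the assumption that the input uses the escape forms listed in Challenge 1; the names RegexUnescaper and Unescape are hypothetical. It does not compile anything at runtime, so the input is never executed as code.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; a regex-based alternative to compiling the string with CodeDom.
public static class RegexUnescaper
{
    // One alternative per escape form. \x is greedy (1 to 4 hex digits), like the compiler.
    private static readonly Regex EscapePattern = new Regex(
        @"\\(?:U(?<wide>[0-9a-fA-F]{8})|u(?<utf16>[0-9a-fA-F]{4})|x(?<hex>[0-9a-fA-F]{1,4})|(?<simple>['""\\0abfnrtv]))");

    public static string Unescape(string input)
    {
        // The regex scans left to right, so an escaped backslash is consumed
        // before the character after it can be misread as a new escape.
        return EscapePattern.Replace(input, match =>
        {
            if (match.Groups["wide"].Success)
            {
                // Throws for values that are not valid code points (e.g. above 0x10FFFF).
                return char.ConvertFromUtf32(ParseHex(match.Groups["wide"].Value));
            }
            if (match.Groups["utf16"].Success)
            {
                return ((char)ParseHex(match.Groups["utf16"].Value)).ToString();
            }
            if (match.Groups["hex"].Success)
            {
                return ((char)ParseHex(match.Groups["hex"].Value)).ToString();
            }
            switch (match.Groups["simple"].Value[0])
            {
                case '0': return "\0";
                case 'a': return "\a";
                case 'b': return "\b";
                case 'f': return "\f";
                case 'n': return "\n";
                case 'r': return "\r";
                case 't': return "\t";
                case 'v': return "\v";
                default:  return match.Groups["simple"].Value; // ', " and \ map to themselves
            }
        });
    }

    private static int ParseHex(string digits)
    {
        return int.Parse(digits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

Unrecognized sequences such as \q are left untouched in this sketch, whereas the compiler-based approach would reject them with an error.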