Escape Sequences in C#
19/10/2010
“Escape sequences, huh… everyone knows those: \n, \r, \t, \\ to name a few.” Well, if you truly think you know all about escape sequences, here are a few challenges for you.
Challenge 1: List all the character escape sequences
This may not be as easy as you think. If you can come up with the following list without consulting the language spec, congratulations!
Solution:
| Escape Sequence | Escaped Character |
| \' | single quote |
| \" | double quote |
| \\ | backslash |
| \a | alert |
| \b | backspace |
| \f | form feed |
| \n | new line |
| \r | carriage return |
| \t | horizontal tab |
| \v | vertical tab |
| \0 | Unicode character 0 (zero), or null character |
| \uNNNN | Unicode character |
| \UNNNNNNNN | Unicode character (for generating surrogates) |
| \xN, \xNN, \xNNN, \xNNNN | Unicode character (variable length version of \uNNNN) |
A few things to note:
- Each N represents a valid hex digit, i.e. [0-9a-fA-F].
- \x can consume up to 4 hex digits. For example, if you want to print the Unicode character \xFF followed by two "F" characters, do NOT write "\xFFFF", which means the Unicode character \xFFFF; instead, write "\x00FFFF" or "\xFF" + "FF".
- \UNNNNNNNN is used for generating a pair of surrogates (a high and a low surrogate). Since UTF-16 high surrogates range from U+D800 to U+DBFF and low surrogates from U+DC00 to U+DFFF, the maximum you can specify using \U is \U0010FFFF. Any number larger than that results in a compile error.
- For a code point above U+FFFF, \UNNNNNNNN occupies two chars (the surrogate pair), so the string length is one character longer than you'd expect. Of course, such a value cannot be assigned to a char.
- Unlike C/C++ or Java, C# does NOT support octal escapes.
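The notes above can be checked with a few string constants. This is a minimal sketch; the class name EscapeNotes and its Describe helper are hypothetical, added only for illustration.

```csharp
using System;

// Hypothetical class that only exists to illustrate the notes on \x and \U.
public static class EscapeNotes
{
    // \x is greedy: it takes all four hex digits, so this is ONE char, U+FFFF.
    public const string Greedy = "\xFFFF";

    // Padding to four digits ends the escape early: U+00FF followed by "FF".
    public const string Padded = "\x00FFFF";

    // Splitting the literal gives the same three characters.
    public const string Split = "\xFF" + "FF";

    // A code point above U+FFFF becomes a surrogate pair: two chars.
    public const string Emoji = "\U0001F600";

    // At or below U+FFFF, \U yields a single char ('J' here).
    public const string Letter = "\U0000004A";

    // Prints the length and the UTF-16 code units of a string.
    public static void Describe(string label, string s)
    {
        Console.Write(label + ": Length=" + s.Length + ", code units =");
        foreach (char c in s)
        {
            Console.Write(" U+" + ((int)c).ToString("X4"));
        }
        Console.WriteLine();
    }
}
```

For example, EscapeNotes.Describe("Padded", EscapeNotes.Padded) prints "Padded: Length=3, code units = U+00FF U+0046 U+0046".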
Challenge 2: Convert an escaped character to its escape sequence
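When you write Console.WriteLine("Title:\r\n\tHello World!"), it prints out:

Title:
    Hello World!

The challenge is to write code so that the program outputs "Title:\r\n\tHello World!" in the console at runtime, i.e. with the escape sequences shown literally instead of being turned into a line break and a tab.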
Solution:
The most obvious way is to use string.Replace(), e.g. input.Replace("\r", @"\r").
public static string Escape(string input)
{
    string result = input;
    result = result.Replace("\\", @"\\"); // This needs to be done first!
    result = result.Replace("\"", @"\""");
    result = result.Replace("\a", @"\a");
    result = result.Replace("\b", @"\b");
    result = result.Replace("\f", @"\f");
    result = result.Replace("\n", @"\n");
    result = result.Replace("\r", @"\r");
    result = result.Replace("\t", @"\t");
    result = result.Replace("\v", @"\v");
    result = result.Replace("\0", @"\0");
    return result;
}
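To see why the backslash replacement has to come first, here is a small sketch that contrasts the two orders on a single newline. The class and method names are hypothetical.

```csharp
// Hypothetical helpers contrasting the two replacement orders.
public static class ReplaceOrderDemo
{
    // Correct: existing backslashes are doubled before any new ones are added.
    public static string BackslashFirst(string s)
    {
        return s.Replace("\\", @"\\").Replace("\n", @"\n");
    }

    // Wrong: the backslash introduced for "\n" is doubled by the second call.
    public static string BackslashLast(string s)
    {
        return s.Replace("\n", @"\n").Replace("\\", @"\\");
    }
}
```

For the input "a\nb", BackslashFirst returns a\nb (four characters), while BackslashLast returns a\\nb (five characters), which would read back as a backslash followed by n rather than a new line.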
Or, if you prefer regular expressions, here is another way with slightly more code:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Keys are regex patterns (hence "\\\\" for a literal backslash);
    // values are the escape sequences to emit.
    private static readonly Dictionary<string, string> _escapeMapping = new Dictionary<string, string>()
    {
        {"\\\\", @"\\"},
        {"\"", @"\"""},
        {"\a", @"\a"},
        {"\b", @"\b"},
        {"\f", @"\f"},
        {"\n", @"\n"},
        {"\r", @"\r"},
        {"\t", @"\t"},
        {"\v", @"\v"},
        {"\0", @"\0"},
    };

    private static readonly Regex escapeRegex = new Regex(string.Join("|", _escapeMapping.Keys.ToArray()));

    public static string Escape(this string input)
    {
        return escapeRegex.Replace(input, EscapeMatchEval);
    }

    private static string EscapeMatchEval(Match match)
    {
        if (_escapeMapping.ContainsKey(match.Value))
        {
            return _escapeMapping[match.Value];
        }
        // A matched backslash is looked up by its pattern form, "\\\\".
        return _escapeMapping[Regex.Escape(match.Value)];
    }
}
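A variant of the regex approach, sketched under the assumption that you target .NET 4 or later (for the string.Join overload that takes an IEnumerable<string>): key the dictionary by the raw characters and let Regex.Escape build each alternative, so the match evaluator needs no fallback lookup. RegexEscaper is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical variant: the dictionary keys are the raw characters themselves.
public static class RegexEscaper
{
    private static readonly Dictionary<string, string> Map = new Dictionary<string, string>
    {
        { "\\", @"\\" }, { "\"", "\\\"" }, { "\a", @"\a" }, { "\b", @"\b" },
        { "\f", @"\f" }, { "\n", @"\n" }, { "\r", @"\r" }, { "\t", @"\t" },
        { "\v", @"\v" }, { "\0", @"\0" },
    };

    // Regex.Escape turns each raw character into a safe pattern, so the
    // matched text is always one of the dictionary keys.
    private static readonly Regex Pattern =
        new Regex(string.Join("|", Map.Keys.Select(Regex.Escape)));

    public static string Escape(string input)
    {
        return Pattern.Replace(input, m => Map[m.Value]);
    }
}
```

Because every alternative matches exactly one character and the replacement is done in a single pass, the order of the dictionary entries does not matter here.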
There is also a third way: use CodeDom and let .NET handle those replacements for us.
// Needs: using System.IO;
public static string Escape(string input)
{
    using (var writer = new StringWriter())
    {
        using (var provider = new Microsoft.CSharp.CSharpCodeProvider())
        {
            // Let the C# code generator emit input as a string literal.
            provider.GenerateCodeFromExpression(new System.CodeDom.CodePrimitiveExpression(input), writer, null);
        }
        return writer.ToString();
    }
}
Although the third way seems pretty cool, it has a few catches. Since CodeDom is really meant for generating code, its engine performs a few optimizations; for example, it breaks a long string into multiple shorter ones joined by the string concatenation operator, i.e. + in C#. The result is also a complete C# string literal, so it comes wrapped in quotes. If you use Microsoft.VisualBasic.VBCodeProvider instead of the C# one, you will end up with output such as Global.Microsoft.VisualBasic.ChrW(13) for \r. Finally, due to a limitation in CodeDom, the third way cannot deal with the escape sequences \a, \b, \f, and \v (those characters are left unescaped).
At this point, you may notice that none of the three methods deals with Unicode. But you can easily extend them, or write a separate method, to escape Unicode characters. Here is an example:
// Needs: using System.Text;
public static string EscapeUnicode(string input)
{
    var builder = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsSurrogatePair(input, i))
        {
            // Emit the full code point of the pair as \UNNNNNNNN.
            builder.Append("\\U" + char.ConvertToUtf32(input, i).ToString("X8"));
            i++; // skip the low surrogate
        }
        else
        {
            // Read the char directly: char.ConvertToUtf32 would throw on a lone surrogate.
            int charVal = input[i];
            if (charVal > 127)
            {
                builder.Append("\\u" + charVal.ToString("X4"));
            }
            else
            {
                // an ASCII character
                builder.Append(input[i]);
            }
        }
    }
    return builder.ToString();
}
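The simple escapes and the Unicode pass can also be folded into a single loop over the string. The sketch below is one way to do it, not the method used above; CSharpEscaper and EscapeAll are hypothetical names. It reads each char directly, so a lone surrogate is emitted as \uNNNN, and it also escapes the remaining control characters (and DEL) as \uNNNN.

```csharp
using System.Text;

// Hypothetical single-pass escaper combining the simple and Unicode escapes.
public static class CSharpEscaper
{
    public static string EscapeAll(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '"':  sb.Append("\\\""); break;
                case '\a': sb.Append(@"\a"); break;
                case '\b': sb.Append(@"\b"); break;
                case '\f': sb.Append(@"\f"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\r': sb.Append(@"\r"); break;
                case '\t': sb.Append(@"\t"); break;
                case '\v': sb.Append(@"\v"); break;
                case '\0': sb.Append(@"\0"); break;
                default:
                    if (char.IsSurrogatePair(input, i))
                    {
                        // Code point above U+FFFF: one \U escape for both chars.
                        sb.Append(@"\U").Append(char.ConvertToUtf32(input, i).ToString("X8"));
                        i++; // the low surrogate was consumed as well
                    }
                    else if (c < 0x20 || c > 0x7E)
                    {
                        // Other control chars, DEL, non-ASCII, and lone surrogates.
                        sb.Append(@"\u").Append(((int)c).ToString("X4"));
                    }
                    else
                    {
                        sb.Append(c); // printable ASCII
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```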
Challenge 3: Convert an escape sequence to the escaped character
It’s the reverse of Challenge 2, and it’s a lot more interesting! …well, only if you can come up with more than one way.
Let’s say you are reading a file that contains escape sequences. When the content is read into a string, each escape sequence is just a run of ordinary characters, e.g. \n in the file becomes \\n in memory (a backslash followed by n, as you would write it in a C# literal). Now you need to convert \\n back to \n at runtime so that it represents a new line character.
Solution:
You may still resort to string.Replace() and regular expressions (needed to handle Unicode), but there is a much more elegant way of doing it: let the C# compiler parse the string for you.
// Needs: using System; using System.Linq;
public static string ParseString(string input)
{
    using (var provider = new Microsoft.CSharp.CSharpCodeProvider())
    {
        var parameters = new System.CodeDom.Compiler.CompilerParameters()
        {
            GenerateExecutable = false,
            GenerateInMemory = true,
        };
        // Wrap the input in a string literal inside a throwaway class.
        var code = @"
namespace Tmp
{
    public class TmpClass
    {
        public static string GetValue()
        {
            return """ + input + @""";
        }
    }
}";
        var compileResult = provider.CompileAssemblyFromSource(parameters, code);
        if (compileResult.Errors.HasErrors)
        {
            throw new ArgumentException(compileResult.Errors.Cast<System.CodeDom.Compiler.CompilerError>().First(e => !e.IsWarning).ErrorText);
        }
        // Load the compiled type and call GetValue() via reflection.
        var asmb = compileResult.CompiledAssembly;
        var method = asmb.GetType("Tmp.TmpClass").GetMethod("GetValue");
        return method.Invoke(null, null) as string;
    }
}
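If compiling code at runtime is not an option, the same conversion can be done with a small hand-written parser. Compiling the input has real costs: every call compiles and loads a new assembly that cannot be unloaded from the AppDomain, the input is compiled as code (so a string containing a quote can close the literal and inject statements), and on .NET Core and .NET 5+ CompileAssemblyFromSource throws PlatformNotSupportedException. The sketch below is an alternative, not the method above; Unescaper and ReadHex are hypothetical names. It handles the sequences listed in Challenge 1, including the greedy \x rule, and throws FormatException for anything it does not recognize.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical hand-written unescaper for C# escape sequences.
public static class Unescaper
{
    public static string Unescape(string input)
    {
        var sb = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i++];
            if (c != '\\')
            {
                sb.Append(c); // ordinary character
                continue;
            }
            if (i >= input.Length)
                throw new FormatException("Dangling backslash at end of input.");

            char e = input[i++];
            switch (e)
            {
                case '\'': sb.Append('\''); break;
                case '"':  sb.Append('"'); break;
                case '\\': sb.Append('\\'); break;
                case '0':  sb.Append('\0'); break;
                case 'a':  sb.Append('\a'); break;
                case 'b':  sb.Append('\b'); break;
                case 'f':  sb.Append('\f'); break;
                case 'n':  sb.Append('\n'); break;
                case 'r':  sb.Append('\r'); break;
                case 't':  sb.Append('\t'); break;
                case 'v':  sb.Append('\v'); break;
                case 'u':
                    // Exactly four hex digits.
                    sb.Append((char)ReadHex(input, ref i, 4, 4));
                    break;
                case 'U':
                    // Exactly eight hex digits; ConvertFromUtf32 throws
                    // ArgumentOutOfRangeException above U+10FFFF or for surrogate code points.
                    sb.Append(char.ConvertFromUtf32(ReadHex(input, ref i, 8, 8)));
                    break;
                case 'x':
                    // Greedy: one to four hex digits.
                    sb.Append((char)ReadHex(input, ref i, 1, 4));
                    break;
                default:
                    throw new FormatException("Unrecognized escape sequence \\" + e);
            }
        }
        return sb.ToString();
    }

    // Reads between minDigits and maxDigits hex digits starting at index i,
    // advancing i past them.
    private static int ReadHex(string s, ref int i, int minDigits, int maxDigits)
    {
        int start = i;
        while (i < s.Length && i - start < maxDigits && Uri.IsHexDigit(s[i]))
            i++;
        if (i - start < minDigits)
            throw new FormatException("Expected hex digits at index " + start);
        return int.Parse(s.Substring(start, i - start), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

For example, Unescaper.Unescape(@"\x00FFFF") yields U+00FF followed by "FF", matching the compiler's reading of the same literal.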