Wednesday, January 20, 2010

IDE Regex Replace: char to wchar_t string literals

While upgrading your apps to use wchar_t* instead of char* string literals, you'll find that you need to change a string such as "This is a string" to _T("This is a string"), as well as character literals such as 'c' to _T('c').

Well, the good news is that there's a quick way of doing this.

The C++ Builder IDE has always had a regex (regular expression) based search and replace function. All you have to do is enable it in the Replace Text dialog, under Options | Regular expressions.

These are the regexes you'll need.

For string literals,

Text to find: "{(\\"|[^"])*}" (include the double quotes)
Replace with: _T("\0")

For char literals,

Text to find: \'{\\[^']|[^']}\'
Replace with: _T('\0')

* Note: Do not blindly replace all. You may end up replacing text inside a string, such as the 'u' in "I can see 'u' from here", or quoted text that isn't a string literal, such as #include "myfile.h". If anyone has any suggestions on how to correct this, I'd appreciate it (note that the IDE regex replacer does not support backreferences).
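One possible way around both pitfalls, outside the IDE, is to do the replacement in a small program instead. The sketch below uses the standard std::regex library; the helper name wrap_literals and the two patterns are my own assumptions, not something the IDE provides. It skips #include lines and puts string and char literals into a single alternation, so a quoted string is consumed whole before a 'u' inside it can match on its own.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of the IDE: wraps the string and character
// literals found on one line of source in _T(...).
std::string wrap_literals(const std::string& line) {
    // Leave #include lines alone: the quoted file name is not a string literal.
    static const std::regex include_line(R"(^\s*#\s*include\b.*)");
    if (std::regex_match(line, include_line)) {
        return line;
    }
    // One alternation, matched left to right. A double-quoted literal is
    // consumed whole, so a 'u' inside it is never matched separately.
    static const std::regex literal(
        R"re("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])+')re");
    return std::regex_replace(line, literal, "_T($&)");
}

// Usage check covering the two cases from the note plus a plain char literal.
void check_wrap_literals() {
    assert(wrap_literals("String s = \"I can see 'u' from here\";") ==
           "String s = _T(\"I can see 'u' from here\");");
    assert(wrap_literals("#include \"myfile.h\"") == "#include \"myfile.h\"");
    assert(wrap_literals("char c = 'c';") == "char c = _T('c');");
}
```

This is still only a sketch: it works one line at a time, it will wrap a literal that is already inside _T(...) a second time, and it knows nothing about quotes inside comments or raw string literals, so the advice not to replace blindly still applies.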

The reason you'd want to use the _T(x) macro is that assignment to a UnicodeString (which String is typedef'd to) is faster from a wide literal. The _T(x) macro maps to L##x, i.e. _T("text") == L"text". The String and _T(x) pair also stays compatible when going from a compiler that supports Unicode to one that doesn't: String maps to UnicodeString in the former and AnsiString in the latter, while _T(x) maps to L (L"string") and to nothing ("string") respectively.
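The mapping described above can be sketched with stand-ins that compile on any C++ compiler. DEMO_T, DemoString and DEMO_UNICODE are hypothetical names made up for this illustration; the real _T and String come from the C++ Builder headers and may be defined differently in detail.

```cpp
#include <cassert>
#include <string>

// Hypothetical switch playing the role of the compiler's Unicode support.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
  #define DEMO_T(x) L##x            // DEMO_T("text") becomes L"text"
  typedef std::wstring DemoString;  // stands in for UnicodeString
#else
  #define DEMO_T(x) x               // DEMO_T("text") stays "text"
  typedef std::string DemoString;   // stands in for AnsiString
#endif

// The same source line compiles in either mode, because the literal
// always has the character type of the string it is assigned to.
inline DemoString greeting() {
    return DEMO_T("text");
}

// Usage check: the literal type follows the switch.
inline void check_mapping() {
#if DEMO_UNICODE
    assert(greeting() == L"text");
    assert(sizeof(DEMO_T('c')) == sizeof(wchar_t));
#else
    assert(greeting() == "text");
#endif
}
```

Setting DEMO_UNICODE to 0 switches both the macro and the typedef together, which is the point of using the pair rather than writing L"text" directly.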

String fromAnsi = "text";
This calls _UStrFromPChar, which ends up calling MultiByteToWideChar, a Windows API that converts ANSI strings to Unicode strings. As fast as it may be, it's bound to be slower than a straight memory copy.

String fromUnicode = L"text";
All else being equal (allocating memory and finding the string length), this is much faster, as it's basically just a straight memory copy.

2 comments:

Unknown said...

Thank you so much for your RegEx :-) It made my life more comfortable!

Mordachai said...

In VS 2012, this works for both strings and char literals:

Search: ((\".+?\")|('.+?'))
Replace: _T($&)