Posts filed under 'VBA'

VBA: Unicode Strings and the Windows API

Microsoft Access In this post, we talk about Unicode, about Strings, about how they sometimes become unexpectedly mangled when you pass them around the system and what we can do about it. We’ll see some Chinese ideograms, peek into memory and display some nonsense on screen in the process.

VBA likes Unicode

Internally, VBA handles strings in Unicode encoded as UTF-16. This means that each ‘character’ of the string uses at least a 16 bit WORD (2 bytes) of memory (not entirely true, but let’s keep it simple).

Below is the proof; it shows the memory content of a VBA string variable that contains the following sentence:

Chinese or the Sinitic language(s)
(汉语/漢語, pinyin: Hànyǔ; 中文, pinyin: Zhōngwén)
is the language spoken by the Han Chinese

If you see squares in place of the Chinese ideograms, then you are probably using an old version of Internet Explorer and do not have the Asian fonts installed: simply use the latest versions of Firefox or Chrome if you want to display them correctly (although you can still go through this post without seeing the exact glyphs).

In memory, the layout of this VBA string is:

Memory map of the VBA String

Each character of our string occupies 2 bytes: our first letter C whose hexadecimal code 0043 is located at address 0EED7974, then, 2 bytes later at address 0EED7976, we have our second character, h whose code is 0068 and so on. Note that Windows is a Little Endian architecture, so the lowest significant byte 68 appears first in memory, followed by the most significant byte 00 at the next address.

On the right side, the map displays strange characters in place of the Chinese ones because the memory map interprets each byte as a simple ASCII character, but you can see for instance that the ideogram is located at address 0EED79EA and that its code is 4E2D .

The character codes we see here are simply the ones that would be returned by the AscW() function in VBA (more or less).

VBA doesn’t always like Unicode

The thing though, is that VBA considers the outside world to be ANSI, where characters take (generally) 1 byte and strings must be interpreted according to a Code Page that translate the character code into a different visual representation depending on the encoding.

This means that if you are Greek, your system’s code page will be 737 and you will see Ω (the capital letter Omega) if a string contains the hexadecimal code 97 whereas on a system set for western languages, including English, you will see ù: different representations of the exact same character code.

This nightmare of ANSI encoding made it very hard to pass around strings if you did not know which code page was associated with them. Even worse, this makes it very hard to display strings in multiple languages within the same application.

The Office VBE IDE editor is one such old beast: it only support the input and display of characters in your current system’s Code Page and it will simply display garbage or ? placeholders for any characters it can’t understand.

Fortunately, we’re not really living in an ANSI world any more. Since the days of Windows NT and Windows XP SP2, we’ve been able to use Unicode where each possible character, glyph, symbol has its own code point (things can be quite complicated in Unicode world as well, but that’s the topic for another post).

Unfortunately, VBA still inherits some old habits that just refuse to die: there are times when it will convert your nice Unicode strings to ANSI without telling you.

To make the situation worse, most of the VBA code on the Internet was written by English speakers at a time when Unicode was just being implemented (around 2000). The result is that even Microsoft’s own examples and Knowledge Base articles still get things wrong and copying these examples blindly will probably make you, your customers and users very unhappy at some point.

To understand this unwanted legacy, and the solutions to avoid these problems, we’ll go through a concrete example.

Message in a box

Let’s study a simple case: display our previous message using the MessageBox Windows API. The result we want to achieve is this:

Text containing Chinese properly displayed in a message box

Note that in order to display Chinese ideograms, your system must have the proper fonts installed. If not, you’ll end up with little boxes where the glyphs should be displayed.

What we are talking about here is not limited to Asian languages though: it concerns all languages, including English if you ever include any symbols that’s outside of extended ASCII, like the Euro €, mathematical symbols, words in other alphabets like Greek, phonetics, accents, even Emoji (emoticon) symbols.

First attempt: MessageBoxA

OK, let’s go through the motions and try to use the MessageBox API.

Let’s first make a proper declaration that will work with all versions of Office:

#If VBA7 Then
Public Declare PtrSafe Function MessageBoxA Lib "user32" _
(ByVal hwnd As LongPtr, _
ByVal lpText As String, _
ByVal lpCaption As String, _
ByVal wType As Long) As Long
Public Declare Function MessageBoxA Lib "user32" _
(ByVal hwnd As Long, _
ByVal lpText As String, _
ByVal lpCaption As String, _
ByVal wType As Long) As Long
#End If

Now, let’s try to display the message box with our text.

Dim s As String
s = GetUnicodeString()
MessageBoxA 0, s, "test", 0

The GetUnicodeString() function simply returns a string containing the text we want to display. It uses a function UnEscStr() from a previous article to convert escaped Unicode sequences into actual characters.

What we obtain is not what we expected:

MessageBoxA does not display the Chinese characters in our string

All Chinese characters have been replaced by ? placeholders.

Ok, so, if we look again at the signature of the function, we’re calling MessageBoxA. What does this A tacked at the end means? The Windows API has generally 2 version of functions that handle strings: the old ANSI version, ending with a A, and the Unicode version, ending with a W, for Wide-Characters.

>Our first tip is: whenever you see a WinAPI declaration for a function ending in A, you should consider the W version instead. >Avoid A WinAPI functions like the plague!

Get it right: MessageBoxW

So, now we know that we must use MessageBoxW for Unicode, the API documentation says so as well, so we must be on the right track.

Let’s declare the right API function then:

#If VBA7 Then
Public Declare PtrSafe Function MessageBoxW Lib "user32" _
(ByVal hwnd As LongPtr, _
ByVal lpText As String, _
ByVal lpCaption As String, _
ByVal wType As Long) As Long
Public Declare Function MessageBoxW Lib "user32" _
(ByVal hwnd As Long, _
ByVal lpText As String, _
ByVal lpCaption As String, _
ByVal wType As Long) As Long
#End If

The only change is the W in the function name and we’ll call the Unicode version. Let’s try to open that box again:

Dim s As String
s = GetUnicodeString()
MessageBoxW 0, s, "test", 0

And this displays:

Oh dear. Looks like Chinese, but it certainly isn’t what we expected. Here we’ve just got Mojibake, garbage due to incorrect processing of Unicode.

What could be wrong? We did pass a proper Unicode string, so what happened?

Well, VBA happened: whenever you pass a String as a parameter in a Declare statement, VBA will convert it to ANSI automatically.

>Let me repeat this: when you see a Declare statement that uses As String parameters, VBA will try to convert the string to ANSI and will likely irremediably damage the content of your string.

I’ll show you what happens by going back to our memory map. This is the content of memory before we call MessageBoxW:

String in memory with Unicode characters intact

This is the memory of the same string after the call to MessageBoxW:

String in memory with Unicode characters converted to placeholders

The string has been converted to ANSI, then converted back to Unicode to fit into 2-byte per character but in the process all high Unicode code points have been replaced by ? (whose code is 003F).

Note that the memory location of the string before and after the call are different. This is normal since the string was converted, so another one was allocated (sometimes they end-up occupying the same memory though it’s probably an optimisation of the interpreter).

In short:

Dim s As String
s = ChrW(&H4E2D)
' s now contains the 中 character
MessageBoxW 0, s, "test", 0
' s now contains the ? character instead

So our string was actually returned modified to us; we can’t even rely on VBA to keep our original string intact…

Get it right, use pointers!

So, it’s clear now, whenever VBA sees that you are passing a String in a Declare statement, it will silently convert that String to ANSI and mangle any characters that doesn’t work in the current Code Page set for the system.

To avoid that unwanted conversion, instead of declaring the parameters as String, we will declare pointers instead so that our final declaration will be:

#If VBA7 Then
Public Declare PtrSafe Function MessageBoxU Lib "user32" Alias "MessageBoxW" _
(ByVal hwnd As LongPtr, _
ByVal lpText As LongPtr, _
ByVal lpCaption As LongPtr, _
ByVal wType As Long) As Long
Public Declare Function MessageBoxU Lib "user32" Alias "MessageBoxW" _
(ByVal hwnd As Long, _
ByVal lpText As Long, _
ByVal lpCaption As Long, _
ByVal wType As Long) As Long
#End If

What we did here is simply define MessageBoxU as an alias for MessageBoxW which we know is the right API function. Then we replaced the declaration for ByVal lpText As String and ByVal lpCaption As String, by pointers values: ByVal lpText As LongPtr and ByVal lpCaption As LongPtr.

From the point of view of the MessageBoxW function, in both instances it will receive a pointer, except that instead of a String, itwill receive the value of a LongPtr, which is functionally identical.

Now, when using our new MessageBoxU function, we also need to pass the value for the pointers to the strings explicitly:

Dim s As String
s = GetUnicodeString()
MessageBoxU 0, StrPtr(s), StrPtr("test"), 0

The StrPtr() function has been part of VBA for a while now but it is still not documented. It simply return the memory address of the first character of a string as a LongPtr (or Long versions of Office older than 2010).

There are other functions to get the address of a variable: VarPtr() returns the memory address of the given variable while ObjPtr() returns the memory address of an object instance. We’ll have to talk more about those in another post.

So, now, the result:

Text containing Chinese properly displayed in a message box

Hurray! We did it! No unexpected conversion! If you looked at the memory map, there would be nothing really interesting to see: the string would have stayed the same throughout the call and would be untouched: no conversion, no copy.

Not only did we manage to make call to API functions work, they are also faster because VBA doesn’t have the overhead of these silly Unicode-ANSI-Unicode conversions every time we use the API function.

What about the built-in VBA MsgBox?

VBA has had its own MsgBox function for a while, so why not use it?

Dim s As String
s = GetUnicodeString()
MsgBox s, , "test"

Well, that’s why you should not:

MessageBoxA does not display the Chinese characters in our string

Unfortunately, the VBA function MsgBox this is just a wrapper for the MessageBoxA API call, so no help there.

Parting words

This post has not been about Message boxes, although you will probably find my Enhanced Message Box replacement useful if that is what you are looking for.

The point of all this is simply to remember the following when involving Strings in Win API calls:

  1. Do not use the A version of API calls, always use the W version instead.

  2. This is also relevant for some of the built-in VBA functions: use ChrW$() over Chr() and AscW() over Asc(), both W version handle Unicode characters and are about 30% faster in my tests (see KB145745 for more information).

  3. Check all code involving Strings with some real-world examples containing Unicode characters beyond the usual Extended ASCII set.

  4. VBA will mess-up and convert to ANSI all strings parameters in a Declare statement.

  5. To avoid this mangling, use pointers instead of Strings in your declarations: never declare As String in an API Declare statement. Always use ByVal As LongPtr instead and then use StrPtr() in your calls to pass the pointer to the actual string data.

  6. Similarly, when passing a User-Defined-Type structure that contains Strings to a Declare statement parameter, declare the parameter as ByVal As LongPtr and pass the UDT pointer with VarPtr().

  7. Remember that the VBE IDE cannot process and display Unicode, so don’t expect your debug.print to display anything other than ? for high code point characters.

6 comments October 1st, 2013

VBA for 32 and 64 bit systems

Microsoft Access I’ve put off writing about x64 MS Office because I was a bit lazy and hadn’t build the necessary environment to test it properly.

In my day-to-day job -as far as Access is concerned-, I have so far targeted the 32 bit (x86) version of Office 2010 but the 64 bit version is bound to become more popular with time and some people already have to use it, even if sometimes their reasons are misguided.

When should I care?

If your code has any chance of being run on Office 2010 and later, and you want to avoid issues when one of your users or customers will have Office x64 (or just one of its components) installed, then you should make sure to use your code uses the new VBA7 constructs.

The changes to be performed only affect the parts of your code that use Win API calls or manipulate Windows control handles and memory pointers.

For a small fraction of applications, actually using Access/Excel x64 could make sense as it allows you to deal with 64 bit integers (as opposed to the standard 32 bit long). While this could be a useful feature, code that use these 64 bit integers will not compile on a 32 bit version of Office.

If you use external libraries and controls

If you are using external libraries and ActiveX controls, it’s unlikely that they will only work in Office x64. The reason is that most libraries (especially older ones) are only available in 32 bits.

For these to work under Office x64, they would have to be compiled especially for 64 bit as well.

So, the bottom line is: do not use external libraries and ActiveX components (not even the ones that usually come with Office!) unless you can either:

  • completely control the user environment an ensure it will always run Office x86 (32 bit).
  • you can compile or get 64 bit versions of these libraries and ActiveX components so you can build a version of your application that will work for all your users.

If your code targets Access 2010 and later

There are 2 changes that may affect your code: external library declarations and usage of the new pointer and very large integer types.

Library declarations:

Any declaration for external libraries (such as Win API or code that resides in DLLs), the declaration statement has changed and must use the new PtrSafe keyword to tell the compiler that the declaration should be interpreted in a 64 bit context when relevant. If you omit the keyword, your declarations will always be interpreted as being in a 32 bit context and the results may be random.

For instance, the old declaration for the SetFocus Win API function was:

    Private Declare Function WinAPISetFocus Lib "user32" _
                    Alias "SetFocus" (ByVal hwnd As Long) As Long

Now it is:

    Private Declare PtrSafe Function WinAPISetFocus Lib "user32" _
                    Alias "SetFocus" (ByVal hwnd As LongPtr) As Long

Note how the hwnd parameter is now a LongPtr instead of just a Long.

New types:

You will need to replace declarations that use long with the more appropriate type:

  • LongPtr
    Use the new LongPtr wherever you deal with declarations that relate to Windows control handles or memory locations.

  • LongLong
    Defines a 64 bit integer.
    The reason for LongLong is mainly to access some 64 bit versions of the Windows API, in particular, for math functions.
    This type only exists if you are running under a 64 bit version of Access. On 32 bit systems, it will cause a compile error. To avoid that problem where your code needs to run on both 32/64 bit platforms, you can use the new conditional constant Win64:

        #If WIN64 Then
            Dim largeInt As LongLong  ' Declares a 64 bit integer
            Dim largeInt As Long      ' Declares a standard 32 bit integer
        #End If

The compiler will only see and compile one code path and will ignore the other, so doing this does not affect the speed of your application.

If your code must also work in Access 2007 and earlier

Older versions of Access will not be able to compile the new PtrSafe, LongPtr and LongLong keywords introduced by VBA7. If your code must also work in older versions of Access, you must either create 2 versions of your application (not very convenient), or simply use the new conditional constant VBA7 to separate blocks of code that will run on Office 2010 and later from blocks of code that will run on Office 2007 and ealier.

The new VBA7 conditional constant will evaluate to True if it is running under Office 2010 and later.

So now you can create variations of your code that will work everywhere:

    #If VBA7 Then
      Private Declare PtrSafe Function WinAPISetFocus Lib "user32" _
                      Alias "SetFocus" (ByVal hwnd As LongPtr) As Long
      Private Declare Function WinAPISetFocus Lib "user32" _
                      Alias "SetFocus" (ByVal hwnd As Long) As Long
    #End If

Adapting old code

When going through old code, you can’t just add PtrSafe to all API declarations and do a search to replace Long into LongPtr. That would be the surest way to get everything wrong.

Instead, you must be very careful about the intent of the code and understand what your variables are manipulating: is that long really just a number or is it mean to be a memory location or a control handle ?

To help you, at least as far as the most common Win API declarations are concerned, Microsoft has made available a text file with hundreds of such declarations compatible for Office x64: Win32API_PtrSafe with 64-bit Support

If you need to use a WinAPI call that isn’t in the list, you will need to have a look at its actual definition and, based on the type of data being passed around, decide if you need to use a simple Long or a LongPtr.

For instance, I had to use the MonitorFromWindow API function. Its C++ definition is as follow:

    HMONITOR MonitorFromWindow(
      _In_  HWND hwnd,
      _In_  DWORD dwFlags

The definition tells me:

  • That the function is taking a Handle (basically a pointer) to a Window and some flags
  • and that the function returns a Handle to the display monitor (basically a pointer).

So the API definition that will work on all versions of Access is:

    #If VBA7 Then
        ' Works in 32/64 bits of Office 2010 and later
        Declare PtrSafe Function MonitorFromWindow Lib "user32" _
                                 (ByVal hwnd As LongPtr, ByVal dwFlags As Long) As LongPtr
        ' Works in Office 2007 and earlier
        Declare Function MonitorFromWindow Lib "user32" _
                         (ByVal hwnd As Long, ByVal dwFlags As Long) As Long
    #End If

When using the external function, you also need to be careful in the code and use the VBA7 conditional to separate the parts where you access structure pointers:

    Private Sub Redimension()
        #If VBA7 Then
            Dim hMonitor    As LongPtr
            Dim hMonitor    As Long
        #End If
        hMonitor = MonitorFromWindow(Application.hWndAccessApp, &H0&)
        ... do stuff with hMonitor ...
    End Sub

Make sure you save, re-compile and test your project as you go along. Also make frequent backups of your application as a wrong declaration could crash and corrupt it.
It may even be possible to crash the whole machine, although I have not experienced that so far.


Updating old code to make it work properly on all versions of Access is not difficult. It is usually fairly easy, although it can be a bit cumbersome at times.

The trick is to know what your code really needs to target:

  • Office 2010 and later, the present and the future, in which case everything is straightforward and you use the new VBA7 keywords PtrSafe, LongPtr and LongLong where appropriate.
  • All versions of Access, in which case you will have sometimes to separate and maintain 2 versions of your code using the VBA7 conditional.


4 comments July 7th, 2013

Most Recent Posts



Posts by Month