Table of Content

Microsoft AccessWell, maybe the profit part is a bit of an exaggeration, but it will be fun, I promise, unless you’re not a geek in which case I really wonder why you are reading this blog.

I’m not going to shock you if I say that strings manipulations and formatting are really ugly in VBA.
Who has time to read and debug stuff like this?

    str = "Message to """ & recipientName & """" & vbCrLf
    str = str & "Line " & lineNum & " of " & lineCount & ", " & vbCrLf & _
          "Love you " & recipientName & "... " & ChrW$(10083)
    Debug.? str

Wouldn’t it be better just ot be able to write this instead ?

    str "Message to ""{0}""\n"
    str "Line {1} of {2},\n"
    str "Love you {0}... \u2763"
    Debug.? str.ToString(recipientName, lineNum, lineCount)

or, something more compact like:

    Debug.? sFormat("Message to ""{0}""\nLine {1} of {2},\nLove you {0}... \u2763", _
                    recipientName, lineNum, lineCount)

How about some simple date and number formatting?

    Debug.? sFormat("You earned ${1:C2} at {0:t} on {0:d}", now(), 2588.687)

To print this:

    You earned $2,588.69 at 17:40 on 10-09-2013

Or some Unicode sequences?

  Debug.? sFormat("Here be dragons \U0001F409 and Euro signs \u20ac")

Well, the dragon will probably not be visible, even in your browser it’s probably not well supported by your installed fonts, but at least you should see the Euro symbol.

.Net strings

It’s no secret that strings in .Net have a lot more capabilities when it comes to building and formatting.
You can do everything in VBA as well, but things tend to be more difficult to read and compose. In .Net, strings can be formatted in lots of different ways, including creating columns of fixed-length text, padding, aligning text to the left or the right, specifying precision for numbers, formatting dates, etc.

Strings can also contain special character literals to insert newlines or tabs for instance and special Unicode escape sequences meant to represent characters that would otherwise be impossible to include in a string by hand.

Character literals and Unicode escape sequences

Before we go deeper into how things works, let’s go through some of the helper functions you will find in the attached code.

The StringUtils.bas module contains the following function that takes a string that may contain any .Net special character literals and Unicode escape sequences and returns a string that will contain the un-escaped version:

    UnEscStr(str As Variant) As String

For instance, to include new lines in our strings, just include the \n literal. Adding a tab is simply a matter of using \t.
Note that if you want a backslash \ in your string, you will have to escape it to \\ otherwise the function will interpret the following characters in a way you may not expect.

You can also encode any Unicode character using its Unicode code-point expressed in hexadecimal (for instance, the code-point U+0041 represents the character A).

VBA uses UTF-16 for its string encoding, meaning that each character in a string is encoded using 16 bits and should therefore be capable of containing any possible glyph in the world.

However, to include Unicode values in strings is difficult in VBA:

  • You can’t just type the character in the string: the IDE is a remnant of the past and uses Extended ASCII only. Anything that cannot be displayed in the IDE will come out as ?.

  • You can use ChrW$() to include Unicode in your strings, but it is quite verbose and ends-up an ugly mess as you have to cut your string in bits to insert Unicode characters:

    message = "Chinese or the Sinitic language(s) (" & ChrW$(&H6C49) & _
              ChrW$(&H8BED) & "/" & ChrW$(&H6F22) & ChrW$(&H8A9E) & _
              ", pinyin: Hàny" & ChrW$(&H1D4) & "; " & ChrW$(&H4E2D) & _
              ChrW$(&H6587) & ", pinyin: Zh" & ChrW$(&H14D) & _
              "ngwén) is the language spoken by the Han Chinese"

    As a more elegant solution, you can use Unicode escape sequences directly in the strings and they will be converted through the UnEscStr() function.

To properly include a Unicode code-point in your string, you must follow some rules:

  • If the code-point is between U+0 and U+FFFF, use the \uXXXX sequence (lower case u followed by 4 hex digits).
    For instance to encode the € sign whose code-point is U+20AC, just include \u20ac in your string.

  • If the code-point is above 0+FFFF, you must use the \UXXXXXXXX (upper case U followed by 8 hex digits).
    For instance, to encode the dragon symbol whose code-point is U+1F409, you will have to use \U0001F409 in your string.

  • There is a third way to include a code-point sequence by using the \x notation.
    This is similar to \u but you do not have to use exactly 4 hex digits, you can use 1, 2, 3 or 4.
    For instance, instead of encoding the letter A as \u0041, you can simply use \x41.
    However, since this notation accepts a variable number of digits, you have to be careful: the string \x27Effect\x27 will not print 'Effect', because E and f are considered hex digits, you will end up with something that will look like )fect' instead.
    Better use the \u encoding, it’s what you’ll need in 99% of the cases anyway.

As an example, we can rewrite the code above a bit more cleanly:

    message = UnEscStr("Chinese or the Sinitic language(s) " & _
                       "(\u6C49\u8BED/\u6F22\u8A9E, pinyin: Hàny\u01D4; " & _
                       "\u4E2D\u6587, pinyin: Zh\u014Dngwén) " & _
                       "is the language spoken by the Han Chinese")

Would show this when displayed in the enhanced dialogbox:

Displaying Unicode Chars

String formatting

The StringUtils.bas module also contains an implementation of the .Net String.Format() method:

    sFormat(frmt As String, ParamArray args() As Variant) As String

In .Net, string formatting uses {x} place-holders inside the string itself. The place-holder {0} refers to the value of the first parameter, {1} to the value of the second, and so on.
Place-holders can also be used to define how the value should be formatted:

    Debug.? sFormat("You earned ${1:C2} at {0:t} on {0:d}", now(), 2588.687)

Since the sFormat() function internally uses UnEscStr(), you can of course include character literals and Unicode escape sequences as well.

    Debug.? sFormat("You earned:\n\t\u20ac{1:N}\nat:\n\t{0:t}\non:\n\t{0:d}", now(), 2588.687)

Will display:

    You earned:
        €2,588.69
    at:
        17:44
    on:
        10-09-2013

You will find examples on the String.Format help page and on all the associated pages that give formatting information for particular data:

The StringBuilder class

The other attached file StringBuilder.cls contains a class that loosely corresponds to the .Net System.Text.StringBuilder class.

That helper class is used to compose strings efficiently. It understands string formats and will also un-escape character literals and Unicode escape sequences.

The class is pretty straightforward to use:

    Dim s As StringBuilder
    Set s = New StringBuilder
    s "SELECT Reference "
    s "FROM StockEvent "
    s "WHERE  ((EventType = {0}) "
    s "        AND (POID = {1}) "
    s "        AND (AvailableQty > {2}));"
    resultString = s.ToString(EventNum, ID, 33.55)

In order to make the class less verbose, the Append() method of the StringBuilder is set as the default method for the class. So the code above is actually exactly the same as:

    Dim s As StringBuilder
    Set s = New StringBuilder
    s.Append "SELECT Reference "
    s.Append "FROM StockEvent "
    s.Append "WHERE  ((EventType = {0}) "
    s.Append "        AND (POID = {1}) "
    s.Append "        AND (AvailableQty > {2}));"
    resultString = s.ToString(EventNum, ID, 33.55)

Having the most common method as a default makes things a bit cleaner, I think.

The thing to remember, is that in our example, s is an instance of the class, it is not a string, it’s an object, so once you have completed your string composition, you must call ToString() to get the actual VBA string back.

The class is fairly flexible, allowing you to insert values in each formatted string or, as shown above, by adding the parameters in the call to ToString():

    s "SELECT Reference "
    s "FROM StockEvent "
    s "WHERE  ((EventType = {0}) ", EventNum
    s "        AND (POID = {0}) ", ID
    s "        AND (AvailableQty > {0}));", 33.55
    resultString = s.ToString()

You can also append numbers, dates, boolean directly to the string as you compose it:

    s "The date : "
    s Now()
    s "I hope it's not false: "
    s True
    s "I owe a lot of money: "
    s 12.55

When you want to display some text that is going to be printed or inserted in a file or textbox, you can use AppendLine() instead of Append()

    Public Sub ShowHelp()
        Dim s As New StringBuilder
        s.AppendLine "Super Secret Program - v{0}", progVersion
        s.AppendLine "(c) my secret self"
        s.AppendLine "Usage:"
        s.AppendLine " supsec [options] filename"
        Debug.? s.ToString()
    End Sub

Alternatively, you can set the special property AlwaysAppendCrLf to True, in which case, a newline will be added every time you call Append() :

    Public Sub ShowHelp()
        Dim s As New StringBuilder
        s.AlwaysAppendCrLf = True
        s "Super Secret Program - v{0}", progVersion
        s "(c) my secret self"
        s "Usage:"
        s" supsec [options] filename"
        Debug.? s.ToString()
    End Sub

You can also insert a string or numbers, etc., at any place within the string being built:

    s "Line1Line3"
    s.Insert 5, "\nLine{0}\n", 2

Or replace any substring by another one:

    s "Line1\nRow2\nLine3\nRow4"
    s.Replace "Row", "Line"

Or simply remove some characters from the string:

    s "Line1\nRow2\nLine3"
    s.Remove 5, 5

Instead of getting the whole string, you can get an portion of it:

    s.ToSubstring(4, 10) 

Or just return any character at any position in the current string:

    s "Line1\nRow2\nLine3"
    Debug.? s.Chars(6)  ' Will return R
    Debug.? s.Codes(6)  ' Will return 82, which is the ASCII/Unicode for R

If you want to re-use the StringBuilder object, you can simply clear it to remove all strings already added:

    s.Clear

A few more details:

  • All operations on the StringBuilder object return the object instance itself.
    This allows for chaining of operations like this:

        str = s("Line1\n")("Line3\n").Insert(5, "Line2\n").Replace("Line", "Row").ToString()
    
  • You can find out the current length of the string with the Length property.

  • Be careful that Replace() is case sensitive since all .Net operations are case sensitive.
    So Replace("line", "Row") is not the same as Replace("Line", "Row").

Using .Net strings in VBA

Using .Net from VBA can be a challenge depending on what part of .Net you want to use and how you want to use it from VBA.

The .Net String object itself is not accessible from VBA. You can’t create an instance of it directly from VBA and when you use an existing instance of a .Net string, it gets converted to a VBA string immediately.

However, there is a very useful .Net class that is already registered on you machine: System.Text.StringBuilder.

This .Net class is the workhorse behind the implementation of the StringBuilder class and the sFormat function in VBA.
Thanks to this easily instantiable object, we can use some of the power of .Net in our VBA string at no cost.

Some performance information

Of course, doing all that processing has a cost, although in the vast majority of cases, it should not be noticeable.

  • I timed the use of UnEscStr above in the string that contains Chinese Unicode characters.
    The call takes 0.010ms on my machine (a 4 year old i7)

  • A call to sFormat similar to the example given in this article took about 0.10ms.

This timing depends a lot of course on the complexity of the string, the type of formatting, the amount of literal characters an Unicode sequences to convert, etc.

So if you are formating 10,000s of strings, or un-escaping 100,000s or strings the processing time will probably add up to be noticeable but in most use cases, even if you have a few 1,000s of strings, processing will not impact performance in any noticeable way.

Code Updates

v1.0: 12SEP2013
Original release.

Code download

The 2 files contained in the archive below are all you need. Actually, if you are only interested in the sFormat() or UnEscStr() function you only need to import the StringUtils.bas module.

However, if you want to use the StringBuilder class, you must import both files.

To import the files, just open your Access application, open the IDE then, from the menu, File > Import File, and select the StringUtils.bas, then repeat for StringBuilder.cls.
That’s all there is to it.

Download Download the StringUtils.v1.0.zip (7KB), version 1.0 – 12SEP2013 containing the code files.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Free for re-use in any application or tutorial providing clear credit is made about the origin of the code and a link to this site is prominently displayed where end-users can easily access it.

Last modified: Friday, 3 February 2023

Author

Comments

Sebastian Helm 

🙇 Thanks a lot, after searching for quite a while for a way to convert extended characters, I’m happy to have found this, which is even more universally useful. 🙆 Sebastian

Comments are closed.