Monday, April 13, 2009
In the perspective of upsizing my current Access 2007 application, I have been trying to understand a bit more about the possible performance impact of various choices of Primary Keys.
My problem is that currently, the Access application uses autoincrement numbers as surrogate Primary Keys (PK). Since I will need to synchronise the data over multiple remote sites, including occasionally disconnected clients, I can’t use the current autoincrement PK and will need to change to GUID.
To see for myself what could be the impact, I made a series of benchmarks.
This first part is fairly simple:
- Populate a Product table that contains 3 fields, including a Designation text field, with 1,000,000 records.
- Test natively on SQL Server and Access 2007.
- The records are inserted in transactions batches of 1000 records.
- I collect the time taken for each of these transactions and plot it.
Nothing much to say about that:
All tests are performed on a dedicated Windows Server 2008 x64 rack running Access 2007 and SQL Server 2008 Standard (SP1) x64.
In SQL Server, we created a database with two tables: one using a standard IDENTITY autoincrement Primary Key, the other using a GUID.
For the table using a GUID, we use NewSequentialID() instead of NewID() to create new keys. This is supposed to offer much better performance, as the generated GUIDs are guaranteed to be sequential rather than random, resulting in better index performance on insertion.
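As a rough sketch, the two table definitions might look like this (the column names other than Designation, and the exact types, are assumptions, not the article's actual schema):

```sql
-- Hypothetical sketch of the two SQL Server tables (column names assumed).
CREATE TABLE ProductINT (
    ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- autoincrement surrogate key
    SKU NVARCHAR(50) NOT NULL,
    Designation NVARCHAR(255) NOT NULL
);

CREATE TABLE ProductGUID (
    -- NewSequentialID() can only be used as a column DEFAULT,
    -- which is why the key is generated here rather than in the INSERT.
    ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    SKU NVARCHAR(50) NOT NULL,
    Designation NVARCHAR(255) NOT NULL
);
```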
For the Access version of the tables, we basically use the same definition, except that we used 4 tables:
- ProductINT: let Jet/ACE autonumbering create the sequential integer Primary Key.
- ProductINTRandom: let Jet/ACE autonumbering create the random integer Primary Key.
- ProductGUIDRandom: let Jet/ACE use its own internal GenGUID() for the key, which generates random GUIDs instead of sequential ones.
- ProductGUIDSequential: call the Windows API (UuidCreateSequential) to create sequential IDs instead.
SQL Server Test code
Using the SQL Server Management Studio, we performed the following test once for each table (resetting the database in-between):
Basically, we perform 1000 transactions each inserting 1000 records into the table
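The test script would be along these lines (a sketch only; the table and column names, and the dummy values, are assumptions):

```sql
-- Hypothetical sketch of the insert benchmark:
-- 1000 transactions, each inserting 1000 records.
SET NOCOUNT ON;
DECLARE @batch INT = 1;
DECLARE @i INT;
WHILE @batch <= 1000
BEGIN
    BEGIN TRANSACTION;
    SET @i = 1;
    WHILE @i <= 1000
    BEGIN
        -- The PK is filled in by IDENTITY or the NewSequentialID() default.
        INSERT INTO ProductGUID (SKU, Designation)
        VALUES ('SKU' + CAST(@i AS VARCHAR(10)), 'Product description');
        SET @i = @i + 1;
    END
    COMMIT TRANSACTION;
    SET @batch = @batch + 1;
END
```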
Access 2007 Test code
To duplicate the same conditions, the following VBA code will perform 1000 transactions each inserting 1000 records.
Note that the recordset is opened in Append mode only.
The importance of this will be discussed in another article.
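The VBA loop would look roughly like this (a sketch assuming DAO and made-up field values; the real test code may differ in its details):

```vba
' Hypothetical sketch: 1000 transactions, each inserting 1000 records
' through a DAO recordset opened in append-only mode (dbAppendOnly).
Public Sub InsertRecords(tableName As String)
    Dim ws As DAO.Workspace
    Dim rs As DAO.Recordset
    Dim batch As Long, i As Long

    Set ws = DBEngine.Workspaces(0)
    Set rs = CurrentDb.OpenRecordset(tableName, dbOpenDynaset, dbAppendOnly)

    For batch = 1 To 1000
        ws.BeginTrans
        For i = 1 To 1000
            rs.AddNew
            ' The PK is left to Access/the GUID helper depending on the table.
            rs!SKU = "SKU" & i
            rs!Designation = "Product description"
            rs.Update
        Next i
        ws.CommitTrans
    Next batch

    rs.Close
End Sub
```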
We call this code to perform inserts on each of our Access tables:
- ProductINT table: we just insert data in the ProductINT table, letting Access create autonumber IDs.
- ProductINTRandom table: we just insert data in the ProductINTRandom table, letting Access create random autonumber IDs.
- ProductGUIDRandom table: we let Access create the Random GUID for the primary key.
- ProductGUIDSequential: we use the Windows API to create a sequential ID that we insert ourselves.
Without further ado, here are the raw results, showing the number of inserted records per second achieved in each test as the database grows (only the tests comparing Sequential GUID and autoincrement on SQL Server and Access are shown here; see the next sections for the other results):
What we clearly see here is that performance when using autoincrement and Sequential GUID stays pretty much constant over the whole test.
That’s good news, as it means that using Sequential GUIDs does not degrade performance over time.
As a side note, in this particular test, Access offers much better raw performance than SQL Server. In more complex scenarios it’s very likely that Access’ performance would degrade more than SQL Server, but it’s nice to see that Access isn’t a sloth.
Using Sequential GUID vs Autoincrement in Access
The results show that we do take a performance hit of about 30% when inserting Sequential GUIDs versus plain autonumbers.
We’re still getting good results, but that’s something to keep in mind.
In terms of CPU consumption, here is what we get:
Random PKs, whether simple integers or GUIDs, consume substantially more CPU resources.
Using Sequential GUID vs Identity in SQL Server
Out-of-the box, SQL Server performs quite well and there is not much difference whether you’re using Sequential GUIDs or autoincrement PK.
There is however a surprising result: using Sequential GUIDs is actually slightly faster than using autoincrement!
There is obviously an explanation for this but I’m not sure what it is so please enlighten me :-)
Using Random GUID vs Sequential GUID vs Random Autonumber in Access
So, what is the impact of choosing a Sequential GUID as opposed to letting Access create its own random GUIDs?
It’s clear that random GUIDs have a substantial performance impact: their randomness basically messes up indexing, resulting in the database engine having to do a lot more work to re-index the data on each insertion.
The good thing is that this degradation is roughly logarithmic: while performance drops over time, it remains pretty decent overall.
While GUIDs are larger than integers (16 bytes vs 4 bytes), the actual performance of inserting records whose PK is a random integer is worse than with random GUIDs…
Here we’ve established the baseline for our performance tests. In the next article, we’ll look exclusively at the performance of inserting data from a remote Access 2007 front end using our VBA code.
Having this baseline will allow us to check the performance overhead of using ODBC and letting Jet/ACE manage the dialogue with the SQL Server backend.
Feel free to leave your comments below, especially if you have any resources or information that would be useful.
- 16APR2009: added test of random autonumber as PK in Access.
- 13APR2009: Original Article