Sunday, July 11, 2010

Admin: Linux file server performance boost

Linux Using a Linux for an office file server is a no-brainer: it’s cheap, you don’t have to worry about unmanageable license costs and it just works.

Default settings of most Linux distributions are however not optimal: they are meant to be as standard compliant and as general as possible so that everything works well enough regardless of what you do.

For a file server hosting large numbers of files, these default settings can become a liability: everything slows down as the number of files creeps up and it makes your once-snappy fileserver as fas as a sleepy sloth.

There are a few things that we can do to ensure we get the most of our server.

Checking our configuration

First, a couple of commands that will help us investigate the current state of our configuration.

  • df will give us a quick overview of the filesystem:

    # df -T
    Filesystem    Type   1K-blocks      Used Available Use% Mounted on
    /dev/md2      ext3    19840804   4616780  14199888  25% /
    tmpfs        tmpfs      257580         0    257580   0% /dev/shm
    /dev/md0      ext3      194366     17718    166613  10% /boot
    /dev/md4      ext3     9920532   5409936   3998532  58% /var
    /dev/md3      ext3      194366      7514    176817   5% /tmp
    /dev/md5      ext3    46980272  31061676  13493592  70% /data
    
  • tune2fs will help us configure the options for each ext3 partition. If we want to check what is the current configuration of a given partition, says we want to know the current options for our /data mount:

    # tune2fs -l /dev/md5
    

    If I was using LVM as a Volume manager, I would type something like:

    # tune2fs -l /dev/VolGroup00/LogVol02
    

    This would give lots of information about the partition:

    tune2fs 1.40.2 (12-Jul-2007)
    Filesystem volume name:   <none>
    Last mounted on:          <not available>
    Filesystem UUID:          d6850da8-af6f-4c76-98a5-caac2e10ba30
    Filesystem magic number:  0xEF53
    Filesystem revision #:    1 (dynamic)
    Filesystem features:      has_journal resize_inode dir_index filetype 
                              needs_recovery sparse_super large_file
    Filesystem flags:         signed directory hash
    Default mount options:    user_xattr acl
    Filesystem state:         clean
    Errors behavior:          Continue
    ....
    

    The interesting options are listed under Filesystem features and Default mount options. For instance, here we know that the partition is using a journal and that it is using the dir_index capability, already a performance booster.

  • cat /proc/mounts is useful to know the mounting options for our filesystem (just listed some interesting ones here):

    rootfs / rootfs rw 0 0
    /dev/root / ext3 rw,data=ordered 0 0
    /dev/md0 /boot ext3 rw,data=ordered 0 0
    /dev/md4 /var ext3 rw,data=ordered 0 0
    /dev/md3 /tmp ext3 rw,data=ordered 0 0
    /dev/md5 /data ext3 rw,data=ordered 0 0
    none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
    /dev/md4 /var/named/chroot/var/run/dbus ext3 rw,data=ordered 0 0
    

    The data=ordered mount parameter tells us of the journaling configuration for the partition.

Journaling

So what is journaling?
It’s one of the great improvements of ext3: a journal is a special log on the disk that keeps track of changes about to be made. It ensures that, in case of failure, the filesystem can quickly recover without loss of information.

There are 3 settings for the journalling feature:

  • data=journal the most secure but also slowest option since all data and metadata is written to disk: the whole operation needs to be completed before any other operation can be completed. It’s sort of going to the bank for a deposit, filling the paperwork and making sure the teller puts the money in the vault before you leave.
  • data=ordered is usually the default compromise: you fill-in the paperwork and remind the teller to put the money in the vault asap.
  • data=writeback is the fastest but you can’t be absolutely sure that things will be done in time to prevent any loss if a problem occurs soon after you’ve asked for the data to be written.

In normal circumstances all 3 end-up the same way: data is eventually written to disk and everything is fine.
Now if there is a crash just as the data was written only option journal would guarantee that everything is safe. Option ordered is fairly safe too because the money should be in the vault soon after you left; most systems use this option by default.

If you want to boost your performance and use writeback you should make sure that:

  • you have a good power-supply backup to minimise the risk of power failure
  • you have a good data backup strategy
  • you’re ok with the risk of losing the data that was written right before the crash.

To change the journaling option you simply use tune2fs with the appropriate option:

# tune2fs -o journal_data_writeback /dev/md5

Mount options

Now that we’ve changed the available options for our partition, we need to tell the system to use them.
Edit /etc/fstab and add data=writeback to the option columns:

/dev/md5     /data    ext3    defaults,data=writeback   1 2

Next time our partition is mounted, it will use the new option. For that we can either reboot or remount the partition:

# mount - o remount /data

noatime option

There is another option that can have a very dramatic effect on performance, probably even more than the journaling options above.

By default, whenever you read a file the kernel will update its last access time, meaning that we end up with a write operation for every read!
Since this is required for POSIX compliance, almost all Linux distributions leave this setting alone by default.
For a file server, this can have such drastic consequence on performance.

To disable this time-consuming and not useful feature (for a file server), simply add the noatime option to the fstab mount options:

/dev/md5     /data    ext3    defaults,noatime,data=writeback   1 2

Note that updating access times is sometimes required by some software, such as mail software (such as mutt). If you properly keep your company data in a dedicated partition, you can enable the mount options only for that partition and keep other options for the root filesystem.

dealing with errors in fstab

After doing the above on one of the servers, I realized that I made a typo when editing /etc/fstab.
This resulted in the root filesystem being mounted read-only, making fstab impossible to edit…

To make matters worse, this machine was a few thousand miles away and could not be accessed physically….

Remounting the root filesystem resulted in errors:

# mount -o remount,rw /
mount: / not mounted already, or bad option

After much trial and rebooting, this worked (you need to specify all mounting options, to avoid the wrong defaults from being read from etc/mtab`):

# mount  -o rw,remount,defaults /dev/md2 /

After that, I could edit /etc/fstab and correct the typo…

Conclusions

How much these options will improve performance really depends on how your data is used: the improvements should be perceptible if your directories are filled with large amounts of small files.
Deletion should also be faster.

Bookmark and Share

Add comment   |   Filed under :  Linux,sysadmin

Wednesday, September 9, 2009

Access: checking blank variables

Microsoft Access I often have to test String, Variant or Object variables that have no content and could be considered ‘blank’.

The problem is that testing for “blankness” can mean many different things to different types:

  • For an Object type, the variable can be Nothing.
  • For a String type, the string can have no content at all: "", vbNullString.
  • For a Variant type, the string can have any of the following attributes or values:
    • it can be Missing if the variable is an unused optional parameter,
    • it can be Empty if it was never assigned,
    • it can be Null if, for instance it’s bound to a nullable field or unbound with no value,
    • it can be an empty string "", vbNullString.

When having to check these variables in code, it can be tiresome to have to go through testing some of these possibilities just to find out that your variable does or not not contains something useful, regardless of the type of variable you are using.

To avoid having to do all these tests, make the code a bit more tidy and allow me to move on to more important things, I use this small utility function quite often:

So now I don’t have to worry so much about the type of the variable I’m testing when I want to know if it contains useful data:

Obviously, IsBlank() doesn’t replace the other tests but I found it to be more straightforward to use in most cases.

Bookmark and Share

1 comment   |   Filed under :  Database,MSAccess,Programming

Wednesday, August 26, 2009

A story about exceptional service

security01.pngRecently I found myself constrained by the puny 200GB of my Mac Book Pro and I bought a 500GB Seagate drive to replace it (a fast 7200 rpm one).
The Macbook Pro has no easy access for the drive so you have to resort to dismantling the case to access it. This put me off replacing the drive because I would probably be voiding the warranty and was running the risk of damaging this expensive piece of equipment.

I’ve been filling the drive with pictures from my recent camera purchase and I couldn’t put it off any longer, so I bought the new drive and went online to find some good tutorial on how to crack open the Macbook Pro case.

After a few searches, I noticed that many people were referring to the iFixit.com website. It was very easy to find the tutorial I was looking for, I didn’t have to register, and each step was made very clear and simple.
It took no time to open the case and replace the drive.
I was very happy with that find.

Now, that’s not the end of the story.

A couple of days before I replaced the drive the left fan of the laptop suddenly became noisy. This would happen a few times a day, at random, and would last 10-20 minutes.
My only solution to get this repaired was to get to the local Apple service shop. Even though I knew exactly which part number was to be replaced, they still wanted me to:

  • go across town to visit them so they could see for themselves what the problem was: annoying because the problem was intermittent so I may have to go for nothing.
  • wait for the part to arrive a few days later.
  • go back to leave the laptop
  • go again to collect the repaired laptop the next day or so. So all in all: about 6h spend travelling back and forth + no laptop for a couple of day + the risk that some indiscreet technician start looking through my personal stuff.

Instead, I went back to the iFixit website:

  • identified my machine
  • found out the list of spare parts available from their store
  • added the fan to my cart
  • paid for it.
  • found a guide that showed how to replace the part.

That took me all of 10 minutes; I placed my order on Thursday and the next Monday I received the part … halfway across the globe!

I also got a survey request from iFixit and left some comments, from which I got back two nice detailed email follow-ups, one from the CEO saying they were implementing my remarks as part of their site improvement efforts.

Well, I thought I would share this story. It’s not that often that you get excited by an online vendor that not only does its job well but goes beyond expectations.

Bookmark and Share

Add comment   |   Filed under :  Business,Hardware,Reviews,sysadmin

Thursday, June 11, 2009

Access: Run-time Error 3155 ODBC insert on a linked table failed

Microsoft Access I have been spending a lot of time trying to find out why some of the code used to insert new records into a linked SQL Server table would systematically fail with an error:

Run-time Error '3155' ODBC--insert on a linked table  failed

It was driving me mad.
I could insert a simple record using SQL Server Management Studio, I could add new records to the table in datasheet mode within Access, but as soon as I tried to insert a record from code, whether using DAO recordset or executing the same SQL INSERT, it would miserably fail.

After a fair bit of investigation and tests, of which you can read the full account on the question I asked on StackOverflow, it turns out that this is a long-standing bug in the ODBC Driver (or Access).

Memo fields in Access are usually translated into nvarchar(MAX) in SQL Server by tools like SSMA.
Unfortunately, when you link tables having these fields using the SQL Server Client driver, these fields get incorrectly interpreted as string, even though they appear ok from the table design view.
It’s only if you try to insert something into the field, either text larger than 255 chars or NULL, that you get the error message.

So, the solution, at least in this case, is to revert to the older SQL Server ODBC driver instead, or use varchar() instead of nvarchar(), but if you’re dealing with Unicode, you have to stick with nvarchar().

References

Bookmark and Share

1 comment   |   Filed under :  Database,MSAccess,Programming,SQL Server

Monday, April 13, 2009

Access vs SQL Server: some stats (part 1)

Microsoft Access In the perspective of upsizing my current Access 2007 application, I have been trying to understand a bit more about the possible performance impact of various choices of Primary Keys.

My problem is that currently, the Access application uses autoincrement numbers as surrogate Primary Keys (PK). Since I will need to synchronise the data over multiple remote sites, including occasionally disconnected clients, I can’t use the current autoincrement PK and will need to change to GUID.

To see for myself what could be the impact, I made a series of benchmarks.
This first part is fairly simple:

  • Populate a Product table that contains 3 fields: ID, SKU and Designation with 1,000,000 records.
  • Test natively on SQL Server and Access 2007.
  • The records are inserted in transactions batches of 1000 records.
  • I collect the time taken for each of these transactions and plot it.

Test setup

Nothing much to say about that:

All tests are performed on a dedicated Windows Server 2008 x64 rack running Access 2007 and SQL Server 2008 Standard (SP1) x64.

Test database

In SQL Server, we created a database with two tables ProductGUID and ProductInt:

For the table using a GUID, we use the NewSequentialID() instead of NewID() to create new keys. This is supposed to offer much better performance as the generated GUIDs are guaranteed to be sequential rather than random, resulting in better index performance on insertion.

For the Access version of the tables, we basically use the same definition, except that we used 4 tables:

  • ProductINT: let Jet/ACE autonumbering create the sequential integer Primary Key.
  • ProductINTRandom: let Jet/ACE autonumbering create the random integer Primary Key.
  • ProductGUIDRandom: let Jet/ACE use its own internal GenGUID() for the key which generates random GUIDs instead of sequential ones.
  • ProdcutGUIDSequential: call the Windows API (UuidCreateSequential) to create sequential ID instead.

SQL Server Test code

Using the SQL Server Management Studio, we performed the following test once for each table (resetting the database in-between):

Basically, we perform 1000 transactions each inserting 1000 records into the table ProductGUID or ProductINT.

Access 2007 Test code

To duplicate the same conditions, the following VBA code will perform 1000 transactions each inserting 1000 records.
Note that the recordset is opened in Append mode only.
The importance of this will be discussed in another article.

We call this code to perform inserts on each of our Access tables:

  • ProductINT table: we just insert data in the ProductINT table, letting Access create autonumber IDs.
  • ProductINTRandom table: we just insert data in the ProductINTRandom table, letting Access create random autonumber IDs.
  • ProductGUIDRandom table: we let Access create the Random GUID for the primary key.
  • ProductGUIDSequential: we use the Windows API to create a sequential ID that we insert ourselves.

Test results

Without further ado, here are the raw results, showing the number of inserted record per second that we achieve for each test over the growing size of the database (here are only shown tests comapring Sequantial GUID and Autoincrement on SQL Server and Access, see next sections for the other results):

Inserts per second

What we clearly see here is that performance when using autoincrement and Sequential GUID stays pretty much constant over the whole test.
That’s good new as it means that using Sequential GUIDs do not degrade performance over time.

As a side note, in this particular test, Access offers much better raw performance than SQL Server. In more complex scenarios it’s very likely that Access’ performance would degrade more than SQL Server, but it’s nice to see that Access isn’t a sloth.

Using Sequential GUID vs Autoincrement in Access

The results show that we do take a performance hit of about 30% when inserting Sequential GUID vs just using autonumbers.
We’re still getting good results, but that’s something to keep in mind.

In terms of CPU consumption, here is what we get:

CPU load Access

Random PK, whether they are simple integer or GUID do consume substantially more CPU resources.

Using Sequential GUID vs Identity in SQL Server

Out-of-the box, SQL Server performs quite well and there is not much difference whether you’re using Sequential GUIDs or autoincrement PK.

There is however a surprising result: using Sequential GUIDs is actually slightly faster than using autoincrement!

There is obviously an explanation for this but I’m not sure what it is so please enlighten me :-)

CPU Consumption:

CPU load SQL Server

Using Random GUID vs Sequential GUID vs Random Autonumber in Access

So, what is the impact of choosing a Sequential GUID as opposed to letting Access create its own random GUIDs?

Inserts per second Random GUID vs Sequential GUID in Access

It’s clear that random GUIDs have a substantial performance impact: their randomness basically messes up indexing, resulting in the database engine having to do a lot more work to re-index the data on each insertion.
The good thing is that this degradation is pretty logarithmic so while it degrades over time, the overall performance remains pretty decent.
While GUIDs are larger than Random Integers (16 bytes vs 4 bytes) the actual performance of inserting records whose PK is a random integrer is actually worse than random GUID…

Provisional conclusions

Here we’ve check the baseline for our performance tests. In the next article, we’ll look exclusively at the performance of inserting data from a remote Access 2007 front end using our VBA code.

Having this baseline will allow us to check the performance overhead of using ODBC and letting Jet/ACE manage the dialogue with the SQL Server backend.

Feel free to leave your comments below, especially if you have any resources or information that would be useful.

Updates

  • 16APR2009: added test of random autonumber as PK in Access.
  • 13APR2009: Original Article
Bookmark and Share

15 comments   |   Filed under :  Database,MSAccess,SQL Server

Previous Posts


about

This is a simple technical weblog where I dump thoughts and experiences from my computer-related world.
It is mostly focused on software development but I also have wider interests and dabble in architecture, business and system administration.
More About me…

my Twitter

my StackOverflow Profile
My StackOVerflow profile

Most Recent Posts

Categories

Links

Feeds