Sunday 10 April 2011

Too many files in a directory hurting performance in Linux?

Some colleagues of mine have run into a situation where they believe that storing many files (on the order of hundreds of thousands) in a single UNIX directory (sorry, I don't know which file system that particular proprietary UNIX brand uses) seriously impairs performance.

Apparently, ls and related utilities perform dismally there. So I wanted to check that for myself on my own Linux backup directory on my ext3 external disk, where I store approximately 100,000 files. (Note: I keep time-stamped copies of my directory structure as hard links into a single directory holding the actual files, so as to avoid duplicating data.)
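
For reference, one common way to build that kind of hard-linked, time-stamped snapshot with standard tools is rsync's --link-dest option. This is just a sketch with made-up paths, not necessarily how my own backup script works:

    #!/bin/sh
    # Hypothetical layout: adjust SRC and DEST to your own setup.
    SRC=/home/me/documents/        # tree to back up
    DEST=/mnt/backup               # external ext3 disk
    STAMP=$(date +%Y%m%d-%H%M%S)   # time stamp naming this snapshot

    # Files unchanged since the previous snapshot become hard links to it,
    # so each extra snapshot costs little more than the changed files.
    rsync -a --link-dest="$DEST/latest" "$SRC" "$DEST/$STAMP/"

    # Make "latest" point at the snapshot we just created.
    rm -f "$DEST/latest"
    ln -s "$DEST/$STAMP" "$DEST/latest"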

True, ls and ls -lrt are slow, very slow, there. However, ls -lrt > /tmp/ls.txt is about as fast as you could expect for writing a 7.5 MB file. Also, touch kk and rm kk are as fast as in an empty directory.
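
If you want to reproduce this kind of measurement, something along these lines will do (the directory path is a placeholder, and -U is the GNU ls option that disables sorting):

    cd /path/to/huge/directory     # placeholder: your big directory

    time ls > /dev/null            # read the directory and sort the names
    time ls -lrt > /tmp/ls.txt     # same work, but writing to a file, not the terminal
    time ls -U > /dev/null         # list in directory order, no sorting at all

    time touch kk                  # create one file among ~100,000 others
    time rm kk                     # ... and delete it again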

In summary, standard tools such as ls and find may not be ready to deal with huge directories. The sorting done by ls may scale as badly as O(n^2) in some circumstances. Also, shell expansion (of *, say) may not be very efficient. However, accessing files for creation, editing or deletion can be blazingly fast even on relatively oldish file systems such as ext3 on the cheapest hardware.
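
If you are stuck with such a directory, a few workarounds using the same standard tools seem worth trying; the names and patterns below are only illustrative:

    # Skip the sort: with GNU ls, -U lists entries in directory order,
    # and -f additionally implies -a and turns off other costly options.
    ls -U
    ls -f

    # Let find stream the names instead of having the shell expand *
    # into one enormous argument list.
    find . -maxdepth 1 -type f -name '*.log' -print

    # Delete a large subset without relying on shell expansion at all.
    find . -maxdepth 1 -type f -name '*.tmp' -delete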