Sunday, 10 April 2011

Too many files in a directory hurting performance in Linux?

Some colleagues of mine have run into a situation where they believe that storing many files (on the order of hundreds of thousands) in a single UNIX directory seriously impairs performance. (Sorry, I do not know which file system that particular proprietary UNIX brand uses.)

Apparently, ls and related utilities perform dismally there. So I wanted to check that for myself on my own Linux backup directory on my ext3 external disk, where I store approximately 100,000 files. (Note: I keep time-stamped copies of my directory structure with hard links to a single directory where I store the actual files, so as to avoid duplicating data.)

True, ls and ls -lrt are slow, very slow there. However, ls -lrt > /tmp/ls.txt is as fast as you could expect for a 7.5MB file. Also, touch kk and rm kk are as fast as in an empty directory.

In summary, standard tools such as ls and find may not be ready to deal with huge directories. The sorting done by ls may scale as badly as O(n^2) in some circumstances. Also, shell expansion (of *, say) may not be very efficient. However, creating, editing or deleting individual files may be blazingly fast even on relatively old file systems such as ext3 on the cheapest hardware.
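As an illustration of walking a huge directory without paying for a full sorted listing up front, here is a minimal Python sketch. The function names are mine, and this is only one possible approach, not a benchmark: os.scandir hands out entries lazily, in whatever order the file system returns them, and sorting is done explicitly only when it is really needed.

```python
import os

def count_entries(path):
    """Count directory entries lazily, without building or sorting a full list."""
    with os.scandir(path) as entries:
        return sum(1 for _ in entries)

def newest_first(path, limit=10):
    """Return the names of the `limit` most recently modified entries."""
    with os.scandir(path) as entries:
        # One stat call per entry; symlinks are not followed.
        stamped = [(e.stat(follow_symlinks=False).st_mtime, e.name) for e in entries]
    # Explicit O(n log n) sort, newest modification time first.
    stamped.sort(reverse=True)
    return [name for _, name in stamped[:limit]]
```

On a directory like mine, count_entries(".") only iterates once over the entries, whereas newest_first(".") roughly corresponds to what ls -lt has to do before it can print anything.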

Thursday, 10 February 2011

Inserting Google visualizations in WordPress blogs

I tried to insert a Google visualization into my (other) blog. I ran into problems that Google itself solved.

The solution? Use the Custom fields shortcode plugin for WordPress: first, install and activate it. Then, wherever your visualization goes, insert

[cf]this_is_my_tag_name[/cf]

If the plugin is properly installed, at the bottom of the page you will see a form titled "Custom fields". Add a new custom field with name this_is_my_tag_name and paste the code for your visualization into the adjacent field.

The plugin will replace your tag with the code you inserted and your visualization will be properly shown in your entry.
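To make the substitution concrete, here is a small Python sketch of the idea. It is not the plugin's actual code (which is PHP and which I have not read); expand_shortcodes and custom_fields are hypothetical names used only for illustration.

```python
import re

def expand_shortcodes(post_body, custom_fields):
    """Replace each [cf]name[/cf] tag with the value of the custom field `name`."""
    def lookup(match):
        # In this sketch, tags that name an unknown field are left untouched.
        return custom_fields.get(match.group(1), match.group(0))
    return re.sub(r"\[cf\](.*?)\[/cf\]", lookup, post_body)
```

Calling expand_shortcodes(body, {"this_is_my_tag_name": "<script>...</script>"}) would return the post body with the tag replaced by the pasted code.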

Padding text with leading zeros in Teradata

To pad a number with leading zeros in Teradata, you can do as follows:

create set table my_table ,no fallback ,
no before journal,
no after journal,
checksum = default
( a integer )
primary index ( a );

insert into my_table select 25;
insert into my_table select 12345;
insert into my_table select 0;

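select substr(
'00000' || cast( a as varchar(5) ),
characters( cast( a as varchar(5) ) ) + 1,
5 )
from my_table;

The relevant part lies in the use of the functions substr and characters: the number is prefixed with five zeros, and substr then skips as many characters as the number has digits and keeps the next five, which are the last five. For 25, the string '0000025' yields '00025'. (This assumes that the cast returns just the digits, with no padding blanks, and that the numbers are non-negative and have at most five digits.)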
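The same idea can be sketched in Python, which makes it easy to check against the three sample values. The function name pad_with_zeros is mine, and the sketch assumes non-negative integers with at most `width` digits.

```python
def pad_with_zeros(n, width=5):
    """Left-pad a non-negative integer with zeros up to `width` characters."""
    digits = str(n)
    padded = "0" * width + digits
    # Skip as many leading characters as the number has digits,
    # then keep the next `width` characters (the last `width` of the string).
    return padded[len(digits):len(digits) + width]
```

In everyday Python you would simply write str(n).zfill(5); the longer version above is only there to mirror the concatenate-and-cut logic of the query.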

Friday, 21 January 2011

An open(er) web 2.0 toolchain

I am planning to migrate from my standard set of web 2.0 tools to something more open and collaborative. This will mean:

Wednesday, 12 January 2011

Table sizes and disk utilization in Teradata

The analysis of table sizes and disk utilization per AMP in Teradata is based on table dbc.tablesize.

Two typical queries are:

select
vproc,
databasename,
sum(currentperm) / 2**30 as cp,
sum(peakperm) / 2**30 as pp
from dbc.tablesize
group by 1,2
;

to find space utilization by AMP and

select
tablename,
databasename,
sum(currentperm) / 2**30 as cp,
sum(peakperm) / 2**30 as pp
from dbc.tablesize
group by 1,2
;

to find out the sizes of tables.
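For readers who do not speak SQL, this is roughly what both queries compute, sketched in Python over rows that would come from dbc.tablesize. The function perm_by_key and the representation of rows as dictionaries are assumptions made for illustration; only the column names come from the queries.

```python
from collections import defaultdict

GIB = 2 ** 30  # bytes per GiB, the same divisor as 2**30 in the queries

def perm_by_key(rows, key_fields):
    """Sum currentperm and peakperm per key and report both in GiB."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key][0] += row["currentperm"]
        totals[key][1] += row["peakperm"]
    return {key: (cp / GIB, pp / GIB) for key, (cp, pp) in totals.items()}
```

Grouping with key_fields=("vproc", "databasename") corresponds to the per-AMP query, and key_fields=("tablename", "databasename") to the per-table one.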

Sunday, 9 January 2011

More on git for remote server deployment of webpages

This is a brief tutorial on how to deploy your webpage to a remote server using git.

My (minimalistic) vi cheatsheet

:set (no)wrapscan
:set (no)nu
:set (no)ignorecase

/\<word\> to search for whole words

set softtabstop=2
set shiftwidth=2
set tabstop=4
set expandtab