Thursday, 30 September 2010

Query to select last observation by date for each customer

Sometimes you have a table with a list of transactions of each, say, customer and you need the latest picture of each customer. The so called window functions in SQL2003 are instrumental in order to write neat, concise SQL for such purpose.

It would work as follows (in Teradata's SQL dialect):

select
  cust_id,
  status,
  status_date,
row_number() over (partition by cust_id order by status_date desc ) as ranking
from customer_status
qualify ranking = 1;

Saturday, 25 September 2010

Relaying Postfix SMTP via smtp.gmail.com

You just need to follow these instructions to relay Postfix mail via smtp.gmail.com. You just need to be careful that some packages mentioned there have updated their names in recent Ubuntu versions: for instance, package libsasl2 mentioned there is now (as of today!) called libsasl2-2. 

Other than that, it works as a charm!

Saturday, 11 September 2010

Run BOINC on a Windows box without Internet connection

If you have access to a Windows box without internet connection on which you wish to run BOINC, you can do as follows:
  1. Install BOINC on another Windows box with internet connection and register to your project(s).
  2. Copy the BOINC folder (possibly at C:\Program Files\BOINC) to your pen-drive. You may want to uninstall BOINC from this machine at this point.
  3. Navigate to the BOINC folder in your pendrive, and run the boinc executable file in order to download new tasks. Try to get as many as you can by setting the BOINC options adequately.
  4. Plug your pen-drive to the machine without internet connection and run the boinc executable file from it. Your tasks will start running. Those finishing will stay there waiting to be transferred back to the project servers.
  5. When the tasks have finished (or when you see fit!) unplug the pen-drive, plug it back to the machine with internet connection and see your tasks be uploaded to the project servers.
Note: beware of the submit time limit for tasks so that they do not expire.

Wednesday, 25 August 2010

R trick: plot data in one scale, label axes on the original scale

Here is the way to do it. First, get the data and plot it in the logarithmic scale, say, without the y axis:

x <- exp( rnorm( 100 ) )
plot( log( x ), yaxt="n" )

Create the labels:

a <- 1:ceiling( max( x ) )

Finally, draw the axis at the original scale and with your desired labels:

axis(2, at=log(a), label=a)

Wednesday, 18 August 2010

How many rows assigns Teradata to each AMP for a given table?

Teradata distributes table rows to different AMP's according to a hash value calculated from its primary key. The following query indicates how many rows fall into each AMP for a given table.

SELECT
HASHAMP (HASHBUCKET(HASHROW( col1, col2, col3 ))) AS AMP_NUM,
COUNT(*) as NUM_ROWS_PER_AMP
FROM
mytable
GROUP BY 1
ORDER BY 1;


It is useful to check whether your table is properly distributed or not!

Tuesday, 17 August 2010

PHP's "Fatal error: Call to undefined function: curl_init()" on Ubuntu

After Twitter changed its access settings for applications to tweet directly into it, I had some problems updating my WP to Twitter plugin. I got a "Fatal error: Call to undefined function: curl_init()". 

Problem? Missing php5-curl package on my Ubuntu server.

Solution?

sudo apt-get install php5-curl
sudo /etc/init.d/apache2 restart

Listing missing statistics from Teradata query optimizer

Teradata is a fine DBMS whose performance critically depends on whether the right table statistics are available to the query optimizer.

You can request it to hint you for the missing statistics that it would like to have on your tables running the code

diagnostic helpstats on for session;

before checking query plans. After running the command above, the query plan, at the end, will list those statistics that the end of the query plan.