Tuesday 29 December 2009

The best R tricks

This time, the post has been written by others. The community has voted for the best R tricks in a poll whose results can be browsed at

http://stackoverflow.com/questions/1295955/what-is-the-most-useful-r-trick

Friday 18 December 2009

Running BOINC on a CentOS 5.4 server from the command line

I wanted to install BOINC on a CentOS 5.4 server with no GUI. I could not find BOINC in any CentOS RPM repository, and the newest Linux version did not seem to work.

So I went to the BOINC download page and chose the older 6.6.41 version for x86 Linux with CLI only, and downloaded it with wget to my server. 

The downloaded file was an executable Bash script that unpacked a whole directory containing a number of programs, among them boinc and boinccmd.

To get BOINC running, you just need to run

nohup boinc &

However, the process is not attached to any BOINC project. I wanted to attach to Ibercivis, so I used the following command:

boinccmd --project_attach http://registro.ibercivis.es d44xxxxxxxxxxxx7c3

Of course, I have masked my authorization string, d44xxxxxxxxxxxx7c3. I got it from the file account_registro.ibercivis.es.xml in another installation I had on another machine.

In a matter of seconds, the command 

boinccmd --get_state

indicated that BOINC was already running on my machine. The boinccmd command also lets you configure plenty of other parameters, but that is another topic altogether.

Connecting R and Postgres via JDBC

I need to make a note of this for reference:

library( RJDBC ) 
postgres <- JDBC( "org.postgresql.Driver", "/path/to/postgresql-8.3-604.jdbc4.jar")
con <- dbConnect(postgres, "jdbc:postgresql://localhost:5432/mydbname", user = "myusername", password = "xxx" )

From here on, I can read

my.r.df <- dbReadTable( con, "myschema.mytablename" )

and save

dbWriteTable( con, "myschema.mytablename", my.r.df )

Thursday 17 December 2009

Left padding of string field in Kettle

I had to left-pad a field using Kettle. It was a postal code mistakenly stored as a double in an Access database, and it had to be transformed. In my country, postal codes consist of 5 digits, the first of which can be "0".

Mapping the field to "string" was not good enough: blanks were used for padding. In order to use "0", I had to use the JavaScript node in Kettle. The following graphic contains my little code snippet:

For some reason, the code did not "validate" in the node (there is a "validation" button), but the data flow worked like a charm.
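Since the snippet lives in the graphic only, here is a rough sketch of the zero-padding logic in plain JavaScript. It is not the exact code from the node: the function name padPostalCode is made up for illustration, and I am assuming the value arrives as a number (or a numeric string) that should end up 5 characters long. Inside Kettle, the same loop would be applied to the incoming field rather than wrapped in a function.

```javascript
// Hypothetical helper (not the original Kettle snippet):
// left-pads a postal code with zeros up to 5 characters.
function padPostalCode(value) {
  // The code was stored as a double (e.g. 8001.0), so convert it to a
  // whole number first and then to its string form, "8001".
  var s = String(Math.round(Number(value)));
  // Prepend zeros until the string reaches the expected length.
  while (s.length < 5) {
    s = "0" + s;
  }
  return s;
}
```

For instance, padPostalCode(8001) gives "08001", while a code that already has 5 digits, such as 28001, comes back unchanged.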