Peroma Project: HPC in the Cloud with StarCluster

Tuesday, February 12, 2013

HPC in the Cloud with StarCluster

Hello, today I'll go over an article I found by Garvin W. Burris about cloud-based HPC. StarCluster is a project from the Massachusetts Institute of Technology (MIT), developed by Software Tools for Academics and Researchers (STAR). The toolkit is open source and freely distributed, and the latest development version is hosted on GitHub. An Amazon AWS account is a prerequisite for using StarCluster. The example shown here should cost very little (a few cents) for the use of a small amount of Elastic Block Store (EBS) disk space. Once the account is ready, we open a terminal on our local Linux machine.

First we create a directory for the project called starcluster, set up a Python virtual environment in it, and install StarCluster:

$ mkdir ~/starcluster ; cd ~/starcluster
$ curl -O https://raw.github.com/pypa/virtualenv/master/virtualenv.py
$ python virtualenv.py cloudhpc
New python executable in cloudhpc/bin/python
Installing setuptools...........................done.
Installing pip........................done.
$ . cloudhpc/bin/activate
$ pip install starcluster
Downloading/unpacking starcluster ...
Successfully installed starcluster ssh boto
workerpool Jinja2 decorator pyasn1 pycrypto
Cleaning up...

The first time it runs, StarCluster reports an error because no configuration file exists yet, and asks whether we want to create one. We select option 2 to write a default configuration file.

$ starcluster help
StarCluster - (http://web.mit.edu/starcluster)
(v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

!!! ERROR - config file ~/.starcluster/config does not exist
Options:
---------
[1] Show the StarCluster config template
[2] Write config template to ~/.starcluster/config
[q] Quit

Please enter your selection: 2
>>> config template written to ~/.starcluster/config
>>> Please customize the config template

We edit this configuration file and set the appropriate values, making sure to fill in our own credentials. We add the following parameters to the configuration:

$ nano ~/.starcluster/config
[aws info]
AWS_ACCESS_KEY_ID=YOUR_AWS_Access_key_ID
AWS_SECRET_ACCESS_KEY=YOUR_Secret_Access_key
AWS_USER_ID=YOUR_amazon_userid

We create the SSH key used for remote shell access to the cluster nodes with the following command:

$ starcluster createkey -o ~/starcluster/foocluster.rsa foocluster

We register the key in the configuration and specify the machine image and instance type for the cluster template:

$ nano ~/.starcluster/config
[key foocluster]
KEY_LOCATION=~/starcluster/foocluster.rsa
[cluster smallcluster]
KEYNAME=foocluster
NODE_IMAGE_ID=ami-d3ce7bba
NODE_INSTANCE_TYPE=t1.micro
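
The configuration file uses an INI-style layout, so the link between a cluster template and its key can be checked before launching anything. The following is only a minimal sketch using Python's standard configparser module, not part of StarCluster itself: the helper name check_cluster_key is hypothetical, and the sample text simply mirrors the settings shown in this post with placeholder credentials.

```python
import configparser

# Sample text mirroring the settings in this post; credential values are placeholders.
SAMPLE_CONFIG = """\
[aws info]
AWS_ACCESS_KEY_ID=YOUR_AWS_Access_key_ID
AWS_SECRET_ACCESS_KEY=YOUR_Secret_Access_key
AWS_USER_ID=YOUR_amazon_userid

[key foocluster]
KEY_LOCATION=~/starcluster/foocluster.rsa

[cluster smallcluster]
KEYNAME=foocluster
NODE_IMAGE_ID=ami-d3ce7bba
NODE_INSTANCE_TYPE=t1.micro
"""


def check_cluster_key(config_text, cluster_name):
    """Hypothetical helper: return the key file a cluster template points to.

    Raises KeyError when the cluster section is missing or when KEYNAME
    refers to a key that has no matching [key ...] section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cluster = parser["cluster " + cluster_name]
    keyname = cluster["KEYNAME"]
    key_section = "key " + keyname
    if key_section not in parser:
        raise KeyError("no [%s] section for KEYNAME=%s" % (key_section, keyname))
    return parser[key_section]["KEY_LOCATION"]


print(check_cluster_key(SAMPLE_CONFIG, "smallcluster"))
```

The point of the check is that KEYNAME in the [cluster smallcluster] section must match the name used in the [key foocluster] section header, which in turn is the name passed to createkey.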

Starting the two-node micro cluster:

$ starcluster start -s 2 foocluster
...
>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
...
>>> Configuring cluster took 1.154 mins
>>> Starting cluster took 2.305 mins
...

The head node of a cluster is called the master node. Listing the running clusters:

$ starcluster listclusters
...
Uptime: 0 days, 00:04:05
Zone: us-east-1d
Keypair: foocluster
EBS volumes:
vol-4c048437 on master:/dev/sda (status: attached)
vol-4f048434 on node001:/dev/sda (status: attached)
Cluster nodes:
master running i-6a10c710
ec2-50-17-57-111.compute-1.amazonaws.com
node001 running i-6810c712
ec2-23-20-255-177.compute-1.amazonaws.com
Total nodes: 2

Viewing the queue:

$ starcluster sshmaster foocluster -u sgeadmin
[sgeadmin@master ~]$ qstat -g c
...
[sgeadmin@master ~]$ qhost
...

The status shows that we have 2 hosts with one CPU slot each in the all.q job queue. But what can these CPUs be used for? As an example, we will use the Monte Carlo method to compute the value of pi. We first build a host file from /etc/hosts, then write pi.py and run it with mpirun:
$ grep -v local /etc/hosts | cut -d" " -f2 > ~/hostfile
$ nano pi.py
$ mpirun -np 2 -hostfile hostfile python pi.py
3.14192133333
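
To make the grep and cut pipeline that builds the host file less opaque, here is a rough Python equivalent. It is only an illustrative sketch: the function name hostfile_lines and the IP addresses in the sample are made up, while the node names master and node001 are the ones from the cluster listing.

```python
def hostfile_lines(hosts_text):
    # Rough equivalent of: grep -v local /etc/hosts | cut -d" " -f2
    result = []
    for line in hosts_text.splitlines():
        if "local" in line:
            continue  # grep -v local drops every line containing "local"
        fields = line.split(" ")
        # cut prints the whole line when it contains no delimiter
        result.append(fields[1] if len(fields) > 1 else line)
    return result


# Hypothetical /etc/hosts content; the addresses are placeholders.
SAMPLE_HOSTS = (
    "127.0.0.1 localhost\n"
    "10.0.0.1 master\n"
    "10.0.0.2 node001\n"
)

print(hostfile_lines(SAMPLE_HOSTS))
```

The result is one host name per line, which is the format mpirun expects from the file passed with -hostfile.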

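File pi.py:
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
mpisize = comm.Get_size()
# Split the 6 million samples evenly across the MPI processes
nsamples = int(6e6 / mpisize)

inside = 0
# Seed with the rank so that each process draws a different sequence
random.seed(rank)
for i in range(nsamples):
    x = random.random()
    y = random.random()
    # Count the points that fall inside the unit quarter circle
    if (x * x) + (y * y) < 1:
        inside += 1

mypi = (4.0 * inside) / nsamples
# Sum the per-process estimates on rank 0
pi = comm.reduce(mypi, op=MPI.SUM, root=0)

if rank == 0:
    # Average the summed estimates over the number of processes
    print((1.0 / mpisize) * pi)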
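
The estimate works because points drawn uniformly in the unit square land inside the quarter circle of radius 1 with probability pi/4, so 4 times the observed fraction approximates pi. The sketch below reproduces the same logic without MPI, so it can be tried on any machine. It is an illustration under assumptions, not the article's code: the function names are hypothetical, and a plain Python loop over "ranks" stands in for the separate processes and for comm.reduce.

```python
import random


def estimate_pi(nsamples, seed):
    """Estimate pi from nsamples random points, as a single rank would."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(nsamples):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1:
            inside += 1
    return 4.0 * inside / nsamples


def combined_estimate(total_samples, nprocs):
    """Average the per-rank estimates (stand-in for reduce with MPI.SUM)."""
    per_rank = total_samples // nprocs
    estimates = [estimate_pi(per_rank, seed=rank) for rank in range(nprocs)]
    return sum(estimates) / nprocs


print(combined_estimate(200000, 2))
```

Seeding each simulated rank differently matters: with identical seeds every process would draw the same points, and averaging the estimates would add no information.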

To use the queue, we create a job script that specifies the parallel environment:
#!/bin/sh
# Export all environment variables
#$ -V
# Your job name
#$ -N pi
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
# PARALLEL ENVIRONMENT:
#$ -pe orte 2
# Enable resource reservation
#$ -R y
# The max hard walltime for this job is 16 minutes (after this it will be killed)
#$ -l h_rt=00:16:00
# The max soft walltime for this job is 15 minutes (after this SIGUSR2 will be sent)
#$ -l s_rt=00:15:00
echo "Got $NSLOTS processors."
# The mpirun command.
mpirun -np $NSLOTS python pi.py
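
In the job script, the shell treats every line starting with # as a comment, but Grid Engine's qsub reads the lines that begin with #$ as embedded command-line options. The following sketch shows that distinction on a shortened copy of the script. It is a simplification for illustration only (the helper name sge_directives is hypothetical, and the real qsub parsing has more rules than this).

```python
def sge_directives(script_text):
    """Collect the option strings from lines that start with '#$'."""
    options = []
    for line in script_text.splitlines():
        if line.startswith("#$"):
            options.append(line[2:].strip())
    return options


# Shortened copy of the job script, for illustration.
SAMPLE_SCRIPT = "\n".join([
    "#!/bin/sh",
    "# Your job name",
    "#$ -N pi",
    "#$ -cwd",
    "#$ -pe orte 2",
    "mpirun -np $NSLOTS python pi.py",
])

print(sge_directives(SAMPLE_SCRIPT))
```

Ordinary comments such as "# Your job name" are skipped, while "-pe orte 2" is picked up, which is how the job ends up with the 2 slots that $NSLOTS later reports.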

We save the script as pi.sh and submit it with the qsub command:
$ nano pi.sh
$ qsub pi.sh

We check the status of the job with the qstat command. Once the job has finished, its output can be found in the file $JOB_NAME.o$JOB_ID (here pi.o1):
$ qstat
$ cat pi.o1
...
Got 2 processors.
3.14192133333

When we are done, we must make sure to log out and terminate all instances, because the bill is based on hours of use. We enter the following lines:
$ exit
$ starcluster terminate foocluster
...
Terminate EBS cluster foocluster (y/n)? y

To learn more about MIT's StarCluster, you can find information at:

http://web.mit.edu/star/cluster/docs/latest/contents.html
http://github.com/jtriley/StarCluster/wiki