sagasw

Overview

sagasw (Stand Alone Generic Application Service Wrapper) is a utility that wraps any non-interactive command-line program into a grid job so that it can be executed on an EGEE-compatible infrastructure. It uses the GASW (Generic Application Service Wrapper) libraries, which MOTEUR also uses to prepare each workflow component (=program) to run as a grid job. The utility is useful to those who design GASW components and workflows containing these components, and also to those who port any command-line program to the grid.

The development of new grid applications as workflows is composed of the following steps:

  1. Design the workflow components (=programs)
  2. Deploy and test the components
  3. Design the workflow
  4. Deploy and test the workflow
Step 2 is supported by sagasw: instead of deploying a workflow to see whether a component instantiates properly as a grid job, sagasw generates the files necessary to run the component on the grid, but does not run them. The generation is based on:
  • a GASW Descriptor (an XML file describing the inputs and outputs of the application),
  • the input values (files and parameters to run the application),
  • the output definitions (where to store the results).
sagasw creates the corresponding script (name.sh) and job description (name.jdl) files locally. These can be used to run the application on the grid with the gLite command line interface (the glite-wms commands; see SARAWiki for instructions). More details about preparing GASW descriptions are on the GASW page.

Usage

Before sagasw can be used, the package must be installed and configured (see Installation and Configuration). Below we assume that the package has been installed in a directory that we will call SAGASWHOME, and that the environment variables PATH and LD_LIBRARY_PATH have been set properly. See also Known Problems.

To run:

sagasw.sh -i <inputs.txt> -n <outputs.txt> -o <basename> -d <gasw_descriptor.xml>

inputs.txt
This file contains the list of input parameters as they would be specified to run the program as part of a workflow. Format: one line per value, in the same sequence as they appear in the GASWDescriptor. See the example in SAGASWHOME/etc/inputs.txt. The last line in the file must end with a newline (Enter).
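A minimal sketch of writing such a file from the shell. The three values are hypothetical placeholders (the real values and their order depend on your own GASWDescriptor). printf with the format '%s\n' writes one value per line and always terminates the last line with a newline.

```shell
#!/bin/sh
# Hypothetical input values, one per line, in the order in which the
# inputs appear in the GASWDescriptor. Replace them with your own.
printf '%s\n' \
    'lfn:/grid/vlemed/example/image1.nii' \
    'lfn:/grid/vlemed/example/image2.nii' \
    '0.5' > inputs.txt
```

An editor that silently drops the final newline would break the file again, so generating it this way can be more robust than editing by hand.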

outputs.txt
This file contains the list of output parameters as they would be specified to run the program as part of a workflow. Format: one line per value, in the same sequence as they appear in the GASWDescriptor. See the example in SAGASWHOME/etc/outputs.txt. The last line in the file must end with a newline (Enter).
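The trailing-newline requirement is easy to violate by accident. The helper below is a sketch, not part of sagasw; the function name ensure_final_newline is made up for this illustration. It appends a newline only when the last byte of the file is not already one.

```shell
#!/bin/sh
# ensure_final_newline FILE  (hypothetical helper, not shipped with sagasw)
# Appends a newline to FILE if its last byte is not a newline.
ensure_final_newline() {
    file=$1
    # An empty file has no last line to terminate; leave it alone.
    [ -s "$file" ] || return 0
    # Command substitution strips a trailing newline, so the result is
    # non-empty exactly when the last byte is something else.
    if [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}

# Repair both parameter files, if present, before calling sagasw.sh.
for f in inputs.txt outputs.txt; do
    if [ -f "$f" ]; then
        ensure_final_newline "$f"
    fi
done
```

Calling the helper twice on the same file is harmless, because the second call finds the newline already in place.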

basename
This is the basename of the files generated by sagasw (basename.sh, basename.jdl). These files are generated in the current directory. The full path of the script (basename.sh) is given as the executable in the basename.jdl file. See the examples in SAGASWHOME/etc/my_experiment.jdl and SAGASWHOME/etc/my_experiment.sh.

gasw_descriptor.xml
A description of the input and output parameters that is normally passed along with the Scufl workflow description. It describes the form and shape of the component. See the example in SAGASWHOME/etc/GASWdescriptor.xml.

Example:

The files for this example are found in SAGASWHOME/etc.

sagasw.sh -i inputs.txt -n outputs.txt -o my_experiment -d GASWDescriptor.xml

This will generate two files (my_experiment.jdl and my_experiment.sh) in the current directory. Note that inputs.txt and outputs.txt in this case are also located in the current directory. The job can be directly submitted with

glite-wms-job-submit -d <delegationId> my_experiment.jdl

Note that, because the jdl file refers to the script file, my_experiment.jdl should not be separated from my_experiment.sh. (Otherwise the jdl must be adapted manually to point to the new path and name of the script.)
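Before submitting, it can help to confirm that the generated pair is complete and that the jdl still mentions the script. The function below is a sketch under that assumption only; check_generated is a hypothetical name, and the check merely searches the jdl for the script's file name rather than parsing JDL.

```shell
#!/bin/sh
# check_generated BASENAME  (hypothetical helper, not shipped with sagasw)
# Succeeds when BASENAME.sh and BASENAME.jdl both exist in the current
# directory and the .jdl mentions the script by name.
check_generated() {
    base=$1
    for f in "$base.sh" "$base.jdl"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    # -F treats the name as a fixed string, not a regular expression.
    if ! grep -q -F -- "$base.sh" "$base.jdl"; then
        echo "$base.jdl does not refer to $base.sh" >&2
        return 1
    fi
    return 0
}

# Usage, after running sagasw.sh with "-o my_experiment":
#   check_generated my_experiment && \
#       glite-wms-job-submit -d <delegationId> my_experiment.jdl
```

The check does not verify that the path inside the jdl is still valid after the files have been moved; that still has to be inspected by hand.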

Installation

For convenience, the package should be installed on a Linux system with gLite. Currently only CentOS 4 and Ubuntu 8.10 are supported, so make sure to download the correct version.

NOTE: gLite commands are necessary to run the jobs (jdl and sh files) that are generated by the utility.

You can choose to install the package in the standard locations (requires administrator rights) or in some separate location, in which case the PATH environment variables must be configured accordingly. Note that the Alegre server already contains the newest version of the sagasw package.

Download

The following packages are publicly available:

File | Date | Description
sagasw-CentOS_V0.4.tar | 2009-10-20 | Version 0.4, for CentOS only! Contains new libraries derived from the current MOTEUR installation on Alegre.

For simplicity, we will always publish here only the latest version. The complete project admin page is found in [put link to the cvs/bugtracker page later].

Feedback is always appreciated.

Release Notes

Date | Version | Description
2009-09-15 | 0.4 | Made the sagasw project compatible with the MOTEUR libraries provided. MOTEUR in turn has arranged its functions and interfaces to ensure one-to-one compatibility in later stages.
2009-06-24 | 0.3 | New libraries included in the distribution. sagasw file generation is now compatible with the current MOTEUR installation.
2009-05-14 | 0.2 | This distribution fixes several important stability issues. Many pre-file-generation checks added.
2009-04-03 | 0.1 | Fixed some bugs in the script file that were causing the file generation run to fail.

Configuration

It is necessary to add the directories where the executable files and libraries are located to the proper environment variables. This can be done for all users (by the system administrator) or for each user individually (in the .bashrc).

In the explanation below we assume that sagasw has been installed in the directory SAGASWHOME, and that the directory structure follows the distribution (executables in bin, libraries in lib).

SAGASWHOME=<installation directory>
export PATH=$PATH:$SAGASWHOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SAGASWHOME/lib
export GRIDCONF=$SAGASWHOME/conf/[vlemed.conf or lsgrid.conf, depending on the VO]

The GRIDCONF environment variable refers to a configuration file much like the one written during the run of a workflow. This file contains several settings for the variables used by the .sh script that is executed with the job. The standard files that I have written for vlemed and lsgrid are the following:

vlemed: /usr/local/sagasw/conf/vlemed.conf
lsgrid: /usr/local/sagasw/conf/lsgrid.conf
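The settings above can be verified with a small shell function before running sagasw. This is a sketch, not part of the distribution; check_sagasw_env is a hypothetical name, and it only tests what this section describes: that SAGASWHOME is set, that its bin and lib directories appear in PATH and LD_LIBRARY_PATH, and that GRIDCONF points to a readable file.

```shell
#!/bin/sh
# check_sagasw_env  (hypothetical helper, not shipped with sagasw)
# Reports which of the settings described above are missing.
# Returns 0 when everything looks consistent, 1 otherwise.
check_sagasw_env() {
    status=0
    if [ -z "${SAGASWHOME:-}" ]; then
        echo "SAGASWHOME is not set" >&2
        return 1
    fi
    # Wrapping the list in ':' lets one pattern match an entry at the
    # start, in the middle, or at the end of the list.
    case ":${PATH:-}:" in
        *":$SAGASWHOME/bin:"*) ;;
        *) echo "PATH lacks $SAGASWHOME/bin" >&2; status=1 ;;
    esac
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$SAGASWHOME/lib:"*) ;;
        *) echo "LD_LIBRARY_PATH lacks $SAGASWHOME/lib" >&2; status=1 ;;
    esac
    if [ ! -r "${GRIDCONF:-}" ]; then
        echo "GRIDCONF does not point to a readable file" >&2
        status=1
    fi
    return "$status"
}
```

Placing the function in .bashrc and calling check_sagasw_env after the exports gives immediate feedback when one of the paths is mistyped.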

If, for some reason, you need to configure your own settings, take the following steps:

  1. In your home directory, create a subdirectory called 'sagasw-config'.
  2. Copy one of the files to this subdirectory and name it 'myconfig.conf'.
  3. Execute: 'export GRIDCONF=$HOME/sagasw-config/myconfig.conf'
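The three steps can be sketched as one shell function. The name use_private_gridconf is hypothetical, and the function uses an absolute path under $HOME so that GRIDCONF stays valid from any working directory. It must be defined and called in the current shell (for example from .bashrc), otherwise the export is lost when the script ends.

```shell
#!/bin/sh
# use_private_gridconf SOURCE_CONF  (hypothetical helper)
# Copies SOURCE_CONF to $HOME/sagasw-config/myconfig.conf and points
# GRIDCONF at the copy. An existing myconfig.conf is left untouched so
# that local edits are not overwritten.
use_private_gridconf() {
    src=$1
    dir="$HOME/sagasw-config"
    dest="$dir/myconfig.conf"
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    if [ ! -e "$dest" ]; then
        cp "$src" "$dest" || return 1
    fi
    GRIDCONF=$dest
    export GRIDCONF
}

# Usage (the path is the standard vlemed file listed above):
#   use_private_gridconf /usr/local/sagasw/conf/vlemed.conf
```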
On Alegre, the installation is in a custom path: use '/usr/local/sagasw' as the SAGASWHOME location. The exports for PATH and LD_LIBRARY_PATH will then point to the appropriate locations for the binaries and libraries respectively.

Development

This information is located on the development page (restricted access).

Known Problems

These are the issues currently known about the application that should be fixed in future updates.

GASWDescriptor must start with <description>

Due to parsing restrictions, sagasw does NOT ignore comment lines and gives an error if the file does not start with the token <description>.
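A descriptor can be checked for this restriction before it is passed to sagasw. The sketch below assumes that the required token is <description>, as in the heading of this item, and that "start" means the very first bytes of the file; both the function name and the TOKEN variable are made up for this illustration, and TOKEN can be changed if your version expects a different token.

```shell
#!/bin/sh
# Assumed required opening token; adjust if your sagasw version differs.
TOKEN='<description>'

# starts_with_token FILE  (hypothetical helper, not shipped with sagasw)
# Succeeds when the very first bytes of FILE are exactly $TOKEN, so a
# leading comment, XML declaration, or blank line makes the check fail.
starts_with_token() {
    first=$(head -c "${#TOKEN}" "$1")
    [ "$first" = "$TOKEN" ]
}

# Usage:
#   starts_with_token GASWDescriptor.xml || echo "descriptor will be rejected" >&2
```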

Output template in GASWDescriptor is not parsed

Currently the parser doesn't pick up the template definitions in the GASWDescriptor. This is why the user must always specify the names of the output parameters in the file outputs.txt. Although slightly annoying, this could actually be considered a feature, because it gives full control over the way the job outputs its results.

sagasw does not parse a descriptor that has no output definition

When no output is defined, sagasw fails to generate the data.

Topic revision: r40 - 2012-03-20 - EvertMouw
 