sagasw
Overview
sagasw (Stand Alone Generic Application Service Wrapper) is a utility that wraps any (non-interactive) command line program into a grid job so that it can be executed on an EGEE-compatible infrastructure. It uses the
GASW (Generic Application Service Wrapper) libraries that are also adopted by
MOTEUR to prepare each of the workflow components (=programs) to run as grid jobs. This utility is useful to those who are designing
GASW components and workflows containing these components, but also for those porting any command-line program to the grid.
The development of new grid applications as workflows is composed of the following steps:
- Design the workflow components (=programs)
- Deploy and test the components
- Design the workflow
- Deploy and test the workflow
Step 2 is helped by
sagasw: instead of having to deploy a workflow to see if a component instantiates properly as grid job,
sagasw simply generates the files necessary to run the component on the grid, but doesn't run them. This is done based on:
- a GASW Descriptor (a file in XML describing the inputs and outputs of the application),
- the input values (files and parameters to run the application)
- the output definitions (where to store the results).
sagasw creates the corresponding script (name.sh) and job description (name.jdl) files locally. These can be used to run the application on the grid using the gLite command line interface (glite-wms commands; see how to do this at SARAWiki). See more details about how to prepare GASW descriptions on the GASW page.
Usage
Using sagasw requires a preparatory step in which the package is installed and configured. From here on we assume that the package has been installed in a directory that we will call SAGASWHOME, and that the necessary environment variables PATH and LD_LIBRARY_PATH have been set properly. See also known problems.
To run:
sagasw.sh -i inputs.txt -n outputs.txt -o basename -d gasw_descriptor.xml
inputs.txt
This file contains the list of input parameters as they would be given to run the program as part of a workflow. Format: one line per value, in the same sequence as they appear in the GASWDescriptor. See the example in SAGASWHOME/etc/inputs.txt. The last line in the file must end with a newline (Enter).
outputs.txt
This file contains the list of output parameters as they would be given to run the program as part of a workflow. Format: one line per value, in the same sequence as they appear in the GASWDescriptor. See the example in SAGASWHOME/etc/outputs.txt. The last line in the file must end with a newline (Enter).
basename
This is the basename of the files generated by sagasw (basename.sh, basename.jdl). These files are generated in the current directory. The full path of the script (basename.sh) is given as the executable in the basename.jdl file. See the examples in SAGASWHOME/etc/my_experiment.jdl and SAGASWHOME/etc/my_experiment.sh.
gasw_descriptor.xml
A description of the input and output parameters, which is normally passed along with the scufl. It describes the form and shape of the component. See the example in SAGASWHOME/etc/GASWdescriptor.xml.
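Since both value files must end with a newline, it can help to check this before running sagasw. The following is a minimal sketch of such a check; `ends_with_newline` and `fix_trailing_newline` are hypothetical helper names and are not part of the sagasw distribution.

```shell
# Hypothetical helpers, not part of sagasw.

# Succeeds if the file is non-empty and its last byte is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so the captured
    # last byte is empty exactly when that byte is a newline.
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Appends the missing newline to a non-empty file, if needed.
fix_trailing_newline() {
    if [ -s "$1" ] && ! ends_with_newline "$1"; then
        printf '\n' >> "$1"
    fi
}
```

For example, `fix_trailing_newline inputs.txt` and `fix_trailing_newline outputs.txt` could be run once before calling sagasw.sh.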
Example:
The files for this example are found in SAGASWHOME/etc.
sagasw.sh -i inputs.txt -n outputs.txt -o my_experiment -d GASWDescriptor.xml
This will generate two files (my_experiment.jdl and my_experiment.sh) in the current directory. Note that inputs.txt and outputs.txt in this case are also located in the current directory. The job can be directly submitted with:
glite-wms-job-submit -d <delegationId> my_experiment.jdl
Note that, because the jdl file refers to the script file by its full path, my_experiment.sh should not be moved away from the location where it was generated. (Otherwise the jdl needs to be adapted manually to point to the new path and file name.)
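Before submitting, one way to verify that the jdl still points at an existing script is sketched below. This is only an illustration: `check_jdl_scripts` is a hypothetical helper, and it assumes that the script path appears in the jdl as a double-quoted string ending in ".sh", which may not match the exact layout of the generated file.

```shell
# Hypothetical helper, not part of sagasw or gLite.
# Assumption: the jdl mentions the script as a double-quoted string
# ending in ".sh" (the exact layout of the generated jdl may differ).
# Succeeds if every such path exists (also when none is found).
check_jdl_scripts() {
    grep -o '"[^"]*\.sh"' "$1" | tr -d '"' | (
        missing=0
        while IFS= read -r path; do
            if [ ! -f "$path" ]; then
                echo "script not found: $path" >&2
                missing=1
            fi
        done
        exit "$missing"   # leaves only this subshell
    )
}
```

Running `check_jdl_scripts my_experiment.jdl` prints each referenced script that cannot be found and returns a non-zero status in that case. Relative paths are checked against the current directory.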
Installation
For convenience, the package should be installed on a Linux system with gLite. Currently only CentOS 4 and Ubuntu 8.10 are supported (download the correct version).
NOTE: gLite commands are necessary to run the jobs (jdl and sh files) that are generated by the utility.
You can choose to install it in standard locations (requires administrator rights) or to install it in some separate location and configure the PATH and LD_LIBRARY_PATH environment variables. Alternatively, the Alegre server already contains the newest version of the sagasw package.
Download
The following packages are publicly available:
For simplicity, we will always publish here only the latest version. The complete project admin page is found in [put link to the cvs/bugtracker page later].
Feedback is always appreciated.
Release Notes
| Date | Version | Description |
| 2009-09-15 | 0.4 | Made the sagasw project compatible with the MOTEUR libraries provided. MOTEUR in turn has arranged its functions and interfaces to ensure one-to-one compatibility in later stages. |
| 2009-06-24 | 0.3 | New libraries included in the distribution. sagasw file generation is now compatible with the current MOTEUR installation. |
| 2009-05-14 | 0.2 | This distribution fixes several important stability issues. Many pre-file-generation checks added. |
| 2009-04-03 | 0.1 | Fixed some bugs in the script file that were causing the file generation run to fail. |
Configuration
It is necessary to add the directories where the executable files and libraries are located to the proper environment variables. This can be done for all users (by the system administrator) or for each user individually (in the
.bashrc).
In the explanation below we assume that
sagasw has been installed in the directory
SAGASWHOME, and that the directory structure follows the distribution (executables in
bin, libraries in
lib).
SAGASWHOME=<installation directory>
export PATH=$PATH:$SAGASWHOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SAGASWHOME/lib
export GRIDCONF=$SAGASWHOME/conf/[vlemed.conf or lsgrid.conf, depending on the VO]
The GRIDCONF environment variable refers to a configuration file much like the one written during the run of a workflow. This file contains several settings for the variables used by the .sh script that is executed with the job. The standard files that I have written for vlemed and lsgrid are the following:
| vlemed: | /usr/local/sagasw/conf/vlemed.conf |
| lsgrid: | /usr/local/sagasw/conf/lsgrid.conf |
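Once the variables have been exported, a quick sanity check of the environment can save a failed run. The sketch below is one possible way to do this; `check_sagasw_env` is a hypothetical helper that only inspects the four variables described in this section and does not call sagasw itself.

```shell
# Hypothetical helper, not part of sagasw.
# Reports which of the settings described in this section are missing.
check_sagasw_env() {
    rc=0
    if [ -z "${SAGASWHOME:-}" ]; then
        echo "SAGASWHOME is not set" >&2
        return 1
    fi
    # Surrounding colons make the first and last entries match as well.
    case ":${PATH:-}:" in
        *":$SAGASWHOME/bin:"*) ;;
        *) echo "PATH does not contain $SAGASWHOME/bin" >&2; rc=1 ;;
    esac
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$SAGASWHOME/lib:"*) ;;
        *) echo "LD_LIBRARY_PATH does not contain $SAGASWHOME/lib" >&2; rc=1 ;;
    esac
    if [ ! -r "${GRIDCONF:-}" ]; then
        echo "GRIDCONF does not point to a readable file" >&2
        rc=1
    fi
    return "$rc"
}
```

Calling `check_sagasw_env` in the shell where the exports were made prints one line per problem and returns a non-zero status if anything is missing.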
If, for some reason, you need to configure your own settings, take the following steps:
- In your home directory, create a subdirectory called 'sagasw-config'.
- Copy one of the files to this subdirectory in your home directory. Name it 'myconfig.conf'.
- Execute: 'export GRIDCONF=sagasw-config/myconfig.conf'
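The steps above can be sketched as a small shell function. `use_own_gridconf` is a hypothetical helper, and it deviates from the steps in one respect: it exports an absolute path built from $HOME, on the assumption that a relative GRIDCONF value is resolved against the current directory and would therefore only work when sagasw is started from the home directory.

```shell
# Hypothetical helper, not part of sagasw.
# $1: one of the standard configuration files to start from.
# An existing myconfig.conf is overwritten.
use_own_gridconf() {
    conf_dir="$HOME/sagasw-config"
    mkdir -p "$conf_dir" || return 1
    cp "$1" "$conf_dir/myconfig.conf" || return 1
    # Assumption: an absolute path keeps GRIDCONF valid from any directory.
    export GRIDCONF="$conf_dir/myconfig.conf"
}
```

For example, `use_own_gridconf /usr/local/sagasw/conf/vlemed.conf` copies the vlemed settings, after which myconfig.conf can be edited as needed.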
On Alegre, the installation is in a custom path: use '/usr/local/sagasw' as the SAGASWHOME location. The exports for PATH and LD_LIBRARY_PATH will then point to the appropriate locations for the binaries and libraries respectively.
Development
This information is located in the development page (restricted access).
Known Problems
These are the issues currently known about the application that should be fixed in future updates.
GASWDescriptor must start with <description>
Due to parsing restrictions, sagasw does NOT ignore comment lines and gives an error if the file does not start with the token <description>.
Output template in GASWDescriptor is not parsed
Currently the parser doesn't pick up the template definitions in the GASWDescriptor. This is why the user must always specify the names of the output parameters in the file outputs.txt. Although slightly annoying, this could actually be considered a feature, because this approach provides absolute control over the way the job outputs results.
sagasw does not parse a descriptor that has no output definition.
When no output is defined in the descriptor, sagasw fails to generate the script and jdl files.
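Two of the problems above can be caught before running sagasw with a simple pre-check on the descriptor file. The sketch below is an illustration only: `check_descriptor` is a hypothetical helper, it assumes that the required start token is `<description>`, and it assumes that outputs are declared with `<output ...>` elements.

```shell
# Hypothetical helper, not part of sagasw.
check_descriptor() {
    rc=0
    first_line=
    # read fails on a last line without newline but still fills the variable.
    IFS= read -r first_line < "$1" || :
    case $first_line in
        '<description>'*) ;;
        *) echo "$1: does not start with <description>" >&2; rc=1 ;;
    esac
    # Assumption: outputs are declared with <output ...> elements.
    if ! grep -q '<output' "$1"; then
        echo "$1: no output definition found" >&2
        rc=1
    fi
    return "$rc"
}
```

The check is deliberately strict: a comment or blank line before the start token is reported as an error, in line with the first known problem.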