The University of Texas at Austin Texas Advanced Computing Center
AspectShell

AspectShell User Guide

1 Overview

1.1 Supported Environments

AspectShell has been tested in the following environments:

AspectShell has not been tested in other environments.

1.2 Architecture

AspectShell is implemented with a software agent plug-in architecture. A submission agent is invoked to execute a command when the command is specified with one of the placement keywords “on”, “in”, or “with”. Likewise, a data transfer agent is invoked when the redirection operator “>” or “<” is used with a URL or with a task identifier (task_<id>, task_all, or task_any). The software component architecture of AspectShell is shown in Figure 1.

Figure 1 - Software Architectural Components

AspectShell augments the “command parser”, “command executor”, and “stdin/stdout redirector” components of TCSH with “submission agent” and “data agent” components. When the command parser detects one of the placement clauses “in”, “on”, or “with”, the command executor invokes the submission agent and passes the command's executable to it as an argument. The submission agent executable is defined when AspectShell is installed. Likewise, when the output or input of a shell command is redirected to a “gsiftp” URL or to a task identifier, the data agent is invoked. The data agent opens a TCP socket descriptor to the remote location and passes this descriptor back to the shell, which then redirects the command's data through it.
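The guide does not say how the data agent hands its descriptor back to the shell. On Unix, one standard way for one process to pass an open descriptor to another is SCM_RIGHTS ancillary data over a Unix-domain socket. The Python sketch below shows both sides, assuming that mechanism. The names send_channel, receive_channel and redirect_stdout_to are hypothetical and not part of AspectShell. The sketch needs Python 3.9 or later on Unix.

```python
import os
import socket

# Assumption: the data agent and the shell share a Unix-domain "control"
# socket, and the descriptor travels as SCM_RIGHTS ancillary data.
# All function names here are hypothetical.

def send_channel(control: socket.socket, channel_fd: int) -> None:
    """Data agent side: hand an already-open descriptor to the shell."""
    # At least one byte of ordinary data must accompany the ancillary data.
    socket.send_fds(control, [b"\0"], [channel_fd])

def receive_channel(control: socket.socket) -> int:
    """Shell side: receive the descriptor opened by the data agent."""
    _data, fds, _flags, _addr = socket.recv_fds(control, 1, 1)
    if not fds:
        raise RuntimeError("data agent sent no descriptor")
    return fds[0]

def redirect_stdout_to(fd: int) -> None:
    """Make fd the standard output of the current process, as `>` does."""
    os.dup2(fd, 1)
    os.close(fd)
```

In a real agent, channel_fd would be the connected TCP socket to the remote host. The shell would call something like redirect_stdout_to in the child process before running the command.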

2 User Guide

2.1 Submitting a job

A command is scheduled on a distributed resource when the user specifies the placement construct “on <host name>” anywhere in its command line. For example, a user may list the contents of the /tmp directory on a remote host as follows:

%> ls -l /tmp on host1

By default, jobs are run remotely in interactive mode, using a remote execution facility such as globus-job-run or ssh. On most high-performance clusters, however, jobs must be submitted to a batch queuing system, and AspectShell supports this as well. To enable it, the user tells AspectShell to switch to “batch” mode:

%> setaspect -batch on
aspect mode: batch is on

Subsequent commands that use the submission constructs described later in this document are then submitted to the underlying batch queuing system. Batch mode can also be turned on by defining the _ASPECT_BATCH_MODE environment variable. To turn “batch” mode off, the user types:

%> setaspect -batch off
aspect mode: batch is off

2.2 Specifying resource requirements

A user may also give the underlying scheduler an explicit resource requirement for selecting the host that executes a command. The requirement is specified with the “with” keyword. The string following this keyword is treated as the resource specification for the command and is passed to an underlying resource broker for host selection. For example, a NASTRAN simulation that must run on a SPARC host with more than 25M of free memory can be run as follows:

%> nastran with "mem > 25 && type == SPARC"

In AspectShell, the command will then be dispatched on a host that meets the specified resource requirement.
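The guide leaves the expression language of the requirement string to the underlying resource broker. The following Python sketch shows how a simple broker might match it against host attributes. It assumes clauses of the form attribute operator value joined by &&, with numeric values compared numerically. It also assumes memory is recorded in the same units the user writes (M). The names satisfies and select_hosts are hypothetical.

```python
import operator
import re

# One clause such as "mem > 25"; two-character operators come first in the
# alternation so that ">=" is not read as ">".
CLAUSE = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(\S+)\s*$")
OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}

def _coerce(value):
    """Numeric strings become floats so that "mem > 25" compares numerically."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value

def satisfies(attributes: dict, requirement: str) -> bool:
    """True if the host attributes meet every &&-joined clause."""
    if requirement.strip() == "default":
        return True  # the user gave no requirement
    for clause in requirement.split("&&"):
        match = CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        name, op, expected = match.groups()
        if name not in attributes:
            return False
        actual, wanted = _coerce(attributes[name]), _coerce(expected)
        if type(actual) is not type(wanted):
            return False  # e.g. comparing a number with a word
        if not OPS[op](actual, wanted):
            return False
    return True

def select_hosts(hosts: dict, requirement: str) -> list:
    """Names of the hosts whose attributes satisfy the requirement."""
    return [name for name, attrs in hosts.items() if satisfies(attrs, requirement)]
```

With hosts described as {"node1": {"mem": 64, "type": "SPARC"}, ...}, the requirement "mem > 25 && type == SPARC" selects only SPARC hosts with more than 25M of free memory. The string "default" is what the shell supplies when no “with” clause is given, and it matches every host.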

2.3 Parametric sweep submissions

Job group submissions are declared with the “in <number> instances” placement construct. It submits the specified number of instances of a command, collectively called a group, to the underlying distributed resource management system. The example below submits 20000 instances of the command cmkin.exe to the cluster's resource management system:

%> cmkin.exe in 20000 instances

Production batch systems, however, often restrict the number of jobs a user can have in their queues, so that a single user cannot starve other users of compute resources. AspectShell therefore throttles submission of the group so that the site's limit on queued jobs per user is not exceeded. The administrator sets this limit with the _ASPECT_THROTTLE environment variable when first installing the submission agent on site.
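The throttling rule above can be sketched in Python: submit instances one at a time, and wait whenever the number of queued jobs reaches the limit. How a job is submitted and how queued jobs are counted depend on the site's batch system, so this sketch takes both as function arguments. The names read_throttle and throttled_submit, and the 30-second polling interval, are assumptions for illustration.

```python
import os
import time
from typing import Callable, Mapping, Optional

def read_throttle(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Parse _ASPECT_THROTTLE; unset or empty means no throttle."""
    value = env.get("_ASPECT_THROTTLE", "").strip()
    return int(value) if value else None

def throttled_submit(instances: int,
                     submit: Callable[[int], None],
                     queued: Callable[[], int],
                     limit: Optional[int],
                     poll_seconds: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Submit `instances` jobs, never letting more than `limit` sit queued."""
    if limit is not None and limit < 1:
        raise ValueError("throttle limit must be at least 1")
    for index in range(instances):
        if limit is not None:
            # Block until the batch system has room under the per-user limit.
            while queued() >= limit:
                sleep(poll_seconds)
        submit(index)
```

For cmkin.exe in 20000 instances, a submit agent might call throttled_submit(20000, submit_one, count_my_queued_jobs, read_throttle()). Here submit_one and count_my_queued_jobs are hypothetical, site-specific helpers.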

2.4 Transferring data

The shell also overloads the “<” and “>” I/O operators so that remote files can be accessed from the shell. To enable this feature, the user turns on “io” mode by typing:

%> setaspect -io on

To turn this feature off, the user equivalently types:

%> setaspect -io off

Remote files are specified with a GSIFTP URL when redirecting the standard input or output of a command. For example, a user can redirect the standard output of a command to a file in a directory on a remote host:

%> echo "Hello World" > gsiftp://compute-9-2/tmp/test.txt

Similarly, a user can specify that the content of a file located on a remote host be redirected to the standard input of a command,

%> cat < gsiftp://compute-9-2/tmp/test.txt
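Section 1.2 states that the data agent is invoked when the redirection target is a URL or a task identifier. Any other target remains an ordinary TCSH file redirection. The following Python sketch of that decision (with “io” mode on) is illustrative only. The name classify_redirect and its return values are hypothetical.

```python
import re
from urllib.parse import urlparse

# task_<rank>, task_all or task_any, as used by parallel AspectShell scripts.
TASK_TARGET = re.compile(r"^task_(\d+|all|any)$")

def classify_redirect(target: str) -> tuple:
    """Decide whether a redirection target goes to the data agent or stays local."""
    parts = urlparse(target)
    if parts.scheme == "gsiftp":
        # Remote file: the data agent opens a channel to host:path.
        return ("gsiftp", parts.hostname, parts.path)
    match = TASK_TARGET.match(target)
    if match:
        # Message to or from another task in the same parallel execution.
        return ("task", match.group(1))
    # Anything else is an ordinary local file, handled by TCSH as usual.
    return ("local", target)
```

For the example above, classify_redirect("gsiftp://compute-9-2/tmp/test.txt") yields the host compute-9-2 and the path /tmp/test.txt.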

2.5 Writing a parallel script

A command may also be executed in parallel by the shell when the user specifies the placement construct “on <number> procs” anywhere within the command line. The user may combine this with the “with” keyword to specify the resource requirements used to select hosts for parallel execution. For example, to execute a command in parallel on three compute nodes, each with at least 25M of available memory:

%> /bin/hostname with "mem > 25" on 3 procs
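All four placement constructs have now appeared: “on <host name>”, “with”, “in <number> instances” and “on <number> procs”. The following Python sketch separates them from the command itself. AspectShell does this inside the TCSH command parser in C. This model is an illustration, not that code. It treats every “on”, “in” or “with” followed by a suitable argument as a placement clause, so a command that legitimately takes one of these words as an argument would be misread. The names Placement and parse_placement are hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Placement:
    command: List[str]
    host: Optional[str] = None       # on <host name>
    procs: Optional[int] = None      # on <number> procs
    instances: Optional[int] = None  # in <number> instances
    resource: Optional[str] = None   # with "<requirement>"

def parse_placement(line: str) -> Placement:
    """Split a command line into the command words and placement clauses."""
    words = shlex.split(line)
    result = Placement(command=[])
    i = 0
    while i < len(words):
        word = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        after = words[i + 2] if i + 2 < len(words) else None
        if word == "on" and nxt is not None:
            if nxt.isdigit() and after == "procs":
                result.procs = int(nxt)    # parallel execution
                i += 3
            else:
                result.host = nxt          # a named remote host
                i += 2
        elif word == "in" and nxt is not None and nxt.isdigit() and after == "instances":
            result.instances = int(nxt)    # job group submission
            i += 3
        elif word == "with" and nxt is not None:
            result.resource = nxt          # resource requirement string
            i += 2
        else:
            result.command.append(word)    # part of the command itself
            i += 1
    return result
```

Because a clause may appear anywhere in the line, the sketch scans every word rather than only the end of the line.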

Commands executed in parallel by the shell also have the environment variables _ASPECT_TASK_NUM and _ASPECT_TASKID set in their environment. The former gives the total number of tasks in the parallel execution, and the latter gives the task's unique rank. This is useful for Single Program Multiple Data (SPMD) parallel execution, or for a master-slave execution pattern, where a task's rank determines the data it works on or the role it plays.
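As an SPMD illustration, the following Python sketch lets each task pick its own contiguous block of work items from these two variables. Ranks are assumed to start at 0, as the task_0 example below implies. Defaulting _ASPECT_TASKID to 0 when it is unset is an assumption. The name my_block is hypothetical.

```python
import os
from typing import Mapping

def my_block(total_items: int, env: Mapping[str, str] = os.environ) -> range:
    """Contiguous slice of work items owned by this task in an SPMD run."""
    ntasks = int(env.get("_ASPECT_TASK_NUM", "1"))
    rank = int(env.get("_ASPECT_TASKID", "0"))
    base, extra = divmod(total_items, ntasks)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)
```

With 10 items on 3 procs, ranks 0, 1 and 2 receive items 0-3, 4-6 and 7-9 respectively, so every item is processed exactly once.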

Commands executed in parallel by AspectShell may themselves be shell scripts. In that case, the scripts can communicate with each other through the overloaded I/O redirection operators “>” and “<”. A script identifies the task it wishes to communicate with by the keyword task_<task number>.

For example, tasks with a rank greater than 0 may notify task 0 that their computation has finished, for synchronization:

if ( $_ASPECT_TASKID > 0 ) then
    echo "I am finished" > task_0
endif

Similarly, task 0 may perform a blocking receive from each of the other tasks in the parallel execution of the script before reporting its completion:

@ n = 1
while ( $n < $_ASPECT_TASK_NUM )
    # Blocks until task $n has sent its message
    set ack = `cat < task_$n`
    @ n = $n + 1
end
echo "Computation complete!"

We also introduce syntax to allow a task to broadcast data to all participating tasks in a parallel execution of a command. A task may do this by specifying the task_all keyword when redirecting output,

echo "$initial_data" > task_all

A task may also wait for input from any other task in the parallel execution of a command by specifying the task_any keyword when redirecting input:

set response = `cat < task_any`
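The guide defines what task_<n>, task_all and task_any mean to a script, but not how the aspect-comm-agent implements them. To pin down those semantics, the following Python sketch models them as in-memory mailboxes. It assumes that task_all delivers to every task except the sender and that task_any returns the earliest-arriving message. Both are assumptions. A real receive blocks; the model returns None instead. The class TaskMailboxes is hypothetical.

```python
from collections import deque

class TaskMailboxes:
    """In-memory model of task_<n>, task_all and task_any redirections."""

    def __init__(self, ntasks: int):
        self.ntasks = ntasks
        # One arrival-ordered inbox per task, holding (sender, message) pairs.
        self.inbox = [deque() for _ in range(ntasks)]

    def send(self, src: int, dst, message: str) -> None:
        """Model of `echo msg > task_<dst>`, or `> task_all` when dst == "all"."""
        if dst == "all":
            targets = [t for t in range(self.ntasks) if t != src]
        else:
            targets = [dst]
        for t in targets:
            self.inbox[t].append((src, message))

    def recv(self, dst: int, src):
        """Model of `cat < task_<src>`, or `< task_any` when src == "any".

        Returns (sender, message), or None where a real shell would block.
        """
        queue = self.inbox[dst]
        for i, (sender, message) in enumerate(queue):
            if src == "any" or sender == src:
                del queue[i]
                return sender, message
        return None
```

In this model, the synchronization loop above corresponds to task 0 calling recv(0, n) for n = 1 to _ASPECT_TASK_NUM - 1 in turn.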

3 Administration

3.1 Installing AspectShell

[ewalker@lela ~]$ zcat aspect-tcsh-version.tar.gz | tar xvf -
[ewalker@lela ~]$ cd aspect-tcsh-version
[ewalker@lela ~/aspect-tcsh-version]$ ./configure
[ewalker@lela ~/aspect-tcsh-version]$ make; make install

By default, all aspect-tcsh binaries are installed in $HOME/aspectshell/bin, and a symlink from $HOME/.aspectshell to this installation directory is created. To install the binaries elsewhere, pass a different path to configure with --prefix. In that case, you must create the symlink from $HOME/.aspectshell to the new installation's bin directory manually.

For example, to install in /opt/aspectshell:

[ewalker@lela ~/aspect-tcsh-version]$ ./configure --prefix=/opt/aspectshell
[ewalker@lela ~/aspect-tcsh-version]$ make; make install
[ewalker@lela ~/aspect-tcsh-version]$ ln -s /opt/aspectshell/bin $HOME/.aspectshell

3.2 Configuring your environment

Optional configuration environment variables:

ASPECTSHELL_LOCATION - Identifies an alternative AspectShell installation location. Default == $HOME/.aspectshell.

_ASPECT_SUBMIT_AGENT - Identifies the agent executable that will execute a command invoked with an AspectShell placement construct. The agent is responsible for mapping the command execution to the appropriate underlying execution infrastructure. The environment variable can specify the agent executable command line with the meta-arguments %E, %U, and %R. These meta-arguments will expand to the executable command string, invoking username, and resource requirement string respectively. If no resource is specified by the user, the %R value will be expanded to the string "default". Default == $HOME/.aspectshell/aspect-submit-agent %E

_ASPECT_DATA_AGENT - Identifies the agent executable that will perform the remote data transfer when the overloaded shell “>” and “<” redirection operators are invoked. The location of the remote data source will be passed to the data agent which is responsible for opening a channel to the location with the appropriate read or write mode, and passing the channel file descriptor back to the shell internals for redirection. Default == $HOME/.aspectshell/aspect-data-agent

_ASPECT_THROTTLE - Limits the number of jobs that are submitted to the local batch queuing system. This is used for job ensemble submissions. Default == no throttle.

_ASPECT_DEBUG - Turns on verbose debugging.

_ASPECT_INSTANCE_NUM - Automatically defined by the shell when the “in <num> instances” syntax extension is used in a job ensemble submission. This tells the submit agent how many instances of the command need to be executed; the agent is responsible for performing the ensemble submission. Default == 1.

_ASPECT_TASK_NUM - Automatically defined by the shell when the “on <num> procs” syntax extension is used for a parallel job execution. This tells the submit agent how many parallel tasks need to be spawned. Default == 1.

_ASPECT_RESOURCE - Automatically defined by the shell when the “with” syntax extension is used. This allows the submit agent to use a resource broker, if one is available, to identify compute nodes with the required capabilities. Default == “default”.

_ASPECT_BATCH_MODE - Turns on batch job submission semantics. This causes the entire command line (including the aspect extension) to be passed to the submit agent for addition to a batch script file. This mode is also defined automatically when the user specifies setaspect -batch on at the command line. Default == none.

_ASPECT_NETWORK_DEVICE - Identifies the network interface on which the aspect-comm-agent binds and listens. The aspect-comm-agent implements the shell message-passing feature. Default == hostname.
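The following Python sketch illustrates the %E, %U and %R meta-arguments of _ASPECT_SUBMIT_AGENT. The guide does not say how the template is split into arguments. The sketch assumes each word of the template becomes one argument, so an expanded %E stays a single argument even when the command contains spaces. It also expands variables such as $HOME in the template, but not in the user's command. The name expand_agent_command is hypothetical.

```python
import os
import re
import shlex
from typing import List, Optional

META = re.compile(r"%([EUR])")

def expand_agent_command(template: str, executable: str, user: str,
                         resource: Optional[str] = None) -> List[str]:
    """Expand %E, %U and %R in an _ASPECT_SUBMIT_AGENT template into an argv list."""
    # With no resource requirement, %R expands to "default", as described above.
    values = {"E": executable, "U": user, "R": resource or "default"}
    argv = []
    for word in shlex.split(template):
        # Expand $HOME etc. in the template itself, before the command is inserted.
        word = os.path.expandvars(word)
        argv.append(META.sub(lambda m: values[m.group(1)], word))
    return argv
```

The resulting list could, for example, be passed to subprocess.run to launch the agent.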

3.3 Run-time Modes

Run-time modes can be turned on or off with the setaspect builtin command:

   ewalker:~/mycluster-v2/aspect-tcsh> builtins  | grep setaspect
   repeat     sched      set        setaspect  setenv     settc      setty
   ewalker:~/mycluster-v2/aspect-tcsh> setaspect -v
   aspect mode: verbose off
   aspect mode: debug is off
   aspect mode: I/O is off
   aspect mode: batch is off
   ewalker:~/mycluster-v2/aspect-tcsh> setaspect -io on

Definition of modes: