TACC Projects

The Texas Advanced Computing Center maintains a collection of program libraries and software packages to support high-performance computing activities across diverse disciplines. Software products for the supercomputing environment are selected on the basis of quality, history of performance, system compatibility, and benefit to the scientific community. If you need a particular software package for your work, let us know via the Consulting Form.

To see a list of the libraries and software available on each of TACC's systems, click here.


GotoBLAS

During the last decade, a number of projects have pursued the high-performance implementation of matrix multiplication. Typically, these projects organize the computation around an "inner-kernel", C = trans(A) B + C, that keeps one of the operands in the L1 cache, while streaming parts of the other operands through that cache. Variants include approaches that extend this principle to multiple levels of cache or that apply the same principle to the L2 cache while essentially ignoring the L1 cache. The purpose of this tuning is to optimally amortize the cost of moving data between memory layers.
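The conventional organization described above can be sketched in C as follows. This is only an illustration of keeping one block of A cache-resident while the other operands stream past it; it is not the GotoBLAS kernel. The function name gemm_tn_blocked and the block sizes MB and KB are hypothetical, and column-major storage is assumed, as in the reference BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes; a real library tunes these per architecture. */
enum { MB = 32, KB = 32 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * C = trans(A) * B + C, all matrices column-major (an assumption):
 *   A is k x m, B is k x n, C is m x n.
 */
void gemm_tn_blocked(size_t m, size_t n, size_t k,
                     const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t p0 = 0; p0 < k; p0 += KB) {
        size_t p1 = min_sz(p0 + KB, k);
        for (size_t i0 = 0; i0 < m; i0 += MB) {
            size_t i1 = min_sz(i0 + MB, m);
            /*
             * Inner kernel: the block A(p0:p1, i0:i1) is re-read for every
             * column j, so it can stay cache-resident while the matching
             * panels of B and C stream past it.
             */
            for (size_t j = 0; j < n; j++) {
                for (size_t i = i0; i < i1; i++) {
                    double sum = 0.0;
                    for (size_t p = p0; p < p1; p++)
                        sum += A[p + i * k] * B[p + j * k];
                    C[i + j * m] += sum;
                }
            }
        }
    }
}
```

Because A is used transposed, each inner product reads a column of A and a column of B, both contiguous in column-major storage. The block sizes determine how much of A must fit in the targeted cache level.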

Our approach is fundamentally different. It starts by observing that, for current-generation architectures, much of the overhead comes from Translation Look-aside Buffer (TLB) misses. While the importance of caches is also taken into consideration, it is the minimization of such TLB misses that drives the approach. The result is a novel approach that achieves highly competitive performance on a broad spectrum of current high-performance architectures. In addition, we support a large number of BLAS routines as part of the library.

As of April 2006, GotoBLAS is being distributed to the academic community as source code. Commercial parties interested in licensing GotoBLAS may contact Jitendra Jain at the Office of Technology Commercialization, UT Austin, software@otc.utexas.edu.

Intercol - An Optimal MPI Collective Communications Library

This collaborative project combines techniques for the high-performance implementation of collective communication operations, which have been researched since the late 1980s, with careful exploitation of the different MPI communication modes to develop an optimal MPI collective communication library. Initial performance compares very favorably with the open-source MPICH implementations.

libflame

The objective of the FLAME project is to transform the development of dense linear algebra libraries from an art reserved for experts to a science that can be understood by novice and expert alike. Rather than being only a library, the project encompasses a new notation for expressing algorithms, a methodology for systematic derivation of algorithms, Application Program Interfaces (APIs) for representing the algorithms in code, and tools for mechanical derivation, implementation and analysis of algorithms and implementations.

The libflame libraries export a set of interfaces that provide users with an object-oriented environment in which to easily manipulate linear algebra structures and perform basic operations, such as symmetric rank-k update and triangular solve with multiple right-hand sides, as well as higher-level operations such as the Cholesky, LU, and QR factorizations. The source code for building these libraries is now available for testing and review by our peers and colleagues in the scientific computing community. We strongly encourage those who have an interest in developing linear algebra algorithms to download our source code and try interfacing their existing programs using the FLAME APIs.

MyCluster

The MyCluster system allows users to create private Condor pools to manage large serial job ensembles on the HPC clusters at TACC. For the advanced user, the system provides the entire Condor interface, enabling all the tools available in the Condor job management system to be used if desired. For the novice user, the system provides convenient wrappers that allow many of these Condor management tasks to be mimicked by equivalent LSF-like syntax. Users of the system do not need to repackage their serial jobs into parallel jobs to use the HPC cluster efficiently, because the system internally submits and manages optimally sized parallel job proxies through the local scheduler on the local HPC cluster to acquire CPU resources on the user's behalf.

  • More information about the project can be found here.

XUFS

The eXtended User-space File System (XUFS) is an entirely user-space distributed filesystem that allows a user's HOME directory on a personal Linux/Cygwin workstation to “follow” them when remotely connecting to systems via SSH.

XUFS is particularly suitable for use on high bandwidth wide-area networks because it makes use of aggressive whole-file caching on the remote system, multiple TCP connections for striped data transfers of large files, concurrent pre-caching of small files and an optimistic cache-coherency protocol for synchronizing multiple cached copies of files. Micro and macro benchmarks have shown comparable, and in some cases superior, performance when compared with wide-area network filesystems like IBM GPFS-WAN, OpenAFS and NFSv4.



Tools


Big/Little Endian Fortran File Conversion

When moving from one platform to another, researchers may need to convert existing unformatted Fortran files between big- and little-endian formats. This conversion is necessary when binary files written on a platform of one endianness are used on a platform of the opposite endianness (for example, when moving from IBM AIX to Intel IA32). To aid in this conversion, TACC has developed a C code (fendian_conv.c) that converts unformatted Fortran files from little- to big-endian format. The utility supports integers, reals, or a mixture of the two; however, all of the data within the file must be represented in either 4-byte words or 8-byte words (no mixture of word sizes). Note that fendian_conv.c is symmetric in the conversion process: whatever the endianness of the machine that created the file, one can always copy the file to a machine of the opposite endianness and convert it there.
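The core of such a conversion is a byte reversal within each fixed-width word. The following is a minimal sketch of that step only, not the contents of fendian_conv.c; the function name swap_words is hypothetical, and the handling of file I/O and of any compiler-specific record markers in unformatted files is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reverse the byte order of each word in buf, in place.
 * nwords is the number of words; width is the word size in bytes
 * (4 or 8, matching the restriction described above).
 */
void swap_words(unsigned char *buf, size_t nwords, size_t width)
{
    assert(buf != NULL);
    assert(width == 4 || width == 8);

    for (size_t w = 0; w < nwords; w++) {
        unsigned char *p = buf + w * width;
        /* Swap bytes pairwise from the two ends of the word toward the middle. */
        for (size_t lo = 0, hi = width - 1; lo < hi; lo++, hi--) {
            unsigned char tmp = p[lo];
            p[lo] = p[hi];
            p[hi] = tmp;
        }
    }
}
```

Applying swap_words twice restores the original bytes, which is why the same operation serves both conversion directions. It also shows why a file cannot mix word sizes: the swap must know the width of every word it touches.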

More information

Portable Timers

To quantify application behavior, researchers often need to measure the amount of CPU and system time spent in various routines, or to summarize an entire application's execution time. To facilitate these timings, TACC has developed a set of timing wrappers (timers.c) that are callable from both Fortran and C. These wrapper functions provide a convenient and portable method for measuring wall, user, and system times for all or part of an application.
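One common way to obtain wall, user, and system times on POSIX systems is to combine gettimeofday with getrusage. The sketch below assumes that approach; it is not the interface of timers.c, and the names timing_t, timing_now, and timing_diff are hypothetical. Fortran-callable entry points, which usually depend on compiler-specific naming conventions, are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical record holding one sample of the three clocks, in seconds. */
typedef struct {
    double wall;  /* elapsed wall-clock time */
    double user;  /* CPU time spent in user mode */
    double sys;   /* CPU time spent in the kernel on the process's behalf */
} timing_t;

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Sample all three clocks for the calling process. */
timing_t timing_now(void)
{
    struct timeval now;
    struct rusage ru;
    timing_t t;
    int rc;

    rc = gettimeofday(&now, NULL);
    assert(rc == 0);
    rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is defined */

    t.wall = tv_seconds(now);
    t.user = tv_seconds(ru.ru_utime);
    t.sys  = tv_seconds(ru.ru_stime);
    return t;
}

/* Time elapsed between two samples. */
timing_t timing_diff(timing_t start, timing_t end)
{
    timing_t d;
    d.wall = end.wall - start.wall;
    d.user = end.user - start.user;
    d.sys  = end.sys  - start.sys;
    return d;
}
```

To time a region, call timing_now before and after it and pass both samples to timing_diff. A wall time much larger than the sum of user and system time usually means the process was waiting, for example on I/O.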

More information