Publications

Portability of Fortran’s do concurrent on GPUs

Published in SC24, 2024

There is a continuing interest in using standard language constructs for accelerated computing in order to avoid (sometimes vendor-specific) external APIs. For Fortran codes, the do concurrent (DC) loop has been successfully demonstrated on the NVIDIA platform. However, support for DC on other platforms has taken longer to implement. Recently, Intel has added DC GPU offload support to its compiler, as has HPE for AMD GPUs. In this paper, we explore the current portability of using DC across GPU vendors using the in-production solar surface flux evolution code, HipFT. We discuss implementation and compilation details, including when/where using directive APIs for data movement is needed/desired compared to using a unified memory system. The performance achieved on both data center and consumer platforms is shown.

Recommended citation: Caplan, Ronald M., Miko M. Stulajter, Jon A. Linker, Jeff Larkin, Henry A. Gabb, Shiquan Su, Ivan Rodriguez, Zachary Tschirhart, and Nicholas Malaya. "Portability of Fortran’s ‘do concurrent’ on GPUs." In SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1904-1913. IEEE, 2024. 10.1109/SCW63240.2024.00240

Considerations in the Deployment of Machine Learning Algorithms on Spaceflight Hardware

Published in AeroConf 2021 - Emerging Technologies for Space Applications, 2021

Recent advances in artificial intelligence (AI) and machine learning (ML) have revolutionized many fields. ML has many potential applications in the space domain. Next generation space instruments are producing data at rates that exceed the capabilities of current spacecraft to store or transmit to ground stations. Deployment of ML algorithms onboard future spacecraft could perform processing of sensor data as it is gathered, reducing data volume and providing a dramatic increase in throughput of meaningful data. ML techniques may also be used to enhance the autonomy of space missions. However, ML techniques have not yet been widely deployed in space environments, primarily due to limitations on the computational capabilities of spaceflight hardware. The need to verify that high-performance computational hardware can reliably operate in this environment delays the adoption of these technologies. Nevertheless, the availability of advanced processing capabilities onboard spacecraft is increasing. These platforms may not provide the processing power of terrestrial equivalents, but they do provide the resources necessary for deploying real-time execution of ML algorithms. In this paper, we present results exploring the implementation of ML techniques on computationally-constrained, high-reliability spacecraft hardware. We show two ML algorithms utilizing deep learning techniques which illustrate the utility of these approaches for space applications. We describe implementation considerations when tailoring these algorithms for execution on computationally-constrained hardware and present a workflow for performing these optimizations. We also present initial results on characterizing the trade space between algorithm accuracy, throughput, and reliability on a variety of hardware platforms with current and anticipated paths to spaceflight.

Recommended citation: R. McBee, J. L. Anderson, M. A. Koets, J. Ramirez and Z. Tschirhart, "Considerations in the Deployment of Machine Learning Algorithms on Spaceflight Hardware," 2021 IEEE Aerospace Conference (50100), 2021, pp. 1-10, doi: 10.1109/AERO50100.2021.9438171. https://doi.org/10.1109/AERO50100.2021.9438171

Evaluation of Clustering Techniques for GPS Phenotyping Using Mobile Sensor Data

Published in PEARC 2020 - Trending now – machine learning and artificial intelligence, 2020

With the ubiquitousness of mobile smart phones, health researchers are increasingly interested in leveraging these commonplace devices as data collection instruments for near real-time data to aid in remote monitoring, and to support analysis and detection of patterns associated with a variety of health-related outcomes. As such, this work focuses on the analysis of GPS data collected through an open-source mobile platform over two months in support of a larger study being undertaken to develop a digital phenotype for pregnancy using smart phone data. An exploration of a variety of off-the-shelf clustering methods was completed to assess accuracy and runtime performance for a modest time-series of 292K non-uniform samples on the Stampede2 system at TACC. Motivated by phenotyping needs to not only assess the physical coordinates of GPS clusters, but also the accumulated time spent at high-interest locations, two additional approaches were implemented to facilitate cluster time accumulation using a pre-processing step that was also crucial in improving clustering accuracy and scalability. Received Best Student Paper Award.

Recommended citation: Tschirhart, Zachary S., and Karl W. Schulz. "Evaluation of Clustering Techniques for GPS Phenotyping Using Mobile Sensor Data." Practice and Experience in Advanced Research Computing. 2020. 364-371. https://doi.org/10.1145/3311790.3396665

Exploring Parallel Programming Models for Heterogeneous Computing Systems

Published in 2015 IEEE International Symposium on Workload Characterization, 2015

Parallel systems that employ CPUs and GPUs as two heterogeneous computational units have become immensely popular due to their ability to maximize performance under restrictive thermal budgets. However, programming heterogeneous systems via traditional programming models like OpenCL or CUDA involves rewriting large portions of application-code. They also lead to code that is not performance portable across different architectures or even across different generations of the same architecture. In this paper, we evaluate the current state of two emerging parallel programming models: C++ AMP and OpenACC. These emerging programming paradigms require minimal code changes and rely on compilers to interact with the low-level hardware language, thereby producing performance portable code from an application standpoint. We analyze the performance and productivity of the emerging programming models and compare them with OpenCL using a diverse set of applications on two different architectures, a CPU coupled with a discrete GPU and an Accelerated Programming Unit (APU). Our experiments demonstrate that while the emerging programming models improve programmer productivity, they do not yet expose enough flexibility to extract maximum performance as compared to traditional programming models.

Recommended citation: Daga, Mayank, Zachary S. Tschirhart, and Chip Freitag. "Exploring parallel programming models for heterogeneous computing systems." 2015 IEEE International Symposium on Workload Characterization. IEEE, 2015. https://ieeexplore.ieee.org/abstract/document/7314151

PROPHECY (PRecise OPerations for High Efficiency Communication sYstems) Proposal

Published in The University of Texas at Austin, 2014

This document contains a proposal of a near-Earth CubeSat mission to serve as a demonstration of high efficiency Laser Optical Communication using high precision Attitude Determination and Control (ADC) systems in order to meet the specific pointing accuracies required to operate in a free-space environment. The PRecision OPerations for High Efficiency Communication sYstems (PROPHECY) mission was created to demonstrate that high ADC accuracy and precision in swiftly developed, low-cost satellites can adequately meet the needs of future communications networks without significant detriment to mission budgets, be that time or monetary constraints. State-of-the-art Free-Space Optical Laser Communication (LaserCom) technologies will be used to analyze and quantify the specifications of the ADC system. LaserCom requires pointing accuracies on the order of tens of arcseconds, equivalent to a few thousandths of a degree. Due to the multiple pitfalls of current Radio Frequency (RF) technologies (such as limited bandwidth, interference, large power losses, and bottlenecking of data) there is a need for a more efficient and powerful method of communication. LaserCom fulfills those needs, and then some. With no bandwidth limitations, virtually no interference in free-space applications, and 10-100 times the data efficiency of current RF technologies, LaserCom has proven to be the ideal candidate. NASA intends to employ the Tracking and Data Relay Satellites (TDRS) with LaserCom upgrades for near-Earth and atmospheric communications. Based on the NASA statement “The next generation in communications satellites will supply both [Radio Frequency (RF)] and optical services.” [Laser Comm Relay], the PROPHECY mission has been suggested in order to prove that CubeSats can become a viable alternative to larger satellites and provide the same level of communication support in near-Earth missions.
Using the discoveries of the Laser Communication Relay Demonstration (LCRD), a mission designed to demonstrate the effectiveness of laser communication from the moon, PROPHECY aims to demonstrate and prove that CubeSats are capable of providing a standard LEO/GEO platform for future near-Earth laser communication needs with quick development at low cost. Lastly, as satellites become more prevalent in the use of everyday technology, the nature of the CubeSat as a rapidly-developed and easily replaceable satellite becomes more enticing. This mission stands to prove that highly efficient communication and data relay satellites can be produced and deployed on a short timeline, while maintaining the efficiency and efficacy of the previous, larger satellites. In summary, PROPHECY aims to prove the ability of CubeSats to act as reliable short-term satellites for Free-Space, high-atmosphere, and ground communications at a fraction of the production time and cost of current state-of-the-art communications satellites.

Download here

Tower Shines for Supercomputing Win

Published in SC13, 2013

The university will celebrate the student team who won the 8th annual Student Cluster Competition at this year’s Supercomputing (SC13) conference. The Tower will be lit burnt orange tonight, Dec. 2. The UT students competed against seven other teams from the U.S., Germany, China and Australia during the real-time, non-stop, 48-hour challenge. The competition is designed to introduce the next generation of students to the high-performance computing community.

Download here

Iterative Targeting Algorithm in the Monte Carlo Framework

Published in UT/JSC Trick Modeling Initiative 2010, 2010

The present study focuses on the implementation of an iterative targeting algorithm within a Trick simulation environment. A team of undergraduate research students at The University of Texas at Austin accomplished the investigation and software implementation. The simulation developed seeks to identify the impulsive maneuver required by a spacecraft to transfer between two arbitrary points in Earth orbit. The initial state and the terminal position vector are inputs to the Trick simulation. Then, Trick’s built-in Monte Carlo master/slave framework is leveraged to converge on the impulsive maneuver required for the spacecraft to initiate the transfer. The spacecraft trajectory modeling is accomplished through Trick with simple, two-body, point mass equations of motion. The targeting algorithm relies on the availability of the state transition matrix (STM). To facilitate a generalized force model and future JSC Engineering Orbital Dynamics (JEOD) implementation, the STM is constructed with a finite differencing algorithm. Thus, the Trick simulation developed in this report is divided into three routines: the physical simulation, the finite differencing process, and the targeting process. The physical simulation consists of a set of differential equations integrated with Trick. The finite differencing process calculates the STM. Finally, the targeting process employs the STM to identify the necessary changes in the initial state to achieve the desired end state. This work builds on the results of a previous study of optimization in Trick and the Cannonball tutorial provided with the Trick Documentation. The body of this final report presents the objectives of this project, the mathematical grounding of the algorithm, and discussion of the interaction between the various Trick source and model files. Finally, the results are illustrated, along with some of the major obstacles encountered in an attempt to integrate the targeter with JEOD. 
Suggestions are made for future study of the interaction between the Monte Carlo framework and JEOD, and the implementation of more advanced targeters in Trick.

Download here

Trick Optimization

Published in UT/JSC/NASA, 2010

This project was designed to solve the issue of implementing an iterative optimization algorithm in Trick. The goal was to first define a concrete, achievable objective around which a plan of action could be developed, and then implement an optimization method in Trick. Once an optimization method had been determined, the next step was to implement it in a simple proof-of-concept Trick simulation so that we would eventually have the tools necessary to integrate the targeting algorithm in a Trick simulation using JEOD.

Download here

Zachary S. Tschirhart, P.E.
