Greg Pauloski

Computer Scientist // Software Engineer

email | github | linkedin | scholar

Hello there! I am a fourth-year Ph.D. student in Computer Science at the University of Chicago, interested in high-performance computing and deep learning frameworks. I am a member of Globus Labs, where I am co-advised by Ian Foster and Kyle Chard. I completed my Bachelor's degree in Computer Science at the University of Texas at Austin and previously worked at Apple, Google, and the Texas Advanced Computing Center.

RESEARCH

- Scalable Deep Learning: We are exploring new techniques for improving deep learning training time and scalability by (1) exploiting scalable algorithms for approximating second-order information; (2) developing methods that adapt to different hardware by tuning computation and communication to maximize training speed; and (3) exploring compression techniques to reduce communication overheads. (A toy preconditioning sketch follows this list.)
- Workflow Systems: Modern computational science experiments are increasingly written as a coupled set of many distinct software components coordinated by a central workflow system. We are designing new programming models that decouple communication from application design, so that the data movement method can be chosen based on where, what, and when data are moved. (A minimal proxy sketch follows this list.)
- Scientific Language Models: We are training large (billion+ parameter) transformer-based language models on broad scientific literature to automate knowledge extraction. We are evaluating training methods for these models to quantify the impact of training corpus size, model size, and pretraining time on downstream performance, and we are investigating better methods for assessing the quality of the trained models. (A pretraining sketch follows this list.)
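The sketch below illustrates the kind of second-order preconditioning behind our K-FAC work, applied to a single PyTorch linear layer. It is a toy example for intuition only; the damping value, loss, and layer sizes are placeholders, and the distributed K-FAC implementation linked under Projects handles factor distribution, inverse scheduling, and communication.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(32, 16, bias=False)
damping = 1e-2  # Tikhonov damping keeps the factor inverses well conditioned

x = torch.randn(64, 32)   # activations entering the layer
y = layer(x)
y.retain_grad()           # keep dL/dy to build the output-side factor
loss = y.pow(2).mean()    # stand-in loss for illustration
loss.backward()

# Kronecker factors approximating the Fisher information for this layer.
A = x.t() @ x / x.shape[0]                  # input-activation covariance
G = y.grad.t() @ y.grad / y.grad.shape[0]   # output-gradient covariance

A_inv = torch.linalg.inv(A + damping * torch.eye(A.shape[0]))
G_inv = torch.linalg.inv(G + damping * torch.eye(G.shape[0]))

# Precondition the weight gradient: F^{-1} grad ~= G^{-1} @ grad @ A^{-1}.
with torch.no_grad():
    preconditioned = G_inv @ layer.weight.grad @ A_inv
    layer.weight -= 0.1 * preconditioned    # plain SGD step on the result
```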
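For the workflow-systems work, the core idea behind ProxyStore is pass-by-reference: tasks exchange lightweight proxies while the actual data move through a separate store and are resolved lazily on first use. The sketch below shows that pattern in plain Python; it is not the ProxyStore API, and SHARED_STORE, put(), and Proxy are hypothetical names.

```python
# Hypothetical stand-ins for a shared object store (e.g., Redis or a
# parallel file system) and its client; not the ProxyStore API.
SHARED_STORE = {}


class Proxy:
    """Carries only a key; resolves the real object lazily on first use."""

    def __init__(self, key):
        self._key = key
        self._target = None

    def _resolve(self):
        if self._target is None:          # fetch only when actually needed
            self._target = SHARED_STORE[self._key]
        return self._target

    def __getattr__(self, name):
        # Called only for attributes not found on the proxy itself, so any
        # method call on the proxy transparently touches the target object.
        return getattr(self._resolve(), name)


def put(key, obj):
    """Store an object and hand back a cheap-to-communicate proxy."""
    SHARED_STORE[key] = obj
    return Proxy(key)


# A workflow task can receive the proxy instead of the (possibly large) data.
numbers = put('dataset-001', [1.0, 2.0, 3.0])
print(numbers.count(2.0))   # resolves the list, then calls list.count -> 1
```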
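The scientific language model studies compare pretraining configurations; the sketch below shows the shape of a single masked-language-model pretraining run using Hugging Face Transformers. The corpus file, base model, and hyperparameters are placeholders, and the actual experiments run at far larger scale on HPC systems.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Placeholder corpus: one scientific abstract per line in a plain-text file.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')

corpus = load_dataset('text', data_files={'train': 'abstracts.txt'})['train']
corpus = corpus.map(
    lambda batch: tokenizer(batch['text'], truncation=True, max_length=512),
    batched=True,
    remove_columns=['text'],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir='mlm-checkpoints',
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=corpus,
    # Randomly masks 15% of tokens per batch for the MLM objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```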

PROJECTS

Check out all of my projects on GitHub.

- ProxyStore: Pass-by-reference semantics for distributed Python applications [Code]
- K-FAC: Distributed PyTorch K-FAC gradient preconditioner [Code]
- LLM Training: Tools and scripts for large language model training [Code]
- Colmena: A framework for steering large campaigns of simulations on HPC [Code]
- 3pseatBot: A hobby Discord bot [Code]

SELECTED PUBLICATIONS

Ordered by most recent.

- Accelerating Communications in Federated Applications with Transparent Object Proxies [Nov 2023]
J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster
SC 2023
TLDR | PDF | Website | Code | Poster | Slides | Publication | BibTex
- Deep Neural Network Training With Distributed K-FAC [Mar 2022]
J. Gregory Pauloski, Lei Huang, Weijia Xu, Kyle Chard, Ian Foster, Zhao Zhang
TPDS 2022
TLDR | PDF | Code | Publication | BibTex
- KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks [Nov 2021]
J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang
SC 2021
TLDR | PDF | Code | Slides | Publication | BibTex
- Convolutional Neural Network Training with Distributed K-FAC [Nov 2020]
J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian Foster
SC 2020
TLDR | PDF | Code | Slides | Publication | BibTex

PUBLICATIONS

Ordered by most recent. BibTeX file available for download here.

- DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies [Nov 2023]
Collaboration between Microsoft, Rutgers University, University of Sydney, Columbia University, Harvard University, Argonne National Laboratory, University of Chicago, Oak Ridge National Laboratory, Brookhaven National Laboratory, Princeton University, AMD, and NVIDIA
arXiv Preprint
TLDR | PDF | Website | Preprint | BibTex
- The Diminishing Returns of Masked Language Models to Science [May 2023]
Zhi Hong, Aswathy Ajith, J. Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster
Findings of the Association for Computational Linguistics: ACL 2023
TLDR | PDF | Website | Preprint | BibTex
- Cloud Services Enable Efficient AI-Guided Simulation Workflows across Heterogeneous Resources [Mar 2023]
Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Ryan Chard, Yadu Babuji, Ganesh Sivaraman, Sutanay Choudhury, Kyle Chard, Rajeev Thakur, Ian Foster
HCW @ IPDPS 2023
TLDR | PDF | Code | Preprint | BibTex
- GenSLMs: Genome-scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics [Oct 2022]
Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M Mann, Michael Irvin, J Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J Davis, Michael E Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan
IJHPCA — ACM Gordon Bell Special Prize for COVID-19 Research
TLDR | PDF | Publication | BibTex
- Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing [Nov 2021]
Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster
MLHPC @ SC 2021
TLDR | PDF | Website | Code | Publication | BibTex
- Models and Processes to Extract Drug-like Molecules From Natural Language Text [Aug 2021]
Zhi Hong, J. Gregory Pauloski, Logan Ward, Kyle Chard, Ben Blaiszik, Ian Foster
Frontiers in Molecular Biosciences
TLDR | PDF | Publication | BibTex
- Efficient I/O for Neural Network Training with Compressed Data [May 2020]
Zhao Zhang, Lei Huang, J. Gregory Pauloski, Ian Foster
IPDPS 2020
TLDR | PDF | Code | Publication | BibTex
- Aggregating Local Storage for Scalable Deep Learning I/O [Dec 2019]
Zhao Zhang, Lei Huang, J. Gregory Pauloski, Ian Foster
DLS 2019
TLDR | PDF | Code | Publication | BibTex
- Glioma Segmentation and a Simple Accurate Model for Overall Survival Prediction [Nov 2018]
Evan Gates, J. Gregory Pauloski, Dawid Schellingerhout, David Fuentes
BrainLes 2018
TLDR | PDF | Publication | BibTex

PRESENTATIONS

Ordered by most recent.

- Accelerating Communications in Federated Applications with Transparent Object Proxies [Nov 2023]
SC 2023
Slides
- ProxyStore: Decoupling Control and Data Flow in Workflows [Oct 2023]
ParslFest 2023
Slides | Video
- Accelerating Communications in Federated Applications with Transparent Object Proxies [Apr 2023]
Greater Chicago Area Systems Research Workshop (GCASR) 2023
Poster
- ProxyStore: a Data Fabric for Parsl and FuncX [Sep 2022]
ParslFest 2022
Slides | Video
- Scalable Deep Neural Network Training with Distributed K-FAC [Mar 2022]
Master's Presentation @ UChicago
Slides
- KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks [Nov 2021]
SC 2021
Slides
- Convolutional Neural Network Training with Distributed K-FAC [Nov 2020]
SC 2020
Slides
- Optimizing Deep Learning Methods for Image Segmentation with Distributed Training [Sep 2018]
TACCSTER 2018
Poster