Selected publications

[1]
Boyuan Zhang, Bo Fang, Fanjiang Ye, Luanzheng Guo, Fengguang Song, Nathan Tallent, and Dingwen Tao. BMQSim: Overcoming memory constraints in quantum circuit simulation with a high-fidelity compression framework. In Proc. of the 2025 ACM Intl. Conf. on Supercomputing, June 2025.
[2]
Jesun Firoz, Hyungro Lee, Luanzheng Guo, Meng Tang, Nathan R. Tallent, and Zhen Peng. Fastflow: Rapid workflow response by prioritizing critical data flows and their interactions. In Proc. of the 37th Intl. Conf. on Scalable Scientific Data Management. ACM, June 2025.
[3]
Hyungro Lee, Jesun Firoz, Nathan R. Tallent, Luanzheng Guo, and Mahantesh Halappanavar. FlowForecaster: Automatically inferring detailed & interpretable workflow scaling models for forecasts. In Proc. of the 39th IEEE Intl. Parallel and Distributed Processing Symp. IEEE Computer Society, June 2025.
[4]
Nahid Newaz, Sayan Ghosh, Nathan R. Tallent, and Guangzhi Qu. Locality aware process remapping for distributed-memory graph workloads. In Proc. of the 39th IEEE Intl. Parallel and Distributed Processing Symp. IEEE Computer Society, June 2025.
[5]
Waqwoya Abebe, Jan Strube, Luanzheng Guo, Nathan R. Tallent, Oceane Bel, Steven Spurgeon, Christina Doty, and Ali Jannesari. SAM-I-Am: Semantic boosting for zero-shot atomic-scale electron micrograph segmentation. Computational Materials Science, 246:113400, January 2025. (doi:10.1016/j.commatsci.2024.113400)
[6]
Chris Egersdoerfer, Md. Hasanur Rashid, Dong Dai, Bo Fang, and Nathan R. Tallent. Understanding and predicting cross-application I/O interference in HPC storage systems. In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (9th Intl. Parallel Data Systems Workshop), November 2024.
[7]
Shiyue Hou, Nathan R. Tallent, Li Wang, and Ningfang Mi. Performance analysis of data processing in distributed file systems with near data processing. In 11th Intl. Symp. on Networks, Computers and Communications. IEEE, October 2024. (doi:10.1109/ISNCC62547.2024.10758994)
[8]
Joshua Suetterlein, Stephen J. Young, Jesun Firoz, Joseph Manzano, Ryan Friese, Nathan R. Tallent, Kevin Barker, and Timothy Stavenger. HPC network simulation tuning via automatic extraction of hardware parameters. In Proc. of the 2024 IEEE High Performance Extreme Computing Conference, September 2024.
[9]
Yasodha Suriyakumar, Nathan R. Tallent, Andrés Marquez, and Karen Karavanic. MemFriend: Understanding memory performance with spatial-temporal affinity. In Proc. of the International Symposium on Memory Systems (MemSys 2024), New York, NY, USA, September 2024. ACM. (doi:10.1145/3695794.3695820)
[10]
Aishwarya Sarkar, Sayan Ghosh, Nathan R. Tallent, and Ali Jannesari. MassiveGNN: Efficient training via prefetching for massively connected distributed graphs. In Proc. of the 2024 IEEE Conf. on Cluster Computing, pages 62–73. IEEE, September 2024. (doi:10.1109/CLUSTER59578.2024.00013)
[11]
Meng Tang, Jaime Cernuda, Jie Ye, Luanzheng Guo, Nathan R. Tallent, Anthony Kougkas, and Xian-He Sun. DaYu: Optimizing distributed scientific workflows by decoding dataflow semantics and dynamics. In Proc. of the 2024 IEEE Conf. on Cluster Computing, pages 357–369. IEEE, September 2024. (doi:10.1109/CLUSTER59578.2024.00038)
[12]
Nahid Newaz, Sayan Ghosh, Nathan R. Tallent, Joshua Suetterlein, Atiqul Mollah, and Hua Ming. Graph analytics on Jellyfish topology. In Proc. of the 38th IEEE Intl. Parallel and Distributed Processing Symp., May 2024. (doi:10.1109/IPDPS57955.2024.00079)
[13]
Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Nathan Tallent, Kevin Barker, and Ang Li. Accelerating matrix-centric graph processing on GPUs through bit-level optimizations. Journal of Parallel and Distributed Computing, 177:53–67, 2023. (doi:10.1016/j.jpdc.2023.02.013)
[14]
Ozgur O. Kilic, Nathan R. Tallent, Yasodhadevi Suriyakumar, Chenhao Xie, Andrés Marquez, and Stephane Eranian. MemGaze: Rapid and effective load-level memory and data analysis. In Proc. of the 2022 IEEE Conf. on Cluster Computing. IEEE, September 2022. (doi:10.1109/CLUSTER51413.2022.00058)
[15]
Arun Sathanur, Nathan R. Tallent, Patrick Konsor, Ken Koyanagi, Ryan McLaughlin, Joseph Olivas, and Michael Chynoweth. QuaL2M: Learning quantitative performance of latency-sensitive code. In Proc. of the 2022 IEEE Intl. Parallel and Distributed Processing Symp. Workshops (17th Intl. Workshop on Automatic Performance Tuning), pages 913–923. IEEE, May 2022. (doi:10.1109/IPDPSW55747.2022.00149)
[16]
Jou-An Chen, Hsin-Hsuan Sung, Nathan R. Tallent, Kevin Barker, Xipeng Shen, and Ang Li. Bit-GraphBLAS: Bit-level optimizations of matrix-centric graph processing on GPU. In Proc. of the 36th IEEE Intl. Parallel and Distributed Processing Symp. IEEE, May 2022. (doi:10.1109/IPDPS53621.2022.00056)
[17]
Oceane Bel, Sinjoni Mukhopadhyay, Nathan R. Tallent, Faisal Nawab, and Darrell Long. WinnowML: Stable feature selection for maximizing prediction accuracy of time-based system modeling. In Proc. of the 2021 IEEE Intl. Conf. on Big Data (Fifth IEEE Intl. Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications), pages 3031–3041, December 2021. (doi:10.1109/BigData52589.2021.9671602)
[18]
Sayan Ghosh, Nathan R. Tallent, Marco Minutoli, Mahantesh Halappanavar, Ramesh Peri, and Ananth Kalyanaraman. Single-node partitioned-memory for huge graph analytics: Cost and performance tradeoffs. In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 1–14, New York, NY, USA, November 2021. ACM. (doi:10.1145/3458817.3476156)
[19]
Sayan Ghosh, Nathan R. Tallent, and Mahantesh Halappanavar. Characterizing performance of graph neighborhood communication patterns. IEEE Transactions on Parallel and Distributed Systems, August 2021. (doi:10.1109/TPDS.2021.3101425)
[20]
Oceane Bel, Joosep Pata, Jean-Roch Vlimant, Nathan Tallent, Justas Balcas, and Maria Spiropulu. Diolkos: Improving ethernet throughput through dynamic port selection. In Proc. of the 18th ACM Intl. Conf. on Computing Frontiers, New York, NY, USA, May 2021. ACM. (doi:10.1145/3457388.3458659)
[21]
Ryan D. Friese, Burcu O. Mutlu, Nathan R. Tallent, Joshua Suetterlein, and Jan Strube. Effectively using remote I/O for work composition in distributed workflows. In Proc. of the 2020 IEEE Intl. Conf. on Big Data, pages 426–433. IEEE Computer Society, December 2020. (doi:10.1109/BigData50022.2020.9378352)
[22]
Reet Barik, Marco Minutoli, Mahantesh Halappanavar, Nathan R. Tallent, and Ananth Kalyanaraman. Vertex reordering for real-world graphs and applications: An empirical evaluation. In Proc. of the 2020 IEEE Intl. Symp. on Workload Characterization, October 2020. (doi:10.1109/IISWC50251.2020.00031)
[23]
Oceane Bel, Kenneth Chang, Nathan R. Tallent, Dirk Duellmann, Ethan L. Miller, Faisal Nawab, and Darrell D. E. Long. Geomancy: Automated performance enhancement through data layout optimization. In 36th Intl. Conf. on Massive Storage Systems and Technology, October 2020. https://storageconference.us/2020/Papers/10.Geomancy.pdf.
[24]
Ozgur O. Kilic, Nathan R. Tallent, and Ryan D. Friese. Rapid memory footprint access diagnostics. In Proc. of the 2020 IEEE Intl. Symp. on Performance Analysis of Systems and Software, pages 273–284. IEEE Computer Society, October 2020. (doi:10.1109/ISPASS48437.2020.00047)
[25]
Ang Li, S. Song, J. Chen, J. Li, X. Liu, Nathan Tallent, and Kevin J. Barker. Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. IEEE Transactions on Parallel and Distributed Systems, 31(1):94–110, January 2020. (doi:10.1109/TPDS.2019.2928289)
[26]
Joshua Suetterlein, Ryan D. Friese, Nathan R. Tallent, and Malachi Schram. TAZeR: Hiding the cost of remote I/O in distributed scientific workflows. In Proc. of the 2019 IEEE Intl. Conf. on Big Data, pages 383–394. IEEE Computer Society, December 2019. (doi:10.1109/BigData47090.2019.9006418)
[27]
Ozgur O. Kilic, Nathan R. Tallent, and Ryan D. Friese. Rapidly measuring loop footprints. In Proc. of IEEE Intl. Conf. on Cluster Computing (Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications), pages 1–9. IEEE Computer Society, September 2019. (doi:10.1109/CLUSTER.2019.8891025)
[28]
Malachi Schram, Nathan Tallent, Ryan Friese, and Alok Singh. Application of deep learning on integrating prediction, provenance, and optimization. In Peter Hristov, Latchezar Betev, and Maarten Litmaath, editors, EPJ Web Conf., volume 214, September 2019. (doi:10.1051/epjconf/201921406007)
[29]
Tanveer Hossain Bhuiyan, Mahantesh Halappanavar, Ryan D. Friese, Hugh Medal, Luis de la Torre, Arun Sathanur, and Nathan R. Tallent. Stochastic programming approach for resource selection under demand uncertainty. In Dalibor Klusáček, Walfredo Cirne, and Narayan Desai, editors, Job Scheduling Strategies for Parallel Processing, pages 107–126, Cham, 2019. Springer International Publishing. (doi:10.1007/978-3-030-10632-4_6)
[30]
Alok Singh, Ilkay Altintas, Malachi Schram, and Nathan Tallent. Deep learning for enhancing fault tolerant capabilities of scientific workflows. In Second IEEE Intl. Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (Proc. of the IEEE Intl. Conf. on Big Data), pages 3905–3914, December 2018. (doi:10.1109/BigData.2018.8622509)
[31]
Ang Li, Shuaiwen Leon Song, Xu Liu, Nathan Tallent, and Kevin Barker. Tartan: Evaluating modern GPU interconnect via a multi-GPU benchmark suite. In Proc. of the 2018 IEEE Intl. Symp. on Workload Characterization, pages 191–202, September 2018. Best Paper Nominee. (doi:10.1109/IISWC.2018.8573483)
[32]
Ryan D. Friese, Nathan R. Tallent, Malachi Schram, Mahantesh Halappanavar, and Kevin J. Barker. Optimizing distributed data-intensive workflows. In Proc. of the 2018 IEEE Conf. on Cluster Computing, pages 279–289. IEEE, September 2018. (doi:10.1109/CLUSTER.2018.00045)
[33]
Nitin A. Gawande, Jeff A. Daily, Charles Siegel, Nathan R. Tallent, and Abhinav Vishnu. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing. Future Generation Computer Systems, May 2018. (doi:10.1016/j.future.2018.04.073)
[34]
Nathan R. Tallent, Darren J. Kerbyson, and Adolfy Hoisie. Representative paths analysis. In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 34:1–34:12, New York, NY, USA, November 2017. ACM. (doi:10.1145/3126908.3126962)
[35]
Nathan R. Tallent, Nitin A. Gawande, Charles Siegel, Abhinav Vishnu, and Adolfy Hoisie. Evaluating on-node GPU interconnects for deep learning workloads. In Stephen Jarvis, Steven Wright, and Simon Hammond, editors, High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, pages 3–21. Springer International Publishing, December 2017. (doi:10.1007/978-3-319-72971-8_1)
[36]
Malachi Schram, Vikas Bansal, Ryan D. Friese, Nathan R. Tallent, Jian Yin, Kevin J. Barker, Eric Stephan, Mahantesh Halappanavar, and Darren J. Kerbyson. Integrating prediction, provenance, and optimization into high energy workflows. J. Phys. Conf. Ser., 898(6):062052, November 2017.
[37]
Ryan D. Friese, Nathan R. Tallent, Abhinav Vishnu, Darren J. Kerbyson, and Adolfy Hoisie. Generating performance models for irregular applications. In Proc. of the 31st IEEE Intl. Parallel and Distributed Processing Symp., pages 317–326, Los Alamitos, CA, USA, May 2017. IEEE Computer Society. (doi:10.1109/IPDPS.2017.61)
[38]
Nathan R. Tallent, Kevin J. Barker, Daniel Chavarría-Miranda, Antonino Tumeo, Mahantesh Halappanavar, Andrés Márquez, Darren J. Kerbyson, and Adolfy Hoisie. Modeling the impact of silicon photonics on graph analytics. In Proc. of the 11th IEEE Intl. Conf. on Networking, Architecture, and Storage, pages 1–11. IEEE Computer Society, August 2016. (doi:10.1109/NAS.2016.7549410)
[39]
Nathan R. Tallent, Joseph B. Manzano, Nitin A. Gawande, Seunghwa Kang, Darren J. Kerbyson, Adolfy Hoisie, and Joseph K. Cross. Algorithm and architecture independent benchmarking with SEAK. In Proc. of the 30th IEEE Intl. Parallel and Distributed Processing Symp., pages 63–72, Los Alamitos, CA, USA, May 2016. IEEE Computer Society. (doi:10.1109/IPDPS.2016.25)
[40]
Abhinav Vishnu, Hubertus van Dam, Nathan R. Tallent, Darren J. Kerbyson, and Adolfy Hoisie. Fault modeling of extreme scale applications using machine learning. In Proc. of the 30th IEEE Intl. Parallel and Distributed Processing Symp., pages 222–231, Los Alamitos, CA, USA, May 2016. IEEE Computer Society. (doi:10.1109/IPDPS.2016.111)
[41]
Mahantesh Halappanavar, Malachi Schram, Luis de La Torre, Kevin Barker, Nathan R. Tallent, and Darren Kerbyson. Towards efficient scheduling of data intensive high energy physics workflows. In WORKS '15: Workshop on Workflows in Support of Large-Scale Science, held in conjunction with SuperComputing '15, November 2015. (doi:10.1145/2822332.2822335)
[42]
Akshay Venkatesh, Abhinav Vishnu, Khaled Hamidouche, Nathan Tallent, Dhabaleswar (DK) Panda, Darren Kerbyson, and Adolfy Hoisie. A case for application-oblivious energy-efficient MPI runtime. In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), SC '15, pages 29:1–29:12, New York, NY, USA, November 2015. ACM. Best Student Paper Nominee. (doi:10.1145/2807591.2807658)
[43]
Nitin A. Gawande, Joseph B. Manzano, Antonino Tumeo, Nathan R. Tallent, Darren J. Kerbyson, and Adolfy Hoisie. Power and performance trade-offs for space time adaptive processing. In ASAP '15: Proc. of the 26th IEEE Intl. Conf. on Application-specific Systems, Architectures and Processors, pages 41–48, July 2015. (doi:10.1109/ASAP.2015.7245703)
[44]
Nathan R. Tallent, Abhinav Vishnu, Huub Van Dam, Jeff Daily, Darren Kerbyson, and Adolfy Hoisie. Diagnosing the causes and severity of one-sided message contention. In Proc. of the 20th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, New York, NY, USA, 2015. ACM. (doi:10.1145/2688500.2688516)
[45]
Nathan R. Tallent, Adolfy Hoisie, and Charity Plata. Palm: Making application modeling easier. PNNL Computational Sciences and Mathematics Division Research Highlights, May 2014. http://www.pnnl.gov/science/highlights/highlight.asp?id=2652.
[46]
Nathan R. Tallent and Adolfy Hoisie. Palm: Easing the burden of analytical performance modeling. In Proc. of the 28th ACM Intl. Conf. on Supercomputing, pages 221–230, New York, NY, USA, 2014. ACM. (doi:10.1145/2597652.2597683)
[47]
Kevin Barker, Thomas Benson, Dan Campbell, David Ediger, Roberto Gioiosa, Adolfy Hoisie, Darren Kerbyson, Joseph Manzano, Andres Marquez, Leon Song, Nathan R. Tallent, and Antonino Tumeo. PERFECT (Power Efficiency Revolution For Embedded Computing Technologies) Benchmark Suite Manual. Pacific Northwest National Laboratory and Georgia Tech Research Institute, December 2013. http://hpc.pnnl.gov/projects/PERFECT/.
[48]
Xu Liu, John Mellor-Crummey, and Nathan R. Tallent. Analyzing application performance bottlenecks on Intel's SCC. In Proc. of the TACC-Intel Highly Parallel Computing Symp., 2012.
[49]
Nathan R. Tallent and John Mellor-Crummey. Using sampling to understand parallel program performance. In Holger Brunst, Matthias S. Müller, Wolfgang E. Nagel, and Michael M. Resch, editors, Tools for High Performance Computing 2011, pages 13–25. Springer, 2012. (doi:10.1007/978-3-642-31476-6_2)
[50]
Nathan R. Tallent and Darren Kerbyson. Data-centric performance analysis of PGAS applications. In WHIST 2012: Proc. of the 2nd Intl. Workshop on High-performance Infrastructure for Scalable Tools, held with the 26th Intl. Conf. on Supercomputing, 2012.
[51]
Nathan R. Tallent, John M. Mellor-Crummey, Michael Franco, Reed Landrum, and Laksono Adhianto. Scalable fine-grained call path tracing. In Proc. of the 25th Intl. Conf. on Supercomputing, pages 63–74, New York, NY, USA, 2011. ACM. (doi:10.1145/1995896.1995908)
[52]
Nathan R. Tallent, Laksono Adhianto, and John M. Mellor-Crummey. Scalable identification of load imbalance in parallel executions using call path profiles. In Proc. of the 2010 ACM/IEEE Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 1–11, Washington, DC, USA, 2010. IEEE Computer Society. (doi:10.1109/SC.2010.47)
[53]
Laksono Adhianto, John Mellor-Crummey, and Nathan R. Tallent. Effectively presenting call path profiles of application performance. In PSTI 2010: Proc. of the 2010 Workshop on Parallel Software Tools and Tool Infrastructures, held with the 2010 Intl. Conf. on Parallel Processing, pages 179–188, Los Alamitos, CA, USA, 2010. IEEE Computer Society. (doi:10.1109/ICPPW.2010.35)
[54]
Laksono Adhianto, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R. Tallent. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685–701, 2010. (doi:10.1002/cpe.1553)
[55]
Nathan R. Tallent, John M. Mellor-Crummey, and Allan Porterfield. Analyzing lock contention in multithreaded applications. In Proc. of the 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 269–280, New York, NY, USA, 2010. ACM. (doi:10.1145/1693453.1693489)
[56]
Nathan R. Tallent and John M. Mellor-Crummey. Identifying performance bottlenecks in work-stealing computations. Computer, 42(12):44–50, 2009. (doi:10.1109/MC.2009.396)
[57]
Nathan R. Tallent, John M. Mellor-Crummey, Laksono Adhianto, Michael W. Fagan, and Mark Krentel. Diagnosing performance bottlenecks in emerging petascale applications. In Proc. of the 2009 ACM/IEEE Intl. Conf. for High Performance Computing, Networking, Storage and Analysis (SuperComputing), pages 1–11, New York, NY, USA, 2009. ACM. (doi:10.1145/1654059.1654111)
[58]
Nathan R. Tallent, John Mellor-Crummey, and Michael W. Fagan. Binary analysis for measurement and attribution of program performance. In Proc. of the 2009 ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 441–452, New York, NY, USA, 2009. ACM. Distinguished Paper. (doi:10.1145/1542476.1542526)
[59]
Robert Fowler, Laksono Adhianto, Bronis de Supinski, Michael Fagan, Todd Gamblin, Mark Krentel, John Mellor-Crummey, Martin Schulz, and Nathan Tallent. Frontiers of performance analysis on leadership-class systems. Journal of Physics: Conference Series, 180:012041 (6pp), 2009.
[60]
Nathan R. Tallent and John Mellor-Crummey. Effective performance measurement and analysis of multithreaded applications. In Proc. of the 14th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 229–240, New York, NY, USA, 2009. ACM. (doi:10.1145/1504176.1504210)
[61]
Nathan Tallent, John Mellor-Crummey, Laksono Adhianto, Mike Fagan, and Mark Krentel. HPCToolkit: Performance tools for scientific computing. Journal of Physics: Conference Series, 125:012088 (5pp), 2008.
[62]
L. Adhianto, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. HPCToolkit: Performance measurement and analysis for supercomputers with node-level parallelism. In Proc. of the Workshop on Node Level Parallelism for Large Scale Supercomputers, held with Supercomputing 2008, November 2008.
[63]
John Mellor-Crummey and Nathan R. Tallent. A methodology for accurate, effective and scalable performance analysis of application programs. In Proc. of the Workshop on Tools, Infrastructures and Methodologies for the Evaluation of Research Systems, held with the 2008 IEEE Intl. Symp. on Performance Analysis of Systems and Software, pages 4–11, February 2008.
[64]
Jean Utke, Uwe Naumann, Mike Fagan, Nathan Tallent, Michelle Strout, Patrick Heimbach, Chris Hill, and Carl Wunsch. OpenAD/F: A modular open-source tool for automatic differentiation of Fortran codes. ACM Trans. Math. Softw., 34(4):1–36, 2008. (doi:10.1145/1377596.1377598)
[65]
Nathan Froyd, Nathan Tallent, John Mellor-Crummey, and Robert Fowler. Call path profiling for unmodified, optimized binaries. In GCC Summit '06: Proc. of the GCC Developers' Summit, 2006, pages 21–36, 2006.
[66]
John Mellor-Crummey, Robert Fowler, Gabriel Marin, and Nathan Tallent. HPCView: A tool for top-down analysis of node performance. The Journal of Supercomputing, 23(1):81–104, 2002. (doi:10.1023/A:1015789220266)