Sriram Krishnamoorthy

Publications

My publications on DBLP and Google Scholar. The most up-to-date information can be found in the CV.

2020

NWChem: past, present, and future
E. Apra, E. J. Bylaska, W. A. de Jong, N. Govind, K. Kowalski, T. P. Straatsma, M. Valiev, J. J. van Dam, Y. Alexeev, J. Anchell, V. Anisimov, F. W. Aquino, R. Atta-Fynn, J. Autschbach, N. P. Bauman, J. C. Becca, D. E. Bernholdt, K. Bhaskaran-Nair, S. Bogatko, P. Borowski, J. Boschen, J. Brabec, A. Bruner, E. Cauet, Y. Chen, G. N. Chuev, C. J. Cramer, J. Daily, M. J. O. Deegan, T. H. Dunning, Jr., M. Dupuis, K. G. Dyall, G. I. Fann, S. A. Fischer, A. Fonari, H. Fruchtl, L. Gagliardi, J. Garza, N. Gawande, S. Ghosh, K. Glaesemann, A. W. Gotz, J. Hammond, V. Helms, E. D. Hermes, K. Hirao, S. Hirata, M. Jacquelin, L. Jensen, B. G. Johnson, H. Jonsson, R. A. Kendall, M. Klemm, R. Kobayashi, V. Konkov, S. Krishnamoorthy, M. Krishnan, Z. Lin, R. D. Lins, R. J. Littlefield, A. J. Logsdail, K. Lopata, W. Ma, A. V. Marenich, J. Martin del Campo, D. Mejia-Rodriguez, J. E. Moore, J. M. Mullin, T. Nakajima, D. R. Nascimento, J. A. Nichols, P. J. Nichols, J. Nieplocha, A. Otero de la Roza, B. Palmer, A. Panyala, T. Pirojsirikul, B. Peng, R. Peverati, J. Pittner, L. Pollack, R. M. Richard, P. Sadayappan, G. C. Schatz, W. A. Shelton, D. W. Silverstein, D. M. A. Smith, T. A. Soares, D. Song, M. Swart, H. L. Taylor, G. S. Thomas, V. Tipparaju, D. G. Truhlar, K. Tsemekhman, T. Van Voorhis, A. Vazquez-Mayagoitia, P. Verma, O. Villa, A. Vishnu, K. D. Vogiatzis, D. Wang, J. H. Weare, M. J. Williamson, T. L. Windus, K. Wolinski, A. T. Wong, Q. Wu, C. Yang, Q. Yu, M. Zacharias, Z. Zhang, Y. Zhao, and R. J. Harrison
Journal of Chemical Physics vol:152(18), April 2020
Green's function coupled cluster simulation of the near-valence ionizations of DNA-fragments
B. Peng, K. Kowalski, A. Panyala, and S. Krishnamoorthy
Journal of Chemical Physics vol:152(1) article no: 011101, January 2020.
FPDetect: efficient reasoning about stencil programs using selective direct evaluation
A. Das, S. Krishnamoorthy, I. Briggs, G. Gopalakrishnan, and R. Tipireddy
ACM Transactions on Architecture and Code Optimization (to appear)
Reliability analysis for unreliable FSM computations
A. Sabet, J. Qiu, Z. Zhao, and S. Krishnamoorthy
ACM Transactions on Architecture and Code Optimization (to appear)
Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters Best Paper Finalist
A. Li, O. Subasi, X. Yang, and S. Krishnamoorthy
Supercomputing (SC), November 2020
Scalable yet Rigorous Floating-Point Error Analysis Best Student Paper Finalist
A. Das, I. Briggs, G. Gopalakrishnan, S. Krishnamoorthy, and P. Panchekha.
Supercomputing (SC), November 2020
Scalable heterogeneous execution of a coupled-cluster model with triples
J. Kim, A. Panyala, B. Peng, K. Kowalski, P. Sadayappan, and S. Krishnamoorthy.
Supercomputing (SC), November 2020

2019

An efficient mixed-mode representation of sparse tensors
I. Nisa, J. Li, A. Sukumaran-Rajam, P. Rawat, S. Krishnamoorthy, and P. Sadayappan
Supercomputing (SC), November 2019.
FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation
I. Briggs, A. Das, V. Sharma, M. Baranowski, S. Krishnamoorthy, Z. Rakamaric, and G. Gopalakrishnan
ACM Transactions on Architecture and Code Optimization (to appear)
Extracting SIMD parallelism from recursive task-parallel programs
B. Ren, S. Balakrishna, Y. Jo, S. Krishnamoorthy, K. Agrawal, and M. Kulkarni.
ACM Transactions on Parallel Computing, September 2019
Downfolding of many-body Hamiltonians using active-space models: extension of the sub-system embedding sub-algebras approach to unitary coupled cluster formalisms
N. Bauman, E. Bylaska, S. Krishnamoorthy, G. Hao Low, N. Wiebe, C. Granade, M. Roetteler, M. Troyer, and K. Kowalski
Journal of Chemical Physics, 2019
Mapping arbitrarily sparse two-body interactions on one-dimensional quantum circuits
A. Khan, M. Halappanavar, T. Hagge, K. Kowalski, A. Pothen, and S. Krishnamoorthy
International Conference on High Performance Computing, Data, and Analytics, December 2019
Ground-truth prediction to accelerate soft-error impact analysis for iterative methods
B. Mutlu, G. Kestor, A. Cristal, O. Unsal, and S. Krishnamoorthy
International Conference on High Performance Computing, Data, and Analytics, December 2019
NoC-enabled Software/Hardware Co-Design Framework for Accelerating k-mer Counting Best Paper
B. Joardar, P. Ghosh, P. Pande, A. Kalyanaraman, and S. Krishnamoorthy
IEEE/ACM International Symposium on Networks-on-Chip, October 2019
Performance models for data transfers: a case study with molecular chemistry kernels
S. Kumar, L. Eyraud-Dubois, and S. Krishnamoorthy
International Conference on Parallel Processing, August 2019
BonVoision: leveraging spatial data smoothness for recovery from memory soft errors
B. Fang, K. Pattabiraman, M. Ripeanu, and S. Krishnamoorthy
International Conference on Supercomputing, June 2019
MULKSG: MULtiple K simultaneous graph assembly
C. Wright, S. Krishnamoorthy, and M. Kulkarni
International Conference on Algorithms for Computational Biology, May 2019
PaKman: scalable assembly of large genomes on distributed memory machines
P. Ghosh, S. Krishnamoorthy, and A. Kalyanaraman
IEEE International Parallel & Distributed Processing Symposium, May 2019
Q# and NWChem: Tools for Scalable Quantum Chemistry on Quantum Computers
G. Low, N. Bauman, C. Granade, B. Peng, N. Wiebe, E. Bylaska, D. Wecker, S. Krishnamoorthy, M. Roetteler, K. Kowalski, M. Troyer, N. Baker.
arXiv:1904.01131, April 2019
A code generator for high-performance tensor contractions on GPUs
J. Kim, A. Sukumaran-Rajam, V. Thumma, S. Krishnamoorthy, A. Panyala, L. Pouchet, A. Rountev, and P. Sadayappan
International Symposium on Code Generation and Optimization (CGO), February 2019
Accelerating the Global Arrays ComEx Runtime using Multiple Progress Ranks
N. Gawande, K. Kowalski, B. Palmer, S. Krishnamoorthy, E. Apra, J. Manzano, V. Amatya, M. Zalewski, and J. Crawford
Workshop on Exascale MPI, November 2019
Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods
E. Mutlu, K. Kowalski, and S. Krishnamoorthy
ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, June 2019.

2018

NUMA-Caffe: NUMA-aware deep learning neural networks
P. Roy, S. Song, S. Krishnamoorthy, A. Vishnu, D. Sengupta, and X. Liu
ACM Transactions on Architecture and Code Optimization (TACO) vol:15(2) pp:24:1-24:26, June 2018
Exploring the capabilities of support vector machines in detecting silent data corruptions
O. Subasi, L. Bautista-Gomez, P. Balaprakash, O. Unsal, S. Krishnamoorthy, F. Cappello, A. Cristal, S. Di, and J. Labarta
Sustainable Computing: Informatics and Systems (SUSCOM) vol:19 pp:277-290, September 2018
Argobots: a lightweight low-level threading and tasking framework
S. Seo, A. Amer, P. Balaji, C. Bordage, A. Brooks, A. Castello, D. Genet, T. Herault, G. Bosilca, P. Jindal, H. Lu, L. V. Kale, S. Krishnamoorthy, J. Lifflander, E. Meneses, M. Snir, Y. Sun, and P. Beckman
IEEE Transactions on Parallel and Distributed Systems (TPDS) vol:29(3) pp:512-526, March 2018
HPC software verification in action: a case study with tensor transposition
E. Mutlu, A. Panyala, and S. Krishnamoorthy
Second International Workshop on Software Correctness for HPC Applications (Correctness), November 2018
Quantification, trade-off analysis, and optimal checkpoint placement for reliability and availability
O. Subasi, R. Tipireddy, and S. Krishnamoorthy
International Conference on High Performance Computing, Data, and Analytics (HiPC), December 2018
Characterization of the impact of soft errors on iterative methods
B. Mutlu, G. Kestor, J. Manzano, O. Unsal, S. Chatterjee, and S. Krishnamoorthy
International Conference on High Performance Computing, Data, and Analytics (HiPC), December 2018
Characterizing the impact of soft errors affecting floating-point ALUs using RTL-level fault injection
O. Subasi, C. Chang, M. Erez, and S. Krishnamoorthy
International Conference on Parallel Processing (ICPP), September 2018
GPU code optimization using abstract kernel emulation and sensitivity analysis
C. Hong, A. Sukumaran-Rajam, J. Kim, P. Rawat, S. Krishnamoorthy, L. Pouchet, F. Rastello, and P. Sadayappan
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2018
Comparative analysis of soft-error detection strategies: a case study with iterative methods
G. Kestor, B. Mutlu, J. Manzano, O. Subasi, O. Unsal, and S. Krishnamoorthy
ACM International Conference on Computing Frontiers (CF), May 2018
Optimizing tensor contractions in CCSD(T) for efficient execution on GPUs
J. Kim, A. Sukumaran-Rajam, C. Hong, A. Panyala, R. Srivastava, S. Krishnamoorthy, and P. Sadayappan
International Conference on Supercomputing (ICS), June 2018
TTLG--An efficient tensor transposition library for GPUs
J. Vedurada, A.S. Suresh, A. Rajam, J. Kim, C. Hong, S. Krishnamoorthy, V.K. Nandivada, A. Panyala, R. Srivastava, and P. Sadayappan
IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 2018
On the theory of speculative checkpointing: time and energy considerations
O. Subasi and S. Krishnamoorthy
ACM International Conference on Computing Frontiers (CF), May 2018
Understanding scale-dependent soft-error behavior of scientific applications
G. Kestor, R. Gioiosa, I. Peng, and S. Krishnamoorthy
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 2018
Lightweight detection of cache conflicts
P. Roy, S. Song, S. Krishnamoorthy, and X. Liu
International Symposium on Code Generation and Optimization (CGO), February 2018
Analytical modeling of cache behavior for affine programs
W. Bao, S. Krishnamoorthy, L. Pouchet, and P. Sadayappan
ACM Symposium on Principles of Programming Languages (POPL), January 2018

2017

Approximation techniques for iterative graph algorithms
A. Panyala, O. Subasi, M. Halappanavar, A. Kalyanaraman, and S. Krishnamoorthy
International Conference on High Performance Computing, Data, and Analytics (HiPC), December 2017
Efficient cache simulation for affine computations
W. Bao, P. Rawat, M. Kong, S. Krishnamoorthy, L. Pouchet, and P. Sadayappan
International Workshop on Languages and Compilers for Parallel Computing (LCPC), October 2017
MACORD: online adaptive machine learning framework for silent error detection
O. Subasi, S. Di, P. Balaprakash, O. Unsal, J. Labarta, A. Cristal, S. Krishnamoorthy, and F. Cappello
International Workshop on Fault Tolerant Systems (FTS), September 2017
Toward a general theory of optimal checkpoint placement
O. Subasi, G. Kestor, and S. Krishnamoorthy
IEEE CLUSTER, September 2017
A Gaussian process approach for effective soft error detection
O. Subasi and S. Krishnamoorthy
IEEE CLUSTER, September 2017
Locality-aware dynamic task graph scheduling
J. Maglalang, K. Agrawal, and S. Krishnamoorthy
International Conference on Parallel Processing (ICPP), August 2017
Cache locality optimization for recursive programs
J. Lifflander and S. Krishnamoorthy
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2017
Localized fault recovery for nested fork-join programs
G. Kestor, S. Krishnamoorthy, and W. Ma
Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS), April 2017
Optimizing the four-index integral transform using data movement lower bounds analysis
S. Rajbhandari, F. Rastello, S. Krishnamoorthy, K. Kowalski, and P. Sadayappan
17th ACM SIGPLAN Annual Symposium on Principles and Practices of Parallel Programming (PPoPP), February 2017
Exploiting vector and multicore parallelism for recursive data- and task-parallel programs
B. Ren, S. Krishnamoorthy, K. Agrawal, and M. Kulkarni
17th ACM SIGPLAN Annual Symposium on Principles and Practices of Parallel Programming (PPoPP), February 2017

2016

User-assisted store recycling for dynamic task graph schedulers
M. Kurt, S. Krishnamoorthy, G. Agrawal, and B. Ren
ACM Transactions on Architecture and Code Optimization (TACO) vol:13(4), pp:55:1-55:24, December 2016
Static and dynamic frequency scaling on multicore CPUs
W. Bao, C. Hong, S. Krishnamoorthy, C. D. Sudheer, L.N. Pouchet, F. Rastello, and P. Sadayappan
ACM Transactions on Architecture and Code Optimization (TACO) vol:13(4), pp:55:1-55:26, December 2016
PRESAGE: protecting structured address generation against soft errors
V. Sharma, G. Gopalakrishnan, and S. Krishnamoorthy
International Conference on High Performance Computing, Data, and Analytics, December 2016
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment
S. Rajbhandari, J. Kim, S. Krishnamoorthy, L. Pouchet, F. Rastello, R. Harrison, and P. Sadayappan
Supercomputing (SC), November 2016
On the impact of widening vector registers on sequence alignment
J. Daily, A. Kalyanaraman, S. Krishnamoorthy, and B. Ren
International Conference on Parallel Processing (ICPP), September 2016
Effective padding of multi-dimensional arrays to avoid cache conflict misses
C. Hong, W. Bao, A. Cohen, S. Krishnamoorthy, L. Pouchet, J. Ramanujam, F. Rastello, and P. Sadayappan
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2016
New-Sum: a novel online ABFT scheme for general iterative methods
D. Tao, S. Song, S. Krishnamoorthy, P. Wu, X. Liang, E. Zhang, D. Kerbyson, and Z. Chen
ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), May 2016
On fusing recursive traversals of k-ary trees
S. Rajbhandari, J. Kim, S. Krishnamoorthy, L. Pouchet, F. Rastello, R. Harrison, and P. Sadayappan
International Conference on Compiler Construction, March 2016
PolyCheck: dynamic verification of iteration space transformations on affine programs
W. Bao, S. Krishnamoorthy, L. Pouchet, F. Rastello, and P. Sadayappan.
ACM Symposium on Principles of Programming Languages (POPL), January 2016

2015

CilkSpec: Optimistic Concurrency for Cilk
S. Aga, S. Krishnamoorthy, and S. Narayanasamy
Supercomputing (SC), November 2015
Efficient execution of recursive programs on commodity vector hardware
B. Ren, Y. Jo, S. Krishnamoorthy, K. Agrawal, and M. Kulkarni
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2015
A work stealing based approach for enabling scalable optimal sequence homology detection
J. Daily, A. Kalyanaraman, S. Krishnamoorthy, and A. Vishnu
Journal of Parallel and Distributed Computing vol:79-80, pp:132-142, May 2015
On the impact of execution models: a case study in computational chemistry
D. Chavarria, M. Halappanavar, S. Krishnamoorthy, J. Manzano, A. Vishnu, A. Hoisie
Joint Workshop on High-Level Parallel Programming Models and supportive Environments and Large-Scale Parallel Processing (HIPS-LSPP), May 2015
Global transformations for legacy parallel applications via structural analysis and rewriting
D. Miranda, A. Panyala, W. Ma, A. Prantl, and S. Krishnamoorthy
Parallel Computing vol:43, pp:1-26, March 2015

2014

Communication-optimal framework for contracting distributed tensors Best Paper Finalist
S. Rajbhandari, A. Nikam, P. Lai, K. Stock, S. Krishnamoorthy, and P. Sadayappan
Supercomputing (SC), November 2014
Fault-tolerant dynamic task graph scheduling Best Student Paper Finalist
M. Kurt, S. Krishnamoorthy, K. Agrawal, and G. Agrawal
Supercomputing (SC), November 2014
Optimizing data locality for fork/join programs using constrained work stealing
J. Lifflander, S. Krishnamoorthy, and L. Kale
Supercomputing (SC), November 2014

Scalable replay with partial-order dependencies for message-logging fault tolerance Best Student Paper Award
J. Lifflander, E. Meneses, H. Menon, P. Miller, S. Krishnamoorthy, and L. Kale
IEEE CLUSTER, September 2014
CAST: contraction algorithm for symmetric tensors
S. Rajbhandari, A. Nikam, P. Lai, K. Stock, S. Krishnamoorthy, and P. Sadayappan
International Conference on Parallel Processing, September 2014
SCaLeM: a framework for characterizing and analyzing execution models
D. Chavarria, J. Manzano, S. Krishnamoorthy, A. Vishnu, K. Barker, and A. Hoisie
20 Years of Beowulf workshop, October 2014
Checksumming strategies for data in volatile memories
H. Arafat, S. Krishnamoorthy, and P. Sadayappan
International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), September 2014
Compiler-assisted detection of transient memory errors
S. Tavarageri, S. Krishnamoorthy, and P. Sadayappan
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2014
Addressing failures in exascale computing
M. Snir, R. Wisniewski, J. Abraham, S. Adve, S. Bagchi, P. Balaji, J. Belak, P. Bose, F. Cappello, B. Carlson, A. Chien, P. Coteus, N. Debardeleben, P. Diniz, C. Engelmann, M. Erez, S. Fazzari, A. Geist, R. Gupta, F. Johnson, S. Krishnamoorthy, S. Leyffer, D. Liberty, S. Mitra, T. Munson, R. Schreiber, J. Stearley, and E. Van Hensbergen
International Journal of High Performance Computing Applications vol:28(2), pp:127-171, May 2014

2013

A framework for load balancing of tensor contraction expressions via dynamic task partitioning
P. Lai, K. Stock, S. Rajbhandari, S. Krishnamoorthy, and P. Sadayappan
Supercomputing (SC), November 2013
Efficient scheduling of recursive control flow on GPUs
X. Huo, S. Krishnamoorthy, and G. Agrawal
27th International Conference on Supercomputing (ICS), June 2013
Steal Tree: low-overhead tracing of work stealing schedulers
J. Lifflander, S. Krishnamoorthy, and L. Kale
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2013
Non-iterative multireference Coupled Cluster methods on heterogeneous CPU-GPU systems
K. Bhaskaran-Nair, W. Ma, S. Krishnamoorthy, O. Villa, H. van Dam, E. Apra, and K. Kowalski
Journal of Chemical Theory and Computation, 2013
Multi-fault tolerance for Cartesian data distributions
N. Ali, S. Krishnamoorthy, M. Halappanavar, and J. Daily
International Journal of Parallel Programming, Computing Frontiers special issue vol:41(3) pp:469-493 2013

2012

A scalable infrastructure for the performance analysis of passive target synchronization
M. A. Hermanns, S. Krishnamoorthy, and F. Wolf
Parallel Computing doi:10.1016/j.parco.2012.09.002, 2012
Towards scalable optimal sequence homology detection
J. Daily, S. Krishnamoorthy, and A. Kalyanaraman
Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs (ParGraph), December 2012
On the use of term rewriting for performance optimization of legacy HPC
A. Panyala, D. Chavarria, and S. Krishnamoorthy
International Conference on Parallel Processing (ICPP), September 2012
Work stealing and persistence-based load balancers for iterative overdecomposed applications
J. Lifflander, S. Krishnamoorthy, and L. Kale
ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), June 2012
Data-driven fault tolerance for work stealing computations
W. Ma and S. Krishnamoorthy
26th International Conference on Supercomputing (ICS), June 2012
Empirical performance model-driven data layout optimization and library call selection
Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam, and P. Sadayappan
Journal of Parallel and Distributed Computing vol:72(3), pp:338-352, March 2012
Performance characterization of global address space applications: a case study with NWChem
J. Hammond, S. Krishnamoorthy, S. Shende, N. Romero, and A. Malony
Concurrency and Computation: Practice and Experience vol:24(2), pp:135-154, 2012
Parameterized micro-benchmarking: an auto-tuning approach for complex applications
W. Ma, S. Krishnamoorthy, and G. Agrawal
ACM International Conference on Computing Frontiers, June 2012
Global Futures: a multithreaded execution model for Global Arrays-based applications
D. Chavarria, S. Krishnamoorthy, and A. Vishnu
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2012
Load balancing of dynamical nucleation theory monte carlo simulations through resource sharing barriers
H. Arafat, J. Dinan, S. Krishnamoorthy, T. Windus, and P. Sadayappan
IEEE International Parallel and Distributed Processing Symposium, May 2012
Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication
J. Dinan, P. Balaji, J. Hammond, S. Krishnamoorthy, and V. Tipparaju
IEEE International Parallel and Distributed Processing Symposium, May 2012

2011

Power- and Cooling- Aware Parallel Performance Diagnostics
RL. Knapp, KL. Karavanic, S. Krishnamoorthy, and A. Marquez
The 23rd IASTED International Conference on Parallel and Distributed Computing and Systems, December 2011
Scalable Implementations of Accurate Excited-state Coupled Cluster Theories: Application of High-level Methods to Porphyrin-based Systems
K. Kowalski, S. Krishnamoorthy, R. Olson, V. Tipparaju, and E. Apra
Supercomputing (SC), November 2011
Optimizing Tensor Contraction Expressions for Hybrid CPU-GPU Execution
W. Ma, S. Krishnamoorthy, O. Villa, K. Kowalski, and G. Agrawal
Cluster Computing Special Issue
Noncollective Communicator Creation in MPI
J. Dinan, S. Krishnamoorthy, P. Balaji, J. Hammond, M. Krishnan, V. Tipparaju, and A. Vishnu
Special Session on Improving MPI User And Developer Interaction, EuroMPI, September 2011
A scalable replay-based infrastructure for the performance analysis of one-sided communication
M. Hermanns, S. Krishnamoorthy, and F. Wolf
First International Workshop on High-performance Infrastructure for Scalable Tools (WHIST), May 2011
Fault Oblivious eXascale Whitepaper
E. Van Hensbergen, R. Minnich, C. Janssen, S. Krishnamoorthy, A. Marquez, M. Gokhale, P. Sadayappan, J. Mckie, and J. Appavoo
International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), May 2011
Application-Specific Fault Tolerance via Data Access Characterization
N. Ali, S. Krishnamoorthy, N. Govind, K. Kowalski, and P. Sadayappan
Euro-Par 2011, August 2011
Massively parallel implementation of the multi-reference Brillouin-Wigner CCSD method
J. Brabec, S. Krishnamoorthy, HJJ. van Dam, K. Kowalski, and J. Pittner
Chemical Physics Letters vol:514(4-6), pp:347-351, 2011
The role of many-body effects in describing low-lying excited states of π-conjugated chromophores: high-level equation-of-motion coupled-cluster studies of fused porphyrin systems
K. Kowalski, R. Olson, S. Krishnamoorthy, V. Tipparaju, and E. Apra
Journal of Chemical Theory and Computation vol:7(7) pp:2200-2208, 2011
GPU-Based Implementations of the Noniterative Regularized-CCSD(T) Corrections: Applications to Strongly Correlated Systems
W. Ma, S. Krishnamoorthy, O. Villa, and K. Kowalski
Journal of Chemical Theory and Computation, vol:7(5) pp:1316-1327, 2011
Tolerating Correlated Failures for Generalized Cartesian Distributions via Bipartite Matching
N. Ali, S. Krishnamoorthy, M. Halappanavar, and J. Daily
ACM International Conference on Computing Frontiers (CF'11), May 2011
Practical Loop Transformations for Tensor Contraction Expressions on Multi-Level Memory Hierarchies
W. Ma, S. Krishnamoorthy, and G. Agrawal
International Conference on Compiler Construction (CC'11), April 2011
Lifeline-based Global Load Balancing
V. Saraswat, P. Kambadur, S. Kodali, D. Grove, and S. Krishnamoorthy
16th ACM SIGPLAN Annual Symposium on Principles and Practices of Parallel Programming (PPoPP'11), February 2011
A Redundant Communication Approach to Scalable Fault Tolerance in PGAS Programming Models
N. Ali, S. Krishnamoorthy, N. Govind, and B. Palmer
19th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, February 2011

2010

Efficient Sparse Matrix-Matrix Multiplication on Heterogeneous High Performance Systems
J. Siegel, O. Villa, S. Krishnamoorthy, A. Tumeo, and X. Li
Workshop on Application/Architecture Co-design for Extreme-scale Computing (AACEC). September 2010
EOMCC, MRPT, and TDDFT Studies of Charge Transfer Processes in Mixed-Valence Compounds: Application to the Spiro Molecule
KR. Glaesemann, N. Govind, S. Krishnamoorthy, and K. Kowalski
Journal of Physical Chemistry A
Acceleration of Streamed Tensor Contraction Expressions on GPGPU-based Clusters
W. Ma, S. Krishnamoorthy, O. Villa, and K. Kowalski
IEEE International Conference on Cluster Computing (CLUSTER). September 2010
Active-space completely-renormalized equation-of-motion coupled-cluster formalism: excited-state studies of green fluorescent protein, free-base porphyrin, and oligoporphyrin dimer
K. Kowalski, S. Krishnamoorthy, O. Villa, J. Hammond, and N. Govind
Journal of Chemical Physics vol:132(15) article no: 154103
Load Balancing on Single- and Multi-GPU Systems
L. Chen, O. Villa, S. Krishnamoorthy, and G. Gao
Proceedings of the 24th IEEE International Parallel & Distributed Processing Symposium (IPDPS), April 2010
Selective Recovery From Failures In A Task Parallel Programming Model
J. Dinan, A. Singri, P. Sadayappan, and S. Krishnamoorthy
Proceedings of the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing -- Resilience Workshop. May 2010
Scalable communication trace compression
S. Krishnamoorthy and K. Agarwal
Proceedings of the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). May 2010
High Performance Molecular Dynamic Simulation on Single and Multi-GPU Systems
O. Villa, L. Chen, and S. Krishnamoorthy
Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS) 2010

2009

Performance Optimization of Tensor Contraction Expressions for Many-Body methods in Quantum Chemistry
A. Hartono, Q. Lu, T. Henretty, S. Krishnamoorthy, H. Zhang, G. Baumgartner, D.E. Bernholdt, M. Nooijen, R. Pitzer, J. Ramanujam, and P. Sadayappan
Journal of Physical Chemistry A 113(45), pp.12715-12723
Scalable Work Stealing
J. Dinan, S. Krishnamoorthy, B. Larkins, J. Nieplocha, P. Sadayappan
Supercomputing (SC) 2009, November 2009
Data Layout Transformation for Enhancing Locality on NUCA Chip Multiprocessors
Q. Lu, C. Alias, U. Bondhugula, T. Henretty, S. Krishnamoorthy, J. Ramanujam, A. Rountev, P. Sadayappan, Y. Chen, H. Lin, and T. Ngai
18th International Symposium on Parallel Architectures and Compilation Techniques (PACT-18), September 2009
Parametric multi-level tiling of imperfectly nested loops
A. Hartono, M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy, B. Norris, J. Ramanujam, P. Sadayappan
ICS 2009: 147-157, June 2009.
An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications
N. Vydyanathan, S. Krishnamoorthy, G.M. Sabin, U.V. Catalyurek, T.M. Kurc, P. Sadayappan, J.H. Saltz
IEEE Transactions on Parallel and Distributed Systems 20(8): 1158-1172, 2009
Scalable transparent checkpoint-restart of global address space applications on virtual machines over infiniband
O. Villa, S. Krishnamoorthy, J. Nieplocha, D.M. Brown Jr.
Conference on Computing Frontiers 2009, April 2009.

2008

Global trees: a framework for linked data structures on distributed memory parallel systems
B. Larkins, J. Dinan, S. Krishnamoorthy, S. Parthasarathy, A. Rountev, P. Sadayappan
Supercomputing (SC) 2008, November 2008.
Solving large, irregular graph problems using adaptive work-stealing
G. Cong, S. Kodali, S. Krishnamoorthy, D. Lea, V. Saraswat, T. Wen
Proceedings of the International Conference on Parallel Processing (ICPP'08), September 2008.
Scioto: a framework for global-view task parallelism
J. Dinan, S. Krishnamoorthy, B. Larkins, J. Nieplocha, and P. Sadayappan
Proceedings of the International Conference on Parallel Processing (ICPP'08), September 2008.
A compiler framework for optimization of affine loop nests for GPGPUs
M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan
Proceedings of the International Conference on Supercomputing (ICS'08), June 2008, Island of Kos, Greece.
Integrated Data and Task Management for Scientific Applications
J. Nieplocha, S. Krishnamoorthy, M. Valiev, M. Krishnan, B. Palmer, and P. Sadayappan
Proceedings of the 8th International Conference on Computational Science (ICCS 2008), June 2008, Krakow, Poland.
Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model
Uday Bondhugula, Muthu Manikandan Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan
Proceedings of the International Conference on Compiler Construction (ETAPS CC'08), April 2008, Budapest, Hungary.
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories
M. Baskaran, Uday Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'08), February 2008

2007

Efficient search-space pruning for integrated fusion and tiling transformations
X. Gao, S. Krishnamoorthy, S. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, and P. Sadayappan.
Concurrency and Computation: Practice and Experience, 2007
Non-collective parallel I/O for global address space programming models
S. Krishnamoorthy, J. P. Canovas, V. Tipparaju, J. Nieplocha, and P. Sadayappan.
Proceedings of the International Conference on Cluster Computing (CLUSTER 2007). September 2007
Effective automatic parallelization of stencil computations
S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2007). June 2007

2006

Hypergraph partitioning for automatic memory hierarchy management
S. Krishnamoorthy, U. Catalyurek, J. Nieplocha, and P. Sadayappan.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2006). November 2006
Design and implementation of a one-sided communication interface for the IBM eserver blue gene supercomputer
Michael Blocksome, Charles Archer, Todd Inglett, Pat McCarthy, Mike Mundy, Joe Ratterman, Albert Sidelnik, Brian Smith, Gheorghe Almasi, Jose Castanos, Derek Lieber, Jose Moreira, Sriram Krishnamoorthy, and Vinod Tipparaju
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2006). November 2006
Locality conscious processor allocation and scheduling for mixed-parallel applications
N. Vydyanathan, S. Krishnamoorthy, G. Sabin, U. Catalyurek, T. Kurc, P. Sadayappan, and J. Saltz.
Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER 2006). September 2006
Combining analytical and empirical approaches in tuning matrix transposition
Q. Lu, S. Krishnamoorthy, and P. Sadayappan.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006)
An integrated approach for processor allocation and scheduling of mixed-parallel applications
N. Vydyanathan, S. Krishnamoorthy, G. Sabin, U. Catalyurek, T. Kurc, P. Sadayappan, and J. Saltz.
The 35th International Conference on Parallel Processing (ICPP 2006)
Identifying cost-effective common subexpressions to reduce operation count in tensor contraction evaluations
A. Hartono, Q. Lu, X. Gao, S. Krishnamoorthy, M. Nooijen, G. Baumgartner, V. Choppella, D. E. Bernholdt, R. M. Pitzer, J. Ramanujam, A. Rountev, and P. Sadayappan.
The 6th International Conference on Computational Science (ICCS 2006) BibTeX
An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model
S. Krishnamoorthy, U. Catalyurek, J. Nieplocha, and P. Sadayappan.
IPDPS Workshop on Performance Optimization for High-Level Languages and Libraries (POHLL 2006) BibTeX
An extensible global address space framework with decoupled task and data abstractions
S. Krishnamoorthy, U. Catalyurek, J. Nieplocha, A. Rountev, and P. Sadayappan.
IPDPS Workshop on Next Generation Software (NGS 2006). BibTeX
Layout transformation support for the disk resident arrays framework
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha, and P. Sadayappan.
Journal of Supercomputing. vol: 36(2) pp:153-170 May 2006 BibTeX
Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver
Sandhya Krishnan, Sriram Krishnamoorthy, Gerald Baumgartner, Chi-Chung Lam, J. Ramanujam, P. Sadayappan, and Venkatesh Choppella
Journal of Parallel and Distributed Computing (IPDPS Special Issue) vol:66(5) pp:659-673. May 2006 BibTeX
Search-based performance-model driven optimization for compilation of tensor contraction expressions
X. Gao, S. Krishnamoorthy, Q. Lu, V. Choppella, G. Baumgartner, J. Ramanujam, and P. Sadayappan.
The 12th Workshop on Compilers for Parallel Computers (CPC 2006). Coruna, Spain. BibTeX
Task scheduling and file replication for data-intensive jobs with batch-shared I/O
G. Khanna, N. Vydyanathan, U. Catalyurek, T. Kurc, S. Krishnamoorthy, P. Sadayappan, J. Saltz
The 15th IEEE International Symposium on High Performance Distributed Computing (HPDC 2006) BibTeX
Automatic code generation for many-body electronic structure methods: the tensor contraction engine
A. Auer, G. Baumgartner, D. E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. Harrison, S. Krishnamoorthy, S. Krishnan, C. Lam, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan and A. Sibiryakov.
Molecular Physics vol:104(2), pp:211-228. January 2006 BibTeX

2005

Data and computation abstractions for dynamic and irregular computations
S. Krishnamoorthy, J. Nieplocha, P. Sadayappan.
The 12th Annual International Conference on High Performance Computing (HiPC 2005) BibTeX
Integrated loop optimizations for data locality enhancement of tensor contraction expressions
S. K. Sahoo, S. Krishnamoorthy, R. Panuganti, P. Sadayappan.
Supercomputing (SC 2005) BibTeX
Efficient search-space pruning for integrated fusion and tiling transformations
X. Gao, S. Krishnamoorthy, S. K. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, P. Sadayappan.
The 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2005) BibTeX
Locality-aware load balancing for dynamic and irregular computations
S. Krishnamoorthy, P. Sadayappan, J. Nieplocha, and M. Krishnan
Workshop on Patterns in High Performance Computing. May 2005
Cache miss characterization and data locality optimization for imperfectly nested loops on shared memory multiprocessors
S. K. Sahoo, R. Panuganti, S. Krishnamoorthy, P. Sadayappan.
19th IEEE International Parallel & Distributed Processing Symposium. (IPDPS 2005) BibTeX
Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models
G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C. Lam, Q. Lu, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan and A. Sibiryakov.
Proceedings of the IEEE. vol: 93(2) pp:276-292 February 2005. BibTeX

2004

Layout transformation support for the disk resident arrays framework
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha and P. Sadayappan.
The Los Alamos Computer Science Initiative Symposium. (LACSI 2004) BibTeX
Efficient layout transformation support for disk-based multidimensional arrays
S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Nieplocha and P. Sadayappan.
The 11th Annual International Conference on High Performance Computing. (HiPC 2004) BibTeX
Efficient parallel out-of-core matrix transposition
S. Krishnamoorthy, G. Baumgartner, Daniel Cociorva, C. Lam and P. Sadayappan.
International Journal of High Performance Computing and Networking. vol:2(2/3/4) pp:110-119 2004 BibTeX
Empirical performance-model driven data layout optimization
Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam and P. Sadayappan.
The 17th International Workshop on Languages and Compilers for Parallel Computing. (LCPC 2004) BibTeX
Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver (Best Paper Award)
S. Krishnan, S. Krishnamoorthy, G. Baumgartner, C. Lam, J. Ramanujam, P. Sadayappan and V. Choppella.
The 18th International Parallel & Distributed Processing Symposium. (IPDPS 2004). BibTeX

2003

Data locality optimization for synthesis of efficient out-of-core algorithms (Best Paper Award)
Sandhya Krishnan, Sriram Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam, P. Sadayappan, J. Ramanujam, David E. Bernholdt and V. Choppella.
The 10th Annual International Conference on High Performance Computing. (HiPC 2003). December 2003. BibTeX
Efficient parallel out-of-core matrix transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam and P. Sadayappan.
IEEE International Conference on Cluster Computing (CLUSTER 2003). December 2003 BibTeX

Technical Reports

Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories
M. Manikandan Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan
Department of Computer and Information Science, Ohio State University. Technical Report OSU-CISRC-2/08-TR05
Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences
Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
Department of Computer and Information Science, Ohio State University. Technical Report OSU-CISRC-5/07-TR43
A Compiler Framework for Optimization of Affine Loop Nests for General Purpose Computations on GPUs
M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan
Department of Computer and Information Science, Ohio State University. Technical Report OSU-CISRC-12/07-TR78
An integrated approach for processor allocation and scheduling of mixed-parallel applications
N. Vydyanathan, S. Krishnamoorthy, G. Sabin, U. Catalyurek, T. Kurc, P. Sadayappan, and Joel Saltz.
Department of Computer and Information Science, Ohio State University. Technical Report OSU-CISRC-2/06-TR20
On efficient out-of-core matrix transposition
S. Krishnamoorthy, G. Baumgartner, D. Cociorva, C. Lam and P. Sadayappan.
Department of Computer and Information Science, Ohio State University. Technical Report OSU-CISRC-9/03-TR52

Invited Papers

Towards effective automatic parallelization for multicore systems
Uday Bondhugula, Muthu Baskaran, Albert Hartono, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev and P. Sadayappan
Proceedings of the IPDPS Workshop on Next Generation Software (NSF-NGS 2008). April 2008 BibTeX
A global address space framework for locality aware scheduling of block-sparse computations
S. Krishnamoorthy, U. Catalyurek, J. Nieplocha, A. Rountev, and P. Sadayappan.
Proceedings of the IPDPS Workshop on Next Generation Software (NSF-NGS 2007). April 2007 BibTeX

Posters

Scalable Fault Tolerance in PGAS Programming Models
Nawab Ali, Sriram Krishnamoorthy, Niranjan Govind, Bruce Palmer, and Oreste Villa
Supercomputing 2010. November 2010
Parallel global address space framework with multiple inter-operable abstractions
Sriram Krishnamoorthy, Brian Larkins, Atanas Rountev, P. Sadayappan, Jarek Nieplocha, and Robert J. Harrison
The Second Conference on Partitioned Global Address Space Programming Models (PGAS 2006). October 2006
Web service pipelining
New Melchizedec, S. Krishnamoorthy, Vimal Kumar Vivekananthamoorthy, and Arul Siromoney.
The 8th Annual International Conference on High Performance Computing. (HiPC 2001). December 2001

Under construction. Last modified: 5-October-2020