In this paper we describe how we improved the effective performance of ASCI Q, the world's second-fastest supercomputer, to bring it in line with our expectations. Using an arsenal of performance-analysis techniques, including analytical models, custom microbenchmarks, full applications, and simulators, we succeeded in observing a serious but previously undetected performance problem. We identified the source of the problem, eliminated it, and ``closed the loop'' by demonstrating up to a factor of 2 improvement in application performance. We present our methodology and provide insights into performance analysis that are immediately applicable to other large-scale supercomputers.