Scientific codes spend a considerable part of their run time executing collective communication operations. Such operations can also be critical for efficient resource management in large-scale machines. Scalable collective communication is therefore a key factor in achieving good performance on large-scale parallel computers. In this paper we describe the performance and scalability of some common collective communication patterns on the ASCI Q machine. Experiments conducted on a 1024-node/4096-processor segment show that the network is fast and scalable. The network is able to barrier-synchronize in a few tens of microseconds, perform a broadcast with an aggregate bandwidth of more than 100 GB/second, and sustain heavy hot-spot traffic with limited performance degradation.