The QsNetII network has been designed to optimize interprocessor communication performance in systems built from standard server building blocks. To achieve this, the network interface incorporates a number of innovative features that minimize latency for short messages and extract the maximum bandwidth from a standard PCI-X interface. The network interface has full 64-bit virtual addressing capability and can perform RDMA operations directly from user space to user space on 64-bit architectures. An embedded, user-programmable I/O processor can be used to offload asynchronous protocol-handling tasks. The resulting system offers the highest MPI performance available on systems based on standard processing nodes. Experimental results show that QsNetII delivers an MPI latency of 1.38 microseconds, a remote DMA latency under one microsecond, nearly optimal point-to-point bandwidth, and high scalability for collective communication.