The Blue Waters supercomputer being built by IBM is a game changer, especially in the area of bandwidth. It has 2+ million threads, 2PB of memory with 8PB/s of aggregate memory bandwidth, and over 32,000 2nd generation x16 PCIe slots with 640TB/s aggregate bandwidth.

I based this information on an article written by Timothy Prickett Morgan at The Register, plus a little math about how the bandwidth in the switch/hub chip is dedicated. I'm sure there will be more to say when the system is actually installed in 2011.

Communication Bandwidth
| Link | Per switch/hub chip | Per drawer | Per supernode | Per Blue Waters (512 supernodes) |
| --- | --- | --- | --- | --- |
| Bandwidth into each Power7 MCM (what IBM called a host connection) | 192 GB/s | | | |
| Connectivity to the seven other local nodes (MCMs) on the drawer | 336 GB/s | | | |
| Bandwidth between the nodes in a four-drawer supernode | 240 GB/s | 1920 GB/s | | |
| Dedicated to linking nodes to remote nodes | 320 GB/s | 2560 GB/s | 10240 GB/s (10 TB/s) | |
| Total external inter-node (not including PCIe cards) | | 4480 GB/s | | |
| General purpose I/O bandwidth (PCIe) | 40 GB/s | 320 GB/s | 1280 GB/s | 655360 GB/s (640 TB/s) |

Empty cells in the chart are values that shouldn't be aggregated.
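The aggregate figures in the chart can be reproduced with a few multiplications. The sketch below assumes one switch/hub chip per MCM (so eight per drawer), which is what the 8x step from the per-chip to the per-drawer column implies. The names `HUB_LINKS_GBS` and `aggregate` are made up for illustration.

```python
# Per-hub link bandwidths in GB/s, copied from the chart.
HUB_LINKS_GBS = {
    "supernode": 240,  # between the nodes in a four-drawer supernode
    "remote": 320,     # dedicated to linking nodes to remote nodes
    "pcie": 40,        # general purpose I/O (PCIe)
}

# Assumption: one switch/hub chip per MCM, so eight hubs per drawer.
HUBS_PER_DRAWER = 8
DRAWERS_PER_SUPERNODE = 4
SUPERNODES = 512


def aggregate(per_hub_gbs):
    """Return (per drawer, per supernode, per system) bandwidth in GB/s."""
    per_drawer = per_hub_gbs * HUBS_PER_DRAWER
    per_supernode = per_drawer * DRAWERS_PER_SUPERNODE
    per_system = per_supernode * SUPERNODES
    return per_drawer, per_supernode, per_system


pcie = aggregate(HUB_LINKS_GBS["pcie"])
remote = aggregate(HUB_LINKS_GBS["remote"])
supernode_links = aggregate(HUB_LINKS_GBS["supernode"])

# External inter-node total per drawer: supernode links plus remote links.
external_per_drawer = supernode_links[0] + remote[0]

print("PCIe (drawer, supernode, system) GB/s:", pcie)
print("PCIe system total in TB/s:", pcie[2] / 1024)
print("Remote links (drawer, supernode) GB/s:", remote[:2])
print("External inter-node per drawer GB/s:", external_per_drawer)
```

The conversion to TB/s uses 1024 GB per TB, which is how 655360 GB/s becomes 640 TB/s.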

Thread Count
4 SMT threads/core
8 cores/chip (32 threads)
4 chips/MCM (128 threads)
8 MCMs/drawer (1024 threads)
4 drawers/supernode (4096 threads)
512 supernodes/Blue Waters (2,097,152 threads)
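The running totals in parentheses are just a cumulative product down the hierarchy. A minimal sketch, with `LEVELS` and `cumulative_threads` as illustrative names:

```python
from itertools import accumulate
from operator import mul

# (level name, multiplier relative to the level below), from the list above.
LEVELS = [
    ("core", 4),           # 4 SMT threads per core
    ("chip", 8),           # 8 cores per chip
    ("MCM", 4),            # 4 chips per MCM
    ("drawer", 8),         # 8 MCMs per drawer
    ("supernode", 4),      # 4 drawers per supernode
    ("Blue Waters", 512),  # 512 supernodes in the full system
]


def cumulative_threads(levels):
    """Map each level name to the number of hardware threads it contains."""
    totals = accumulate((count for _, count in levels), mul)
    return {name: total for (name, _), total in zip(levels, totals)}


threads = cumulative_threads(LEVELS)
for name, total in threads.items():
    print(f"{name}: {total:,} threads")
```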

Memory Bandwidth
128GB/s per chip
512GB/s per MCM
4TB/s per drawer
16TB/s per supernode
8PB/s per Blue Waters

DIMM Count and Memory Size

| Level | Capacity | DIMMs |
| --- | --- | --- |
| per DIMM | 8GB | 1 |
| per MCM | 128GB | 16 |
| per drawer | 1TB | 128 |
| per supernode | 4TB | 512 |
| per Blue Waters | 2PB | 262144 |
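The memory bandwidth, capacity, and DIMM counts all scale the same way from the MCM upward, so they can be checked together. This is a rough sketch using binary units (1024 GB per TB), which is what the rounded figures above imply. The helper names `scale` and `human` are illustrative.

```python
# Per-MCM starting points, from the figures above.
DIMMS_PER_MCM = 16
GB_PER_DIMM = 8
GBS_PER_CHIP = 128   # memory bandwidth per chip, GB/s
CHIPS_PER_MCM = 4

MCMS_PER_DRAWER = 8
DRAWERS_PER_SUPERNODE = 4
SUPERNODES = 512


def scale(per_mcm):
    """Scale a per-MCM quantity up to drawer, supernode, and full system."""
    drawer = per_mcm * MCMS_PER_DRAWER
    supernode = drawer * DRAWERS_PER_SUPERNODE
    system = supernode * SUPERNODES
    return {"MCM": per_mcm, "drawer": drawer,
            "supernode": supernode, "Blue Waters": system}


def human(gb, suffix=""):
    """Format a GB quantity as GB, TB, or PB using 1024-based steps."""
    units = ["GB", "TB", "PB"]
    value = gb
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:g}{unit}{suffix}"
        value /= 1024


dimms = scale(DIMMS_PER_MCM)
capacity = scale(DIMMS_PER_MCM * GB_PER_DIMM)
bandwidth = scale(GBS_PER_CHIP * CHIPS_PER_MCM)

for level in dimms:
    print(level, dimms[level], human(capacity[level]),
          human(bandwidth[level], "/s"))
```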

PCIe
16 x16 PCIe2 per drawer (64 per supernode, 32,768 total)
1 x8 PCIe2 per drawer (4 per supernode, 2,048 total)