In this paper, we introduced a three-stage load-balancing (3SLB) switch and studied its performance through theoretical analysis and simulation. The 3SLB switch solves the mis-sequencing problem without the need for costly online scheduling algorithms or hardware speedup. The most significant difference between the 3SLB switch and prior load-balancing switches is its third stage, in which packets are buffered in an output load-balancing fashion and forwarded in ascending order of arrival time. With this third stage, packet order is preserved without any complex real-time scheduling algorithm: the complexity of the online scheduling algorithm inside a 3SLB switch is , and no hardware speedup is required. The transmission delay of the 3SLB switch is bounded by that of an OQ switch plus a constant that depends only on the number of input/output ports. The 3SLB switch achieves the same throughput as an OQ switch for arbitrary input traffic patterns. Our simulations show that the 3SLB switch has a much lower average delay than existing IQ switches under both heavy and bursty input traffic.
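The third-stage rule, forwarding buffered packets in ascending order of their arrival time at the switch, can be illustrated with a per-output min-heap. The following is a minimal Python sketch under stated assumptions, not the 3SLB algorithm itself: the names `Packet`, `OutputReorderBuffer`, `flow_id`, and `seq` are hypothetical, and the sketch does not model when packets become eligible for departure or the timing that yields the delay bound above. It only shows that packets reaching an output out of order (for example, after taking different middle-stage paths) leave in arrival-time order.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    # Hypothetical packet record: arrival_time is the time slot at which the
    # packet entered the switch (first stage), not when it reached the output.
    arrival_time: int
    flow_id: int
    seq: int


class OutputReorderBuffer:
    """Hypothetical third-stage buffer for a single output port.

    Packets may reach this buffer out of order; dequeue() always returns the
    buffered packet with the smallest switch arrival time.
    """

    def __init__(self) -> None:
        # Heap entries are (arrival_time, insertion_counter, packet); the
        # unique counter breaks ties so Packet objects are never compared.
        self._heap: List[Tuple[int, int, Packet]] = []
        self._counter = 0

    def enqueue(self, pkt: Packet) -> None:
        heapq.heappush(self._heap, (pkt.arrival_time, self._counter, pkt))
        self._counter += 1

    def dequeue(self) -> Optional[Packet]:
        # Returns None when the output buffer is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# Usage: three packets of one flow reach the output in the order 3, 1, 2
# but are forwarded in ascending arrival-time order.
buf = OutputReorderBuffer()
for t, s in [(3, 2), (1, 0), (2, 1)]:
    buf.enqueue(Packet(arrival_time=t, flow_id=0, seq=s))
print([buf.dequeue().seq for _ in range(3)])  # prints [0, 1, 2]
```

A heap keeps each enqueue and dequeue at logarithmic cost in the buffer occupancy; this is a convenience of the sketch and says nothing about the complexity figure reported for the 3SLB switch.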