While we are still witnessing Moore’s Law in the steady production of chips that amass billions of transistors, we have clearly reached plateaus in clock frequency, power, and single-stream performance. This new era has prompted a rethinking of microprocessor design in search of innovations that will allow the performance of scientific applications to keep improving at an exponential rate. One promising technology combines traditional microprocessors with special-purpose, very high-performance, low-power chips, such as the IBM Cell Broadband Engine, graphics processors, and FPGAs, to accelerate computational science and engineering applications. These chip accelerators will likely be a path forward, yet new challenges await, such as system-level design, partitioning applications across accelerators, and tools for designing applications. The Sony-Toshiba-IBM Cell Broadband Engine (Cell BE) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD coprocessing units (SPEs) integrated on-chip. Because of its performance capabilities, the Cell BE is being considered as an application accelerator for next-generation petascale supercomputers. Another promising technology, the Cray XMT, a massively multithreaded, latency-tolerant architecture, accelerates applications that perform massive-scale data analytics. The XMT employs fine-grained threads to tolerate latency for irregular applications that are often challenging to parallelize on traditional cache-based architectures.