In this paper, we present a GPU-based sorting algorithm, GPUMemSort, which achieves high performance in sorting large-scale in-memory data by exploiting high-parallel GPU processors. It consists of two algorithms: an in-core algorithm, which is responsible for sorting data in GPU global memory efficiently, and an out-of-core algorithm, which is responsible for dividing large-scale data into multiple chunks that fit GPU global memory. GPUMemSort is implemented based on NVIDIA’s CUDA framework, and some critical and detailed optimization methods are also presented. The tests of different algorithms have been run on multiple data sets. The experimental results show that our in-core sorting can outperform other comparison-based algorithms and GPUMemSort is highly effective in sorting large-scale in-memory data.