Applications of high-performance graph analysis range from computational biology to network security and even transportation. These applications often consider graphs under rapid change and are moving beyond HPC platforms into energy-constrained embedded systems. This paper optimizes one successful and demanding analysis kernel, betweenness centrality, for NVIDIA GPU accelerators in both environments. Our algorithm for static analysis is capable of exceeding 2 million traversed edges per second per watt (MTEPS/W). Optimizing the parallel algorithm and treating the dynamic problem directly achieves a 6.9× average speed-up and 83% average reduction in energy consumption.