This paper explores the coupling of coarse- and fine-grained parallelism for Finite Element (FE) simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. We demonstrate the viability of our approach using commodity Graphics Processing Units (GPUs), chosen for their excellent price/performance ratio, and address their limited precision by applying a mixed-precision iterative refinement technique. Our results show that we do not compromise any software functionality and gain speedups of a factor of two or more for large problems.
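To illustrate the general idea behind mixed-precision iterative refinement, the following is a minimal sketch and not the implementation used in this work. It assumes a small 1D Poisson model problem with the matrix tridiag(-1, 2, -1), emulates single precision on the CPU by rounding through `struct`, and substitutes a direct tridiagonal (Thomas) solve for the multigrid solver that would run on the GPU. All function names are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_low_precision(rhs):
    """Solve tridiag(-1, 2, -1) * x = rhs with the Thomas algorithm,
    rounding every intermediate result to single precision.
    Assumption: this stands in for the low-precision (GPU) solver."""
    n = len(rhs)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = f32(-1.0 / 2.0)
    d[0] = f32(f32(rhs[0]) / 2.0)
    for i in range(1, n):
        denom = f32(2.0 + c[i - 1])
        c[i] = f32(-1.0 / denom)
        d[i] = f32(f32(f32(rhs[i]) + d[i - 1]) / denom)
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = f32(d[i] - f32(c[i] * x[i + 1]))
    return x


def residual(x, b):
    """Compute r = b - A x in double precision for A = tridiag(-1, 2, -1)."""
    n = len(x)
    r = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r.append(b[i] - (2.0 * x[i] - left - right))
    return r


def mixed_precision_solve(b, tol=1e-10, max_iter=20):
    """Iterative refinement: residual and solution update in double precision,
    correction solve in (emulated) single precision.
    Returns the solution and the number of correction solves performed."""
    x = [0.0] * len(b)
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        r = residual(x, b)
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        correction = solve_low_precision(r)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x, max_iter
```

Calling `mixed_precision_solve([1.0] * 32)` returns the refined solution together with the number of low-precision solves performed. The design point is that the expensive solve runs entirely in low precision, while only the inexpensive residual computation and solution update need double precision.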