We develop a hybrid multi-GPU and CPU version of an algorithm to model seismic wave propagation based on the spectral-element method. We implement SPECFEM3D, an open-source high-order finite-element application that simulates seismic wave propagation resulting, for instance, from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. This allows users to handle large numerical grids and simulate a large number of time steps for each geophysical model under study. Unlike many other numerical techniques, ours can be implemented successfully in single precision, which maximizes performance on current-generation GPUs. Our GPU code can handle models of the Earth containing both fluid and solid layers (which is the case, for instance, at the scale of the full Earth, whose outer core is fluid). We discuss the implementation and optimization of the code and compare it to an existing, highly optimized implementation in C and MPI on a classical cluster of CPU nodes. Dependencies between neighboring mesh elements cannot easily be handled in parallel; we remove them with a mesh coloring technique that creates subsets of independent elements. This lets us efficiently handle summation operations over degrees of freedom on an unstructured mesh. Non-blocking MPI messages allow communications across the network, and data transfers to and from the device via the PCI-Express bus, to overlap with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements; depending on how the problem is mapped to the reference CPU cluster, we obtain a speedup of 20x or 12x.
Thanks to the overlapping of communications and computation, we obtain excellent weak scalability of this finite-element code on a cluster of 192 GPUs.
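
To illustrate the mesh coloring idea mentioned above, the following is a minimal sketch in Python and not the SPECFEM3D implementation. It assumes a simple greedy coloring, in which two elements conflict when they share a global node; the function names `color_elements` and `assemble_by_color` and the small one-dimensional mesh are hypothetical and chosen only for illustration. Within one color, no two elements touch the same degree of freedom, so their contributions could be summed in parallel without atomic operations.

```python
from collections import defaultdict


def color_elements(elements):
    """Greedy coloring: elements that share a global node get different colors.

    `elements` is a list of tuples of global node indices (hypothetical layout).
    """
    # Map each global node to the list of elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)  # -1 means "not yet colored"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors that share at least one node.
        used = {
            colors[nb]
            for n in nodes
            for nb in node_to_elems[n]
            if colors[nb] >= 0
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def assemble_by_color(elements, colors, local_values, num_nodes):
    """Sum element-local values into a global array, one color at a time."""
    global_sum = [0.0] * num_nodes
    num_colors = max(colors, default=-1) + 1
    for c in range(num_colors):
        # Elements of the same color share no node, so on a GPU this inner
        # loop could be executed by independent threads without conflicts.
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            for i, n in enumerate(nodes):
                global_sum[n] += local_values[e][i]
    return global_sum


# Hypothetical 1D mesh: four 2-node elements chained along nodes 0..4.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
colors = color_elements(elements)
local_values = [(1.0, 1.0)] * len(elements)
global_sum = assemble_by_color(elements, colors, local_values, num_nodes=5)
print(colors, global_sum)
```

In this sketch the colors are processed one after another, while the elements inside each color are mutually independent. A greedy coloring does not generally minimize the number of colors or balance their sizes; it is used here only because it is short.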