We present different kernels based on Lattice-Boltzmann methods for the solution of the two-dimensional Shallow Water and Navier-Stokes equations on fully structured lattices. The functionality ranges from simple scenarios like open- channel flows with planar beds to simulations with complex scene geometries like solid obstacles and non-planar bed topography with dry-states and even interaction of the fluid with floating objects. The kernels are integrated into a hardware-oriented collection of libraries targeting multiple fundamentally different parallel hardware architectures like commodity multicore CPUs, the Cell BE, NVIDIA GPUs and clusters. We provide an algorithmic study which compares the different solvers in terms of performance and numerical accuracy in view of their capabilities and their specific implementation and optimisation on the different architectures. We show that an eightfold speedup over optimised multithreaded CPU code can be obtained with the GPU using basic methods and that even very complex flow phenomena can be simulated with significant speedups without loss of accuracy.