In Computational Science and in particular in the numerical simulation of PDE problems,
optimal serial performance is essential for a successful scale-out to the tera- and
petascale dimensions. In this paper, we propose a simple yet fundamental benchmark
setting for a PDE problem that we believe any reasonably flexible Finite Element based
software should be able to handle effortlessly. The Poisson problem used in these tests
allows reliable performance estimates for more challenging simulations.
Our performance evaluation focuses on numerical methodology and data layouts rather
than implementational fine-tuning. To enable a fair and realistic comparison independent
of the underlying numerical methodology, we define the metric total efficiency. Results are
presented for two different solver classes, multigrid and Krylov-subspacemethods, obtained
in single-core computations with our solver packages Feat2 and Feast. We quantitatively
emphasise the effect of different storage techniques and numbering (reordering) schemes,
which constitute the crucial factor in view of the memory wall problem that ultimately
determines performance of all Finite Element codes. We demonstrate a speed-up of more
than a factor 1000 by migrating from a naive implementation of a standard Krylov solver
to a sophisticated implementation of an advanced multigrid solver, without applying any
adaptivity.