This talk discusses numerical aspects of petascale computing for PDEs, especially in a Finite Element context. Sparse matrix computations as they arise in FEM discretizations typically run more than an order of magnitude below peak performance, both serially and on a single node. We propose the metric "Total Numerical Efficiency" to measure local and global performance. We discuss ScaRC, a hierarchical data and solver structure that generalizes multigrid and domain decomposition approaches, and FEAST, a software project that aims at efficiency and flexibility with respect to (1) optimal O(N) numerical solvers (multigrid), (2) numerical convergence rates, (3) close-to-peak single-node performance and (4) good parallel scalability, including adaptive meshing techniques of h-r-p type. We review results for generalized tensor-product meshes embedded in a globally unstructured macro mesh, and evaluate performance on superscalar machines (and clusters thereof) and on vector machines. To improve single-node performance further, we discuss recent advances in using GPUs and FPGAs as co-processors, and evaluate algorithms that improve the precision and accuracy of these natively low-precision architectures in the FEM context.
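
The abstract does not name the algorithms used to recover accuracy on low-precision co-processors. One commonly used candidate is mixed-precision iterative refinement, and the following is a minimal sketch under that assumption, not the FEAST implementation. The inner solver runs in single precision (emulated here on the CPU by rounding through `struct`), while the residual and the solution update are kept in double precision. All function names (`to_f32`, `jacobi_f32`, `residual`, `refine`, `tridiag`) are hypothetical, and plain Jacobi sweeps on a small diagonally dominant matrix stand in for a multigrid solver on a real FEM system.

```python
import struct


def to_f32(x):
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_f32(A, r, sweeps=25):
    """Approximately solve A c = r with Jacobi sweeps in emulated single precision."""
    n = len(r)
    c = [0.0] * n
    for _ in range(sweeps):
        new_c = []
        for i in range(n):
            s = to_f32(r[i])
            for j in range(n):
                if j != i:
                    # Every intermediate result is rounded to single precision.
                    s = to_f32(s - to_f32(to_f32(A[i][j]) * c[j]))
            new_c.append(to_f32(s / to_f32(A[i][i])))
        c = new_c
    return c


def residual(A, x, b):
    """Residual b - A x, evaluated in double precision."""
    n = len(b)
    return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]


def refine(A, b, tol=1e-12, max_outer=20):
    """Outer double-precision loop that corrects x with low-precision inner solves."""
    x = [0.0] * len(b)
    for _ in range(max_outer):
        r = residual(A, x, b)
        if max(abs(v) for v in r) <= tol:
            break
        c = jacobi_f32(A, r)                      # low-precision correction
        x = [xi + ci for xi, ci in zip(x, c)]     # double-precision update
    return x


def tridiag(n, off=-1.0, diag=4.0):
    """Small diagonally dominant test matrix (an assumption, not an FEM matrix)."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def max_error(x, x_ref):
    """Largest componentwise deviation from the reference solution."""
    return max(abs(a - t) for a, t in zip(x, x_ref))


A = tridiag(6)
x_true = [0.1 * (i + 1) for i in range(6)]
b = [sum(A[i][j] * x_true[j] for j in range(6)) for i in range(6)]

x_low = jacobi_f32(A, b, sweeps=60)   # single precision only
x_mixed = refine(A, b)                # single-precision solves, double-precision refinement
print(max_error(x_low, x_true), max_error(x_mixed, x_true))
```

In this sketch the expensive inner solve is the part that would be offloaded to the co-processor, while the comparatively cheap residual evaluation and update stay in double precision on the host.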