Algorithm 1032: Bi-cubic Splines for Polyhedral Control Nets
Peters, Jörg; Lo, Kyle; Karčiauskas, Kęstutis
doi: 10.1145/3570158
pmid: N/A
For control nets outlining a large class of topological polyhedra, not just tensor-product grids, bi-cubic polyhedral splines form a piecewise polynomial, first-order differentiable space that associates one function with each vertex. Akin to tensor-product splines, the resulting smooth surface approximates the polyhedron. Admissible polyhedral control nets consist of quadrilateral faces in a grid-like layout; star configurations, where n ≠ 4 quadrilateral faces join around an interior vertex; n-gon configurations, where 2n quadrilaterals surround an n-gon; polar configurations, where a cone of n triangles meeting at a vertex is surrounded by a ribbon of n quadrilaterals; and three types of T-junctions, where two quad-strips merge into one. The bi-cubic pieces of a polyhedral spline have matching derivatives along their break lines, possibly after a known change of variables. The pieces are represented in Bernstein-Bézier form with coefficients depending linearly on the polyhedral control net, so that evaluation, differentiation, integration, moments, and so on, are no more costly than for standard tensor-product splines. Bi-cubic polyhedral splines can be used both to model geometry and to compute functions on the geometry. Although polyhedral splines do not offer nested refinement by refinement of the control net, they support engineering analysis of curved smooth objects. Coarse nets typically suffice since the splines efficiently model curved features. Algorithm 1032 is a C++ library with input-output example pairs and an IGES output choice.
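To illustrate why evaluation in Bernstein-Bézier form is inexpensive, the following minimal sketch evaluates one bi-cubic piece from its 4×4 coefficient grid by repeated linear interpolation (de Casteljau's algorithm). The function names `de_casteljau` and `eval_bicubic` are hypothetical and are not part of the Algorithm 1032 library; coefficients are scalars here, so a surface point would be obtained by evaluating each coordinate separately.

```python
def de_casteljau(coeffs, t):
    """Evaluate a Bezier curve with the given coefficients at parameter t."""
    pts = list(coeffs)
    # Repeated linear interpolation reduces the list by one entry per pass.
    while len(pts) > 1:
        pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]


def eval_bicubic(patch, u, v):
    """Evaluate a bi-cubic piece given as a 4x4 grid of Bezier coefficients.

    Each row is collapsed in v first, then the four results are collapsed in u.
    """
    rows = [de_casteljau(row, v) for row in patch]
    return de_casteljau(rows, u)


# Illustrative data (an assumption): coefficients sampled from u + 2v,
# which a Bezier patch reproduces exactly.
patch = [[i / 3 + 2 * j / 3 for j in range(4)] for i in range(4)]
print(eval_bicubic(patch, 0.25, 0.5))
```

Because the coefficients depend linearly on the control net, the same routine applies unchanged once the coefficients of a piece have been computed.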
A Geometric Multigrid Method for Space-Time Finite Element Discretizations of the Navier–Stokes Equations and its Application to 3D Flow Simulation
Anselmann, Mathias; Bause, Markus
doi: 10.1145/3582492
pmid: N/A
We present a parallelized geometric multigrid (GMG) method, based on the cell-based Vanka smoother, for higher order space-time finite element methods (STFEM) applied to the incompressible Navier–Stokes equations. The STFEM is implemented as a time marching scheme. The GMG solver is applied as a preconditioner for generalized minimal residual iterations. Its performance properties are demonstrated for 2D and 3D benchmarks of flow around a cylinder. The key ingredients of the GMG approach are the construction of the local Vanka smoother over all degrees of freedom in time of the respective subinterval and its efficient application. For this, data structures are generated that store pre-computed cell inverses of the Jacobian for all hierarchical levels and require only a reasonable amount of memory overhead. The GMG method is built for the deal.II finite element library. The concepts are flexible and can be transferred to similar software platforms.
Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures
Quintana-Ortí, Gregorio; Hernando, Fernando; Igual, Francisco D.
doi: 10.1145/3573383
pmid: N/A
The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this article, we introduce a family of implementations of the Brouwer–Zimmermann algorithm for distributed-memory architectures for computing the minimum distance of a random linear code over 𝔽2. Both current commercial and public-domain software only work on either unicore architectures or shared-memory architectures, which are limited in the number of cores/processors employed in the computation. Our implementations focus on distributed-memory architectures, thus being able to employ hundreds or even thousands of cores in the computation of the minimum distance. Our experimental results show that our implementations are much faster, even up to several orders of magnitude, than current implementations widely used nowadays.
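To make the quantity concrete, the sketch below computes the minimum distance of a small binary code by exhaustively enumerating all nonzero codewords. This is a plain brute-force illustration, not the Brouwer–Zimmermann algorithm and not the article's implementation; its cost grows as 2^k in the code dimension k, which is exactly why faster and parallel methods matter. The function name `minimum_distance` and the bitmask encoding of generator rows are assumptions made for this example.

```python
def minimum_distance(rows):
    """Minimum Hamming weight over all nonzero codewords of a binary code.

    rows: generator matrix rows, each encoded as an integer bitmask.
    Brute force over all 2^k - 1 nonzero messages (illustration only).
    """
    k = len(rows)
    best = None
    for message in range(1, 1 << k):
        codeword = 0
        for i in range(k):
            if (message >> i) & 1:
                codeword ^= rows[i]  # addition over F2 is XOR
        weight = bin(codeword).count("1")
        # Zero codewords can only arise if the rows are linearly dependent.
        if weight and (best is None or weight < best):
            best = weight
    return best


# Generator matrix [I | P] of a [7,4] Hamming code.
hamming_7_4 = [0b1000110, 0b0100101, 0b0010011, 0b0001111]
print(minimum_distance(hamming_7_4))  # prints 3
```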
Robust Topological Construction of All-hexahedral Boundary Layer Meshes
Reberol, Maxence; Verhetsel, Kilian; Henrotte, François; Bommes, David; Remacle, Jean-François
doi: 10.1145/3577196
pmid: N/A
We present a robust technique to build a topologically optimal all-hexahedral layer on the boundary of a model with arbitrarily complex ridges and corners. The generated boundary layer mesh strictly respects the geometry of the input surface mesh, and it is optimal in the sense that the hexahedral valences of the boundary edges are as close as possible to their ideal values (local dihedral angle divided by 90°). Starting from a valid watertight surface mesh (all-quad in practice), we build a global optimization integer programming problem to minimize the mismatch between the hexahedral valences of the boundary edges and their ideal values. The formulation of the integer programming problem relies on the duality between boundary hexahedral configurations and triangulations of the disk, which we reframe in terms of integer constraints. The global problem is solved efficiently by performing combinatorial branch-and-bound searches on a series of sub-problems defined in the vicinity of complicated ridges/corners, where the local mesh topology is necessarily irregular because of the inherent constraints in hexahedral meshes. From the integer solution, we build the topology of the all-hexahedral layer, and the mesh geometry is computed by untangling/smoothing. Our approach is fully automated, topologically robust, and fast.
Event-Based Automatic Differentiation of OpenMP with OpDiLib
Blühdorn, Johannes; Sagebaum, Max; Gauger, Nicolas
doi: 10.1145/3570159
pmid: N/A
We present the new software OpDiLib, a universal add-on for classical operator overloading AD tools that enables the automatic differentiation (AD) of OpenMP parallelized code. With it, we establish support for OpenMP features in a reverse mode operator overloading AD tool to an extent that was previously only reported on in source transformation tools. We achieve this with an event-based implementation ansatz that is unprecedented in AD. Combined with modern OpenMP features around OMPT, we demonstrate how it can be used to achieve differentiation without any additional modifications of the source code; neither do we impose a priori restrictions on the data access patterns, which makes OpDiLib highly applicable. For further performance optimizations, restrictions like atomic updates on adjoint variables can be lifted in a fine-grained manner. OpDiLib can also be applied in a semi-automatic fashion via a macro interface, which supports compilers that do not implement OMPT. We demonstrate the applicability of OpDiLib for a pure operator overloading approach in a hybrid parallel environment. We quantify the cost of atomic updates on adjoint variables and showcase the speedup and scaling that can be achieved with the different configurations of OpDiLib in both the forward and the reverse pass.
Algorithm 1034: An Accelerated Algorithm to Compute the Qn Robust Statistic, with Corrections to Constants
Fahmy, Thierry
doi: 10.1145/3576920
pmid: N/A
The robust scale estimator Qn developed by Croux and Rousseeuw [3], for the computation of which they provided a deterministic algorithm, has proven to be very useful in several domains, including quality management and time series analysis. It has interesting mathematical (50% breakdown, 82% Asymptotic Relative Efficiency) and computing (O(n log n) time, O(n) space) properties. While working on a faster algorithm to compute Qn, we have discovered an error in the computation of the d constant, and as a consequence in the dn constants that are used to scale the statistic for consistency with the variance of a normal sample. These errors have been reproduced in several articles and in the International Organization for Standardization (ISO) 13528 [12] document. In this article, we fix the errors and present a new approach, which includes a new algorithm, allowing computations to run 1.3 to 4.5 times faster when n grows from 10 to 100,000.
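For orientation, the definition of Qn can be sketched directly: it is a constant d times the k-th smallest pairwise absolute difference, with k = C(h, 2) and h = ⌊n/2⌋ + 1. The sketch below is a naive O(n²) reference version, not the accelerated algorithm of the article. It evaluates the asymptotic factor from the standard formula d = 1/(√2 Φ⁻¹(5/8)) rather than hard-coding a tabulated value, and it omits the finite-sample dn corrections entirely. The function name `qn_naive` is an assumption.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import NormalDist


def qn_naive(x):
    """Naive O(n^2) Qn scale estimate with the asymptotic factor only.

    Finite-sample correction factors (the dn constants) are deliberately
    left out of this sketch.
    """
    n = len(x)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = comb(h, 2)
    # All pairwise absolute differences |x_i - x_j|, i < j, in sorted order.
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    # Asymptotic consistency factor at the normal distribution.
    d = 1.0 / (sqrt(2.0) * NormalDist().inv_cdf(5.0 / 8.0))
    return d * diffs[k - 1]


print(qn_naive([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Sorting all n(n−1)/2 differences is what the O(n log n) algorithms avoid; this version is only useful as a correctness check on small samples.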
Certifying Zeros of Polynomial Systems Using Interval Arithmetic
Breiding, Paul; Rose, Kemal; Timme, Sascha
doi: 10.1145/3580277
pmid: N/A
We establish interval arithmetic as a practical tool for certification in numerical algebraic geometry. Our software HomotopyContinuation.jl now has a built-in function certify, which proves the correctness of an isolated nonsingular solution to a square system of polynomial equations. The implementation rests on Krawczyk’s method. We demonstrate that it dramatically outperforms earlier approaches to certification. We see this contribution as a powerful new tool in numerical algebraic geometry, which can make certification the default and not just an option.
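The idea behind Krawczyk's method can be shown in one variable. For an interval X with midpoint m and Y ≈ 1/f′(m), the Krawczyk operator is K(X) = m − Y f(m) + (1 − Y f′(X))(X − m); if K(X) lies in the interior of X, then X contains exactly one zero of f. The sketch below is an illustration under simplifying assumptions and is not the implementation in HomotopyContinuation.jl: it uses exact rational arithmetic in place of outward-rounded floating-point intervals, handles a single equation, and all function names are hypothetical.

```python
from fractions import Fraction as Fr


def imul(a, b):
    """Product of two intervals given as (lo, hi) pairs."""
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))


def krawczyk(f, df_interval, X):
    """One-dimensional Krawczyk operator K(X).

    f: the function, evaluated at a point.
    df_interval: interval enclosure of f' over an interval (lo, hi).
    """
    lo, hi = X
    mid = (lo + hi) / 2
    Y = 1 / df_interval((mid, mid))[0]       # approximate inverse derivative
    center = mid - Y * f(mid)                # Newton-like step from the midpoint
    slope = imul((Y, Y), df_interval(X))     # Y * f'(X)
    one_minus = (1 - slope[1], 1 - slope[0]) # 1 - Y * f'(X)
    corr = imul(one_minus, (lo - mid, hi - mid))
    return (center + corr[0], center + corr[1])


def certifies(K, X):
    """True if K lies strictly inside X, which proves a unique zero in X."""
    return X[0] < K[0] and K[1] < X[1]


# Example: certify a zero of f(x) = x^2 - 2 in [1.4, 1.5].
f = lambda x: x * x - 2
df = lambda I: (2 * I[0], 2 * I[1])  # f'(x) = 2x is increasing
X = (Fr(7, 5), Fr(3, 2))
K = krawczyk(f, df, X)
print(certifies(K, X))  # prints True
```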
Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement
Amestoy, Patrick; Buttari, Alfredo; Higham, Nicholas J.; L’Excellent, Jean-Yves; Mary, Theo; Vieublé, Bastien
doi: 10.1145/3582493
pmid: N/A
The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed-precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed-precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. In doing so, we first develop a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications showing that mixed-precision iterative refinement combined with approximate sparse factorization can lead to considerable reductions of both the time and memory consumption.
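The basic LU-based refinement loop can be sketched on a tiny dense system: factorize once in low precision, then repeatedly compute the residual in higher precision and correct the solution with the cheap factors. The sketch below is a toy illustration only and does not reflect MUMPS or the article's algorithms. It emulates single precision by rounding through `struct`, uses no pivoting (so it assumes a diagonally dominant matrix), and the names `f32`, `lu_f32`, `lu_solve`, and `refine` are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_f32(A):
    """In-place style LU without pivoting, every result rounded to float32."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU


def lu_solve(LU, b):
    """Forward and back substitution with the (low-accuracy) factors."""
    n = len(b)
    x = list(b)
    for i in range(n):               # L y = b, unit lower triangular
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in reversed(range(n)):     # U x = y
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def refine(A, b, steps=5):
    """Iterative refinement: low-precision factors, double-precision residuals."""
    LU = lu_f32(A)
    x = lu_solve(LU, b)
    for _ in range(steps):
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve(LU, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x


A = [[4.0, 1.0, 0.5], [1.0, 3.0, 1 / 3], [0.5, 1 / 3, 2.0]]
x_true = [1.0, -2.0, 3.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
print(refine(A, b))
```

The factorization is the expensive step and is done only once; each refinement step costs a matrix-vector product and two triangular solves.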
Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver
Li, Xiaoye S.; Lin, Paul; Liu, Yang; Sao, Piyush
doi: 10.1145/3577197
pmid: N/A
We present the new features available in the recent release of SuperLU_DIST, Version 8.1.1. SuperLU_DIST is a distributed-memory parallel sparse direct solver. The new features include (1) a 3D communication-avoiding algorithm framework that trades off inter-process communication for selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. Apart from the algorithm improvements, we also modernized the software build system to use CMake and Spack package installation tools to simplify the installation procedure. Throughout the article, we describe in detail the pertinent performance-sensitive parameters associated with each new algorithmic feature, show how they are exposed to the users, and give general guidance of how to set these parameters. We illustrate that the solver’s performance both in time and memory can be greatly improved after systematic tuning of the parameters, depending on the input sparse matrix and underlying hardware.