POLOK Lukáš. Accelerated Sparse Matrix Operations in Nonlinear Least Squares Solvers. Brno: Department of Computer Graphics and Multimedia FIT BUT, 2017. 
Publication language:  English 

Original title:  Accelerated Sparse Matrix Operations in Nonlinear Least Squares Solvers 

Title (cs):  Akcelerace operací nad řídkými maticemi v nelineární metodě nejmenších čtverců 

Pages:  1241 

Place:  Brno, CZ 

Year:  2017 

Publisher:  Department of Computer Graphics and Multimedia FIT BUT 


Keywords 

Nonlinear least squares; numerical methods; sparse block matrix; general purpose computations on graphics processing units. 
Annotation 

This thesis focuses on the data structures for sparse block matrices, and the associated algorithms for linear algebra operations, that I have developed. Sparse block matrices occur naturally in many key problems, such as Nonlinear Least Squares (NLS) on graphical models. NLS is used, for example, in Simultaneous Localization and Mapping (SLAM) in robotics, and in Bundle Adjustment (BA) or Structure from Motion (SfM) in computer vision. Sparse block matrices also occur when applying Finite Element Methods (FEMs) or solving Partial Differential Equations (PDEs) in physics simulations.
The majority of existing state-of-the-art sparse linear algebra implementations use elementwise sparse matrices, and only a small fraction of them support sparse block matrices. This is perhaps due to the complexity of sparse block formats, which reduces computational efficiency unless the blocks are very large. Some of the more specialized solvers in robotics and computer vision use sparse block matrices internally to reduce sparse matrix assembly costs, but ultimately convert this representation to an elementwise sparse matrix for the linear solver.
Most of the existing sparse block matrix implementations focus only on a single operation, such as the matrix-vector product. The solution proposed in this thesis covers a broad range of functions: it includes efficient sparse block matrix assembly, matrix-vector and matrix-matrix products as well as triangular solving and Cholesky factorization. These operations can be used to construct both direct and iterative solvers as well as to compute eigenvalues. Highly efficient algorithms for both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are provided.
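To make the matrix-vector product on a sparse block matrix concrete, the following is a minimal sketch using a generic block compressed sparse row (BSR) layout with a single fixed block size. It is an illustration only: the layout, the function name block_spmv and all variable names are assumptions made here, not the data structure or API developed in the thesis or used in SLAM++.

```python
# Illustrative only: a generic BSR-style layout with one fixed block size.
# This is NOT the thesis's data structure; all names are hypothetical.
def block_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Multiply a block-sparse matrix by a dense vector x.

    row_ptr[i]..row_ptr[i + 1] indexes the nonzero blocks of block row i,
    col_idx[k] is the block column of block k, and blocks[k] is a dense
    block_size x block_size block stored as a list of rows.
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * block_size)
    for i in range(n_block_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            block = blocks[k]
            # Dense block times the matching slice of x.
            for r in range(block_size):
                acc = 0.0
                for c in range(block_size):
                    acc += block[r][c] * x[j * block_size + c]
                y[i * block_size + r] += acc
    return y


# A 4x4 matrix made of 2x2 blocks; block (0, 1) is structurally zero.
row_ptr = [0, 1, 3]
col_idx = [0, 0, 1]
blocks = [
    [[1.0, 2.0], [3.0, 4.0]],  # block (0, 0)
    [[5.0, 6.0], [7.0, 8.0]],  # block (1, 0)
    [[9.0, 0.0], [0.0, 9.0]],  # block (1, 1)
]
x = [1.0, 1.0, 2.0, 2.0]
y = block_spmv(2, row_ptr, col_idx, blocks, x)
```

Storing one column index per block rather than per element is what reduces indexing overhead in such a layout; the inner dense loops are also the part that would typically be vectorized or offloaded to a GPU.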
The proposed solution is integrated in SLAM++, a nonlinear least squares solver focused on robotics and computer vision. It is evaluated on standard datasets, where it significantly outperforms other similar state-of-the-art implementations without sacrificing generality or accuracy. 