Benchmarks¶
All the following benchmarks have been carried out on an i7-8750H(with OpenMP enabled, this is 12 threads), with Intel’s icpc (ICC) 19.0.1.144 compiler and Eigen version 3.3.7, with DTYPE
set to double
. The compiler flags that were utilized are the same are those mentioned in the CMakeLists.txt
file.
Presented below are the results as obtained when using different kernels:
Gaussian Kernel¶
The Gaussian Kernel is given by \(K(i, j) = \sigma^2 \delta_{ij}^2 + \exp(-||x_i - x_j||^2)\). For these benchmarks, we take \(\sigma = 10\) with \(x\) being set as a sorted random vector \(\in (-1, 1)\). Using the plotTree
function of this library, we can look at the rank structure for this matrix. The following diagram is obtained with \(N = 10000\), \(M = 500\) and tolerance \(10^{-12}\)
The green blocks are low-rank blocks. Their intensity of colour shows their degree of “low-rankness”. Additionally, the rank has been displayed in each of these blocks. The red blocks are full-rank blocks and would have the rank of \(M = 500\)
Time Taken vs Tolerance¶
These benchmarks were performed for size of the matrix \(N = 1000000\), with the size of the leaf node set to \(M = 100\).
Fast Factorization¶
Tolerance | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) |
---|---|---|---|---|---|
\(10^{-2}\) | 1.65059 | 11.4442 | 1.47112 | 0.321805 | 0.0300901 |
\(10^{-4}\) | 1.82825 | 8.33693 | 1.94887 | 0.337377 | 0.03039 |
\(10^{-6}\) | 1.92681 | 12.4077 | 2.33157 | 0.346198 | 0.0300648 |
\(10^{-8}\) | 2.09475 | 11.5901 | 2.74718 | 0.361579 | 0.0338411 |
\(10^{-10}\) | 2.28711 | 11.8123 | 3.22611 | 0.375279 | 0.0296249 |
\(10^{-12}\) | 2.54764 | 11.2157 | 3.89779 | 0.398319 | 0.0305111 |
\(10^{-14}\) | 2.95124 | 8.55489 | 5.01199 | 0.431082 | 0.0309851 |

Fast Symmetric Factorization¶
Tolerance | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | MultSymFactor(s) |
---|---|---|---|---|---|---|
\(10^{-2}\) | 1.61076 | 11.4741 | 1.17726 | 0.387992 | 0.0366111 | 0.226509 |
\(10^{-4}\) | 1.81511 | 8.08747 | 1.56692 | 0.399085 | 0.0328679 | 0.249969 |
\(10^{-6}\) | 1.91956 | 12.3341 | 1.83361 | 0.418334 | 0.031215 | 0.266352 |
\(10^{-8}\) | 2.07343 | 11.2653 | 2.29376 | 0.440591 | 0.0327439 | 0.288697 |
\(10^{-10}\) | 2.24097 | 11.7877 | 2.86431 | 0.464687 | 0.0305729 | 0.339399 |
\(10^{-12}\) | 2.57603 | 11.3027 | 3.66516 | 0.494536 | 0.031522 | 0.393104 |
\(10^{-14}\) | 2.90611 | 7.89484 | 4.93738 | 0.537149 | 0.030225 | 0.393285 |

Time Taken vs Size of Matrix¶
For these benchmarks, the leaf size was fixed at \(M = 100\), with tolerance set to \(10^{-12}\)
Fast Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | Direct LU(s) |
---|---|---|---|---|---|---|
\(10^{3}\) | 0.00345016 | 0.000463963 | 0.00121403 | 0.000246048 | 2.09808e-05 | 0.024302 |
\(5 \times 10^{3}\) | 0.00954294 | 0.000818014 | 0.00755906 | 0.00179601 | 0.000159979 | 1.61282 |
\(10^{4}\) | 0.0180159 | 0.00202203 | 0.103507 | 0.003834 | 0.000344992 | 10.4102 |
\(5 \times 10^{4}\) | 0.109816 | 0.0147851 | 0.103266 | 0.022316 | 0.00227404 | N/A |
\(10^{5}\) | 0.202525 | 0.066885 | 0.239639 | 0.0450559 | 0.00451112 | N/A |
\(5 \times 10^{5}\) | 1.19365 | 3.68382 | 1.6615 | 0.206754 | 0.015748 | N/A |
\(10^{6}\) | 2.53519 | 11.1435 | 3.93549 | 0.399695 | 0.0303771 | N/A |

Fast Symmetric Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | MultSymFactor(s) | Direct Cholesky(s) |
---|---|---|---|---|---|---|---|
\(10^{3}\) | 0.00344396 | 0.000510931 | 0.00103807 | 0.00030303 | 2.19345e-05 | 0.000180006 | 0.0316679 |
\(5 \times 10^{3}\) | 0.00925708 | 0.000812054 | 0.00626493 | 0.00209403 | 0.000108004 | 0.00113392 | 2.35399 |
\(10^{4}\) | 0.0183232 | 0.00199389 | 0.010865 | 0.00471711 | 0.000352859 | 0.00263691 | 18.5745 |
\(5 \times 10^{4}\) | 0.0946209 | 0.0151899 | 0.0787759 | 0.0285201 | 0.00230503 | 0.0157571 | N/A |
\(10^{5}\) | 0.203769 | 0.0659761 | 0.183974 | 0.058074 | 0.00438595 | 0.03263 | N/A |
\(5 \times 10^{5}\) | 1.18639 | 3.67825 | 1.47418 | 0.245743 | 0.0180571 | 0.162066 | N/A |
\(10^{6}\) | 2.53567 | 11.2973 | 3.56786 | 0.488049 | 0.0311899 | 0.377352 | N/A |

Matérn Kernel¶
Kernel considered is given by \(K(r) = \sigma^2 \left(1 + \frac{r \sqrt{5}}{\rho} + \frac{5 r^2}{3 \rho^2}\right)\exp{\left(-\frac{r \sqrt{5}}{\rho}\right)}\). For these benchmarks, we take \(\sigma = 10\), \(\rho = 5\), where \(r = ||x_i - x_j||\) with \(x\) being set as a sorted random vector \(\in (-1, 1)\). Using plotTree
for \(N = 10000\), \(M = 500\) and tolerance \(10^{-12}\), we see this rank structure
Time Taken vs Tolerance¶
These benchmarks were performed for size of the matrix \(N = 1000000\), with the size of the leaf node set to \(M = 100\).
Fast Factorization¶
Tolerance | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) |
---|---|---|---|---|---|
\(10^{-2}\) | 1.70237 | 13.8247 | 1.3231 | 0.388983 | 0.042177 |
\(10^{-4}\) | 1.93746 | 14.0274 | 1.37327 | 0.401342 | 0.0430369 |
\(10^{-6}\) | 1.99264 | 9.29146 | 1.6509 | 0.413971 | 0.0420959 |
\(10^{-8}\) | 2.04502 | 13.6249 | 1.80135 | 0.417019 | 0.043962 |
\(10^{-10}\) | 2.08538 | 14.7541 | 2.1616 | 0.455189 | 0.0420899 |
\(10^{-12}\) | 2.28954 | 9.11655 | 2.27049 | 0.431815 | 0.043808 |
\(10^{-14}\) | 2.19898 | 13.821 | 2.74798 | 0.466761 | 0.0431418 |

Fast Symmetric Factorization¶
Tolerance | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | MultSymFactor(s) |
---|---|---|---|---|---|---|
\(10^{-2}\) | 1.65146 | 13.4722 | 0.722689 | 0.461396 | 0.0417881 | 0.257583 |
\(10^{-4}\) | 1.87788 | 13.6014 | 0.778202 | 0.471056 | 0.041806 | 0.263908 |
\(10^{-6}\) | 1.93905 | 8.81335 | 0.836078 | 0.478072 | 0.0427818 | 0.268437 |
\(10^{-8}\) | 2.05821 | 13.4592 | 1.05975 | 0.496589 | 0.0437939 | 0.294927 |
\(10^{-10}\) | 2.0032 | 14.3409 | 1.31922 | 0.507549 | 0.0424139 | 0.296023 |
\(10^{-12}\) | 2.23442 | 8.84984 | 1.51609 | 0.533495 | 0.0427949 | 0.311331 |
\(10^{-14}\) | 2.18632 | 13.6219 | 1.95092 | 0.551657 | 0.0439069 | 0.342182 |

Time Taken vs Size of Matrix¶
For these benchmarks, the leaf size was fixed at \(M = 100\), with tolerance set to \(10^{-12}\)
Fast Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | Direct LU(s) |
---|---|---|---|---|---|---|
\(10^{3}\) | 0.00927687 | 0.0001921 | 0.0011642 | 0.000297 | 3.19481e-05 | 0.0489709 |
\(5 \times 10^{3}\) | 0.0159199 | 0.0007879 | 0.00726509 | 0.002069 | 0.000204086 | 2.52755 |
\(10^{4}\) | 0.026196 | 0.0020630 | 0.0235729 | 0.005370 | 0.000522852 | 16.0086 |
\(5 \times 10^{4}\) | 0.098814 | 0.0144801 | 0.106045 | 0.027053 | 0.00375605 | |
\(10^{5}\) | 0.180091 | 0.0756569 | 0.19264 | 0.054170 | 0.00687695 | |
\(5 \times 10^{5}\) | 1.10963 | 3.33762 | 0.943877 | 0.234129 | 0.0219009 | |
\(10^{6}\) | 2.25833 | 9.01339 | 2.33021 | 0.450053 | 0.041976 |

Fast Symmetric Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | MultSymFactor(s) | Direct Cholesky(s) |
---|---|---|---|---|---|---|---|
\(10^{3}\) | 0.0066328 | 0.000208855 | 0.000833988 | 0.00034499 | 2.81334e-05 | 0.000160933 | 0.0281229 |
\(5 \times 10^{3}\) | 0.0103149 | 0.000798941 | 0.00359011 | 0.00228715 | 0.000156879 | 0.00105405 | 0.231569 |
\(10^{4}\) | 0.02724 | 0.00200987 | 0.0175741 | 0.00552893 | 0.000396013 | 0.00261402 | 1.05882 |
\(5 \times 10^{4}\) | 0.08972 | 0.0151231 | 0.044107 | 0.034517 | 0.00314713 | 0.0162551 | N/A |
\(10^{5}\) | 0.192696 | 0.067266 | 0.0933969 | 0.0709021 | 0.0061872 | 0.0332701 | N/A |
\(5 \times 10^{5}\) | 1.09055 | 3.2381 | 0.612783 | 0.263855 | 0.024405 | 0.151778 | N/A |
\(10^{6}\) | 2.19711 | 8.79683 | 1.47177 | 0.545244 | 0.0434139 | 0.310443 | N/A |

RPY Tensor¶
The RPY Tensor is given by
where \(r = ||\textbf{r}_i - \textbf{r}_j||\) with \(\textbf{r}\) being set as a sorted random matrix \(\in (-1, 1)\) with the number of columns set equal to the dimension considered. For these benchmarks, we take \(k_B = T = \eta = 1\). For \(a\), we find the minimum of the interaction distances between all particles \(r_{min}\) and set \(a = \frac{r_{min}}{2}\). This means that for the considered case, the RPY tensor simplifies to:
We have used plotTree
to reveal the rank structure for the problems below when considering matrix size \(N = 10000\), leaf size \(M = 500\) and tolerance \(10^{-12}\).
\(\texttt{dim} = 1\)¶
Time Taken vs Size of Matrix¶
For these benchmarks, the leaf size was fixed at \(M = 100\), with tolerance set to \(10^{-12}\)
Fast Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | Direct LU(s) |
---|---|---|---|---|---|---|
\(10^{3}\) | 0.0284998 | 0.0002059 | 0.00317788 | 0.000329 | 2.19345e-05 | 0.022975 |
\(5 \times 10^{3}\) | 0.124997 | 0.0019462 | 0.028585 | 0.003378 | 0.000226021 | 1.57937 |
\(10^{4}\) | 0.284125 | 0.0044059 | 0.0781479 | 0.007328 | 0.000458002 | 11.3985 |
\(5 \times 10^{4}\) | 1.60538 | 0.033412 | 0.67361 | 0.047978 | 0.001616 | |
\(10^{5}\) | 5.49457 | 0.145549 | 2.47014 | 0.254623 | 0.00333095 | |
\(5 \times 10^{5}\) | 28.6773 | 3.55899 | 17.4057 | 0.818555 | 0.0195651 |

Fast Symmetric Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | MultSymFactor(s) | Direct Cholesky(s) |
---|---|---|---|---|---|---|---|
\(10^{3}\) | 0.0468209 | 0.00031209 | 0.008219 | 0.00095796 | 3.19481e-05 | 0.0005548 | 0.034517 |
\(5 \times 10^{3}\) | 0.216226 | 0.00294399 | 0.042495 | 0.00592899 | 0.000274181 | 0.0056932 | 3.34734 |
\(10^{4}\) | 0.47921 | 0.00559902 | 0.10963 | 0.0136352 | 0.00058794 | 0.0109739 | 12.3261 |
\(5 \times 10^{4}\) | 2.51609 | 0.0369091 | 0.609879 | 0.091403 | 0.00190592 | 0.069257 | N/A |
\(10^{5}\) | 5.30011 | 0.124498 | 1.83744 | 0.198894 | 0.00388098 | 0.161215 | N/A |
\(5 \times 10^{5}\) | 28.9266 | 3.54814 | 13.6953 | 1.03255 | 0.0130229 | 1.06126 | N/A |

\(\texttt{dim} = 2\)¶
Time Taken vs Size of Matrix¶
For these benchmarks, the leaf size was fixed at \(M = 100\), with tolerance set to \(10^{-12}\)
Fast Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | Direct LU(s) |
---|---|---|---|---|---|---|
\(10^{3}\) | 0.237684 | 0.0022819 | 0.161561 | 0.003957 | 0.000130177 | 0.0220509 |
\(2 \times 10^{3}\) | 0.854779 | 0.0138352 | 0.728164 | 0.013024 | 0.000378847 | 0.148836 |
\(4 \times 10^{3}\) | 3.31121 | 0.0200999 | 2.52401 | 0.037282 | 0.000866175 | 0.88069 |
\(8 \times 10^{3}\) | 15.0432 | 0.0863769 | 10.1511 | 0.120667 | 0.00184798 | 6.3137 |
\(1.6\times10^{4}\) | 63.8282 | 0.278127 | 53.9138 | 0.46551 | 0.0048461 | 55.4166 |

Fast Symmetric Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | MultSymFactor(s) | Direct Cholesky(s) |
---|---|---|---|---|---|---|---|
\(10^{3}\) | 0.375776 | 0.00342607 | 0.24704 | 0.00635099 | 0.000114918 | 0.0118291 | 0.026149 |
\(2 \times 10^{3}\) | 1.35015 | 0.00995803 | 1.05952 | 0.0212729 | 0.000280142 | 0.0441241 | 0.130154 |
\(4 \times 10^{3}\) | 4.89776 | 0.0418921 | 4.12168 | 0.107635 | 0.000642061 | 0.142907 | 1.49561 |
\(8 \times 10^{3}\) | 19.4326 | 0.0971079 | 16.3962 | 0.201411 | 0.00139117 | 0.547673 | 6.13806 |
\(1.6\times10^{4}\) | 79.2166 | 0.539779 | 66.3061 | 0.716309 | 0.00301003 | 2.18507 | 52.0411 |

\(\texttt{dim} = 3\)¶
For these benchmarks, the leaf size was fixed at \(M = 100\), with tolerance set to \(10^{-12}\)
Fast Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | Direct LU(s) |
---|---|---|---|---|---|---|
\(999\) | 0.637549 | 0.0138719 | 0.357686 | 0.006175 | 0.000198841 | 0.0427179 |
\(2 \times 999\) | 2.67525 | 0.0112109 | 1.915 | 0.026588 | 0.000624895 | 0.209663 |
\(3 \times 999\) | 7.72688 | 0.0243111 | 5.49087 | 0.056867 | 0.000961065 | 0.608226 |
\(4 \times 999\) | 16.6299 | 0.0466208 | 12.8754 | 0.105617 | 0.00150394 | 1.26595 |
\(5 \times 999\) | 31.9858 | 0.078845 | 24.7647 | 0.169478 | 0.00192094 | 2.26981 |
\(6 \times 999\) | 48.5977 | 0.114865 | 41.1307 | 0.241243 | 0.00212693 | 3.69254 |
\(7 \times 999\) | 76.1404 | 0.159873 | 64.5563 | 0.32299 | 0.00276279 | 5.80262 |
\(8 \times 999\) | 105.803 | 0.238746 | 91.5038 | 0.405407 | 0.00317407 | 8.46848 |

Fast Symmetric Factorization¶
\(N\) | Assembly(s) | MatVec(s) | Factorize(s) | Solve(s) | Determinant(s) | MultSymFactor(s) | Direct Cholesky(s) |
---|---|---|---|---|---|---|---|
\(999\) | 0.60924 | 0.00287509 | 0.303544 | 0.00726318 | 0.000123978 | 0.012032 | 0.046663 |
\(2 \times 999\) | 2.88964 | 0.0122139 | 1.91556 | 0.0310731 | 0.000324965 | 0.107921 | 0.294039 |
\(3 \times 999\) | 8.15757 | 0.029355 | 5.2501 | 0.079915 | 0.000537872 | 0.291421 | 0.848032 |
\(4 \times 999\) | 17.7442 | 0.055275 | 12.4843 | 0.159855 | 0.000857115 | 0.527207 | 1.91446 |
\(5 \times 999\) | 33.741 | 0.08866 | 23.0541 | 0.211891 | 0.000997066 | 0.891842 | 3.62749 |
\(6 \times 999\) | 50.7127 | 0.150594 | 39.2222 | 0.302245 | 0.00147223 | 1.60647 | 6.26212 |
\(7 \times 999\) | 77.2103 | 0.175368 | 60.007 | 0.388988 | 0.00160313 | 2.15237 | 9.70271 |
\(8 \times 999\) | 103.913 | 0.232704 | 82.881 | 0.490437 | 0.00186086 | 2.67931 | 14.0804 |
