API (C)

#include "sylver/sylver.h"

General

void sylver_init(int ncpu, int ngpu)

Initialization routine, which should be called before any other SyLVER routine. It takes the number of CPUs and GPUs to be used in the computations.

Parameters:

ncpu – number of CPUs to be used in the execution of SyLVER routines.
ngpu – number of GPUs to be used in the execution of SyLVER routines. If SyLVER was compiled without CUDA support, this value is ignored.

void sylver_finalize()

SyLVER termination routine, which should be called once all the desired operations have been performed.

void sylver_default_options(sylver_options_t *options)

Initialises the members of the options structure to their default values.

Parameters:	options – Structure to be initialised.

SpLDLT

Note

For the most efficient use of the package, supply the matrix in CSC format and call spldlt_analyse() with check=false (no checking).

void spldlt_analyse(int n, int *order, long const* ptr, int const* row, double const* val, void **akeep, bool check, sylver_options_t const* options, sylver_inform_t *inform)

Perform the analyse (symbolic) phase of the factorization for a matrix supplied in Compressed Sparse Column (CSC) format. The resulting symbolic factorization, stored in akeep, should be passed unaltered to subsequent calls to spldlt_factorize().

Parameters:

n – number of columns in \(A\).
order[] – may be NULL; otherwise must be an array of size n, used on entry to pass a user-supplied ordering (options.ordering=0). On return, contains the actual ordering used.
ptr[n+1] – column pointers for \(A\) (see CSC format).
row[ptr[n]] – row indices for \(A\) (see CSC format).
val – may be NULL; otherwise must be an array of size ptr[n] containing non-zero values for \(A\) (see CSC format). Only used if a matching-based ordering is requested.
akeep – returns symbolic factorization, to be passed unchanged to subsequent routines.
check – if true, matrix data is checked. Out-of-range entries are dropped and duplicate entries are summed.
options – specifies algorithm options to be used (see sylver_options_t).
inform – returns information about the execution of the routine (see sylver_inform_t).
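As an illustration of the CSC arrays passed to spldlt_analyse(), the sketch below packs a small dense symmetric matrix. This section does not state which triangle is stored or which index base is used, so the sketch assumes that only the lower triangle (diagonal included) is supplied and that row indices are 0-based; dense_lower_to_csc() is a hypothetical helper, not part of SyLVER.

```c
#include <stddef.h>

/* Hypothetical helper, not part of SyLVER.
 * Packs the lower triangle (diagonal included) of a dense, column-major,
 * symmetric n x n matrix a into CSC arrays. Off-diagonal zeros are skipped;
 * diagonal entries are always stored, even if zero, so that every diagonal
 * entry has an explicit value. Row indices are 0-based (an assumption).
 * ptr needs n+1 entries; row and val need room for up to n*(n+1)/2 entries.
 * Returns the number of stored entries, which is also ptr[n]. */
long dense_lower_to_csc(int n, const double *a,
                        long *ptr, int *row, double *val)
{
    long nz = 0;
    for (int j = 0; j < n; ++j) {
        ptr[j] = nz;                      /* column j starts here */
        for (int i = j; i < n; ++i) {     /* rows i >= j only */
            double aij = a[(size_t)j * n + i];
            if (i == j || aij != 0.0) {
                row[nz] = i;
                val[nz] = aij;
                ++nz;
            }
        }
    }
    ptr[n] = nz;                          /* one past the last entry */
    return nz;
}
```

For the 3 by 3 matrix with columns (4, 1, 0), (1, 3, 2) and (0, 2, 5), this gives ptr = {0, 2, 4, 5}, row = {0, 1, 1, 2, 2} and val = {4, 1, 3, 2, 5}.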

Note

If a user-supplied ordering is used, it may be altered by this routine, with the altered version returned in order[]. This version will be equivalent to the original ordering, except that some supernodes may have been amalgamated, a topological ordering may have been applied to the assembly tree, and the order of columns within a supernode may have been adjusted to improve cache locality.
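With options.ordering=0, an order[] that is not a valid permutation causes flag -8. A caller may want to validate the array beforehand; the hypothetical helper below (not part of SyLVER) does so in O(n) time. Whether spldlt_analyse() expects 0- or 1-based entries is not stated here, so the base is passed as a parameter.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of SyLVER.
 * Returns true if order[0..n-1] is a permutation of base, base+1, ...,
 * base+n-1. The index base (0 or 1) is an assumption left to the caller. */
bool is_valid_permutation(int n, const int *order, int base)
{
    if (n < 0 || (n > 0 && order == NULL))
        return false;
    bool *seen = calloc((size_t)n + 1, sizeof *seen); /* +1 avoids calloc(0) */
    if (seen == NULL)
        return false;
    bool ok = true;
    for (int k = 0; k < n && ok; ++k) {
        int v = order[k] - base;          /* map entry to a 0-based index */
        if (v < 0 || v >= n || seen[v])
            ok = false;                   /* out of range or repeated */
        else
            seen[v] = true;
    }
    free(seen);
    return ok;
}
```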

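void spldlt_factorize(bool posdef, long const* ptr, int const* row, double const* val, double *scale, void *akeep, void **fkeep, sylver_options_t const* options, sylver_inform_t *inform)

Perform the factorize (numeric) phase of the factorization, using the symbolic factorization akeep returned by spldlt_analyse(). The resulting numeric factorization is returned in fkeep.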

Parameters:

posdef – true if the matrix is positive-definite.
ptr – may be NULL; otherwise a length n+1 array of column pointers for \(A\), only required if akeep was obtained by running spldlt_analyse() with check=true, in which case it must be unchanged since that call.
row – may be NULL; otherwise a length ptr[n] array of row indices for \(A\), only required if akeep was obtained by running spldlt_analyse() with check=true, in which case it must be unchanged since that call.
val[] – non-zero values for \(A\) in same format as for the call to spldlt_analyse().
scale – may be NULL; otherwise a length n array for diagonal scaling. scale[i-1] contains entry \(S_{ii}\) of \(S\). Must be supplied by user on entry if options.scaling=0 (user-supplied scaling). On exit, returns scaling used.
akeep – symbolic factorization returned by preceding call to spldlt_analyse().
fkeep – returns numeric factorization, to be passed unchanged to subsequent routines.
options – specifies algorithm options to be used (see sylver_options_t).
inform – returns information about the execution of the routine (see sylver_inform_t).
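To make the role of scale concrete, the sketch below overwrites the CSC values of \(A\) with those of \(SAS\), where \(S\) is the diagonal matrix whose entries are held in scale. It only illustrates the convention; apply_symmetric_scaling() is hypothetical (not part of SyLVER) and 0-based row indices are assumed, so scale[i] holds \(S_{i+1,i+1}\) in the 1-based notation above.

```c
/* Hypothetical helper, not part of SyLVER.
 * Replaces each stored entry a_ij of A (CSC, 0-based rows) by
 * scale[i] * a_ij * scale[j], i.e. the corresponding entry of S*A*S. */
void apply_symmetric_scaling(int n, const long *ptr, const int *row,
                             double *val, const double *scale)
{
    for (int j = 0; j < n; ++j)                    /* column j */
        for (long k = ptr[j]; k < ptr[j + 1]; ++k) /* entries of column j */
            val[k] *= scale[row[k]] * scale[j];
}
```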

void spldlt_solve(int job, int nrhs, double *x, int ldx, void *akeep, void *fkeep, sylver_options_t const* options, sylver_inform_t *inform)

Solve (for nrhs right-hand sides) one of the following equations:

job	Equation solved
0	\(AX=B\)
1	\(PLX=SB\)
2	\(DX=B\)
3	\((PL)^TS^{-1}X=B\)
4	\(D(PL)^TS^{-1}X=B\)

Recall \(A\) has been factorized as either:

\(SAS = (PL)(PL)^T\) (positive-definite case); or
\(SAS = (PL)D(PL)^T\) (indefinite case).
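The job values 1 to 3 are the successive stages of a full solve. For the indefinite case, the derivation below (a consistency check on the table, not a description of SyLVER's internals) shows that applying job=1, then job=2, then job=3 to the same x solves \(AX=B\), and that job=4 performs stages 2 and 3 at once:

\[
\begin{aligned}
SAS = (PL)D(PL)^T \;&\Longrightarrow\; A = S^{-1}(PL)D(PL)^T S^{-1},\\
AX = B \;&\Longleftrightarrow\; (PL)D(PL)^T S^{-1}X = SB,\\
\text{job } 1:&\quad (PL)\,Y = SB,\\
\text{job } 2:&\quad D\,Z = Y,\\
\text{job } 3:&\quad (PL)^T S^{-1} X = Z,\\
\text{check}:&\quad (PL)D(PL)^T S^{-1}X = (PL)D\,Z = (PL)\,Y = SB,\\
\text{job } 4:&\quad D(PL)^T S^{-1} X = Y.
\end{aligned}
\]

In the positive-definite case \(D=I\), so job=1 followed by job=3 gives the same result.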

Parameters:

job – specifies equation to solve, as per above table.
nrhs – number of right-hand sides.
x[ldx*nrhs] – right-hand sides \(B\) on entry, solutions \(X\) on exit. The i-th entry of right-hand side j is in position x[j*ldx+i].
ldx – leading dimension of x.
akeep – symbolic factorization returned by preceding call to spldlt_analyse().
fkeep – numeric factorization returned by preceding call to spldlt_factorize().
options – specifies algorithm options to be used (see sylver_options_t).
inform – returns information about the execution of the routine (see sylver_inform_t).
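The array x is column-major with leading dimension ldx. The hypothetical helper below (not part of SyLVER) copies nrhs contiguous right-hand sides of length n into such an array, assuming i and j in x[j*ldx+i] are 0-based; it rejects the same ldx<n or nrhs<1 combinations for which spldlt_solve() returns flag -10.

```c
#include <stddef.h>

/* Hypothetical helper, not part of SyLVER.
 * Copies right-hand side j (entries b[j*n .. j*n+n-1]) into column j of x,
 * so that entry i of right-hand side j lands at x[j*ldx + i].
 * Rows n..ldx-1 of each column are padding and are left untouched.
 * Returns 0 on success, -1 if ldx < n or nrhs < 1. */
int pack_rhs(int n, int nrhs, const double *b, double *x, int ldx)
{
    if (ldx < n || nrhs < 1)
        return -1;
    for (int j = 0; j < nrhs; ++j)
        for (int i = 0; i < n; ++i)
            x[(size_t)j * ldx + i] = b[(size_t)j * n + i];
    return 0;
}
```

After spldlt_solve() returns, the solutions are read back from the same positions.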

void spldlt_free_akeep(void **akeep)

Frees memory and resources associated with akeep.

Parameters:	akeep – symbolic factors to be freed.

int spldlt_free_fkeep(void **fkeep)

Frees memory and resources associated with fkeep.

Parameters:	fkeep – numeric factors to be freed.

Data types

Options

struct sylver_options_t

The data type sylver_options_t is used to specify the options used within SyLVER. Its components, which are set to their default values by sylver_default_options(), are:

int print_level

Level of printing:

< 0	No printing.
= 0 (default)	Error and warning messages only.
= 1	As 0, plus basic diagnostic printing.
> 1	As 1, plus some additional diagnostic printing.

The default is 0.

int unit_diagnostics

Fortran unit number for diagnostics printing. Printing is suppressed if <0.

The default is 6 (stdout).

int unit_error

Fortran unit number for printing of error messages. Printing is suppressed if <0.

The default is 6 (stdout).

int unit_warning

Fortran unit number for printing of warning messages. Printing is suppressed if <0.

The default is 6 (stdout).

int ordering

Ordering method to use in analyse phase:

0 User-supplied ordering is used (order argument to spldlt_analyse()).

1 (default) METIS ordering with default settings.

2 Matching-based elimination ordering is computed (the Hungarian algorithm is used to identify large off-diagonal entries, and a restricted METIS ordering is then used that forces these onto the subdiagonal).

Note: This option should only be chosen for indefinite systems. A scaling is also computed that may be used in spldlt_factorize() (see scaling below).

The default is 1.

int nemin

Supernode amalgamation threshold. Two neighbours in the elimination tree are merged if they both involve fewer than nemin eliminations. The default is used if nemin<1. The default is 8.
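The amalgamation rule above can be written as a predicate. This is only a restatement of the stated rule under assumed names (should_amalgamate(), parent_nelim and child_nelim are hypothetical), not SyLVER's code; the actual implementation may apply further conditions.

```c
#include <stdbool.h>

/* Default threshold used when nemin < 1, as stated above. */
#define NEMIN_DEFAULT 8

/* Hypothetical sketch, not SyLVER's code: two neighbouring supernodes are
 * merged if both involve fewer than nemin eliminations. */
bool should_amalgamate(int parent_nelim, int child_nelim, int nemin)
{
    if (nemin < 1)
        nemin = NEMIN_DEFAULT;            /* fall back to the default */
    return parent_nelim < nemin && child_nelim < nemin;
}
```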

bool prune_tree

If true, prune the elimination tree to better exploit data locality in the parallel factorization.

The default is true.

long min_gpu_work

Minimum number of flops in subtree before scheduling on GPU.

Default is 5e9.

int scaling

Scaling algorithm to use:

<=0 (default)	No scaling if `scale` is NULL on the call to `spldlt_factorize()`, or user-supplied scaling if `scale` is non-NULL.
=1	Compute using weighted bipartite matching via the Hungarian Algorithm (MC64 algorithm).
=2	Compute using a weighted bipartite matching via the Auction Algorithm (may be lower quality than that computed using the Hungarian Algorithm, but can be considerably faster).
=3	Use matching-based ordering generated during the analyse phase using `options.ordering=2`. The scaling will be the same as that generated with `options.scaling=1` if the matrix values have not changed. This option will generate an error if a matching-based ordering was not used during analysis.
>=4	Compute using the norm-equilibration algorithm of Ruiz (see scaling).

The default is 0.

int pivot_method

Pivot method to be used on CPU, one of:

2 (default)	Block a posteriori pivoting. A failed pivot only requires recalculation of entries within its own block column.
3	Threshold partial pivoting. Not parallel.

Default is 2.

double small

Threshold below which an entry is treated as equivalent to 0.0.

The default is 1e-20.

double u

Relative pivot threshold used in symmetric indefinite case. Values outside of the range \([0,0.5]\) are treated as the closest value in that range.

The default is 0.01.
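The clamping of u is stated above, but not the exact pivot test in which u is used. The sketch below assumes the standard threshold test for a 1x1 pivot, \(|a_{kk}| \ge u \max_{i \ne k} |a_{ik}|\), and treats a candidate below the small option as zero. The names clamp_u(), accept_1x1_pivot() and small_tol are hypothetical; this is an illustration, not SyLVER's pivoting code.

```c
#include <stdbool.h>

/* Values of u outside [0, 0.5] are treated as the closest value in range. */
double clamp_u(double u)
{
    if (u < 0.0) return 0.0;
    if (u > 0.5) return 0.5;
    return u;
}

static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Assumed standard threshold test for a 1x1 pivot on entry col[k] of a
 * column with m entries. The pivot is rejected if it is below small_tol
 * (treated as zero, cf. options.small) or if
 * |col[k]| < u * max_{i != k} |col[i]|. */
bool accept_1x1_pivot(const double *col, int m, int k,
                      double u, double small_tol)
{
    double piv = abs_val(col[k]);
    double offmax = 0.0;
    for (int i = 0; i < m; ++i)
        if (i != k && abs_val(col[i]) > offmax)
            offmax = abs_val(col[i]);
    if (piv < small_tol)
        return false;
    return piv >= clamp_u(u) * offmax;
}
```

With the default u=0.01, a diagonal entry of 0.05 in a column whose largest off-diagonal magnitude is 2.0 passes (0.05 >= 0.02), whereas with u=0.1 it fails (0.05 < 0.2).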

long small_subtree_threshold

Maximum number of flops in a subtree treated as a single task.

The default is 4e6.

int nb

Block size to use for parallelization of large nodes on CPU resources.

Default is 256.

int cpu_topology

1 (default)	Automatically choose the CPU topology depending on the underlying architecture.
2	Assume a flat topology and, in particular, ignore the NUMA structure.
3	Use the NUMA structure of the underlying architecture to better exploit data locality in the parallel execution.

Default is 1.

bool action

Continue factorization of singular matrix on discovery of zero pivot if true (a warning is issued), or abort if false.

The default is true.

bool use_gpu

Use an NVIDIA GPU if present.

Default is true.

float gpu_perf_coeff

GPU performance coefficient: how many times faster a GPU is than a CPU at factorizing a subtree.

Default is 1.0.

Information

struct sylver_inform_t

Used to return information about the progress and needs of the algorithm.

int flag

Exit status of the algorithm (see table below).

int matrix_dup

Number of duplicate entries encountered (if spldlt_analyse() was called with check=true).

int matrix_missing_diag

Number of diagonal entries without an explicit value (if spldlt_analyse() was called with check=true).

int matrix_outrange

Number of out-of-range entries encountered (if spldlt_analyse() was called with check=true).

int matrix_rank

(Estimated) rank (structural after the analyse phase, numerical after the factorize phase).

int maxdepth

Maximum depth of the assembly tree.

int maxfront

Maximum front size (without pivoting after the analyse phase, with pivoting after the factorize phase).

int num_delay

Number of delayed pivots, that is, the total number of fully-summed variables that were passed to the parent node because of stability considerations. A variable passed further up the tree is counted again at each level.

long num_factor

Number of entries in \(L\) (without pivoting after the analyse phase, with pivoting after the factorize phase).

long num_flops

Number of floating-point operations for a Cholesky factorization (an indefinite factorization needs slightly more). Without pivoting after the analyse phase, with pivoting after the factorize phase.

int num_neg

Number of negative eigenvalues of the matrix \(D\) after the factorize phase.

int num_sup

Number of supernodes in the assembly tree.

int num_two

Number of \(2 \times 2\) pivots used by the factorization (i.e. in the matrix \(D\)).

int stat

Fortran allocation status parameter in the event of an allocation error (0 otherwise).

inform.flag	Return status
0	Success.
-1	Error in sequence of calls (may be caused by failure of a preceding call).
-2	n<0 or ne<1.
-3	Error in ptr[].
-4	CSC format: All variable indices in one or more columns are out-of-range. Coordinate format: All entries are out-of-range.
-5	Matrix is singular and options.action=false.
-6	Matrix found not to be positive definite.
-7	ptr[] and/or row[] not present, but required as `spldlt_analyse()` was called with check=false.
-8	options.ordering out of range, or options.ordering=0 and order parameter not provided or not a valid permutation.
-9	options.ordering=2 but val[] was not supplied.
-10	ldx<n or nrhs<1.
-11	job is out-of-range.
-13	Called `spldlt_enquire_posdef()` on indefinite factorization.
-14	Called `spldlt_enquire_indef()` on positive-definite factorization.
-15	options.scaling=3 but a matching-based ordering was not performed during analyse phase.
-50	Allocation error. If available, the stat parameter is returned in inform.stat.
-51	CUDA error. The CUDA error return value is returned in inform.cuda_error.
-52	CUBLAS error. The CUBLAS error return value is returned in inform.cublas_error.
+1	Out-of-range variable indices found and ignored in input data. inform.matrix_outrange is set to the number of such entries.
+2	Duplicate entries found and summed in input data. inform.matrix_dup is set to the number of such entries.
+3	Combination of +1 and +2.
+4	One or more diagonal entries of \(A\) are missing.
+5	Combination of +4 and +1 or +2.
+6	Matrix is found to be (structurally) singular during the analyse phase. This will overwrite any of the above warning flags.
+7	Matrix is found to be singular during factorize phase.
+8	Matching-based scaling found as side-effect of matching-based ordering ignored (consider setting options.scaling=3).
+50	OpenMP processor binding is disabled. Consider setting the environment variable OMP_PROC_BIND=true (this may affect performance on NUMA systems).
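When reporting the outcome of a call, it can help to map inform.flag onto the table above. The helpers below are hypothetical (not part of SyLVER); they follow the table's sign convention (negative values are errors, positive values are warnings, 0 is success) and abbreviate its descriptions.

```c
/* Hypothetical helpers, not part of SyLVER. */
typedef enum { FLAG_SUCCESS, FLAG_WARNING, FLAG_ERROR } flag_kind;

flag_kind classify_flag(int flag)
{
    if (flag < 0) return FLAG_ERROR;
    if (flag > 0) return FLAG_WARNING;
    return FLAG_SUCCESS;
}

/* Short descriptions abbreviated from the inform.flag table. */
const char *describe_flag(int flag)
{
    switch (flag) {
    case 0:   return "Success";
    case -1:  return "Error in sequence of calls";
    case -2:  return "n<0 or ne<1";
    case -3:  return "Error in ptr[]";
    case -4:  return "All variable indices out-of-range";
    case -5:  return "Matrix is singular and options.action=false";
    case -6:  return "Matrix found not to be positive definite";
    case -7:  return "ptr[] and/or row[] not present but required";
    case -8:  return "Invalid options.ordering or order[]";
    case -9:  return "Matching-based ordering requested but val[] not supplied";
    case -10: return "ldx<n or nrhs<1";
    case -11: return "job is out-of-range";
    case -13: return "spldlt_enquire_posdef() on indefinite factorization";
    case -14: return "spldlt_enquire_indef() on positive-definite factorization";
    case -15: return "options.scaling=3 without matching-based ordering";
    case -50: return "Allocation error (see inform.stat)";
    case -51: return "CUDA error (see inform.cuda_error)";
    case -52: return "cuBLAS error (see inform.cublas_error)";
    case 1:   return "Out-of-range indices ignored";
    case 2:   return "Duplicate entries summed";
    case 3:   return "Out-of-range indices ignored and duplicates summed";
    case 4:   return "Missing diagonal entries";
    case 5:   return "Missing diagonal entries plus out-of-range and/or duplicates";
    case 6:   return "Structurally singular (analyse phase)";
    case 7:   return "Singular (factorize phase)";
    case 8:   return "Matching-based scaling ignored";
    case 50:  return "OpenMP processor binding disabled";
    default:  return "Unknown flag value";
    }
}
```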

int cuda_error

CUDA error code in the event of a CUDA error (0 otherwise). Note that due to asynchronous execution, CUDA errors may not be reported by the call that caused them.

int cublas_error

cuBLAS error code in the event of a cuBLAS error (0 otherwise).