API (Fortran)

In the below, all reals are double precision unless otherwise indicated.

General

subroutine sylver_init(ncpu, ngpu)

Initialization routine which should be called before any other routine within SyLVER. The number of CPUs and GPUs involved in the computations should be passed to this routine.

Parameters:
  • ncpu [integer,in] :: number of CPUs to be used in the execution of SyLVER routines.
  • ngpu [integer,in] :: number of GPUs to be used in the execution of SyLVER routines. Note that if CUDA is not enabled during the compilation, this value will be ignored.
subroutine sylver_finalize()

SyLVER termination routine which should be called once all the desired operations have been performed.

SpLDLT

subroutine spldlt_analyse(akeep, n, ptr, row, options, inform[, order, val, ncpu, ngpu, check])

Perform the analyse (symbolic) phase of the factorization for a matrix supplied in Compressed Sparse Column (CSC) format. The resulting symbolic factors stored in spldlt_akeep should be passed unaltered in the subsequent calls to spldlt_factorize().

Parameters:
  • akeep [spldlt_akeep,out] :: returns symbolic factorization, to be passed unchanged to subsequent routines.
  • n [integer,in] :: number of columns in \(A\).
  • ptr (n+1) [long,in] :: column pointers for \(A\) (see CSC format).
  • row (ptr(n+1)-1) [integer,in] ::

    row indices for \(A\) (see CSC format).

  • options [sylver_options,in] :: specifies algorithm options to be used (see sylver_options).
  • inform [sylver_inform,out] :: returns information about the execution of the routine (see sylver_inform).
Options:
  • order (n) [integer,inout] :: on entry a user-supplied ordering (options%ordering=0). On return, the actual ordering used (if present).
  • val (ptr(n+1)-1) [real,in] ::

    non-zero values for \(A\) (see CSC format). Only used if a matching-based ordering is requested.

  • ncpu [integer,in] :: Number of CPU available for the execution. If absent, the value of ncpu passed to the sylver_init() routine is used instead.
  • ncpu :: Number of GPU available for the execution. This value is ignored if CUDA is not enabled during the compilation and if absent, the value of ncpu passed to the sylver_init() routine is used instead.
  • check [logical,in] :: if true, matrix data is checked. Out-of-range entries are dropped and duplicate entries are summed.

Note

If a user-supplied ordering is used, it may be altered by this routine, with the altered version returned in order(:). This version will be equivalent to the original ordering, except that some supernodes may have been amalgamated, a topological ordering may have been applied to the assembly tree and the order of columns within a supernode may have been adjusted to improve cache locality.

subroutine spldlt_factorize(akeep, fkeep, posdef, val, options, inform[, scale, ptr, row])
Parameters:
  • akeep [spldlt_akeep,out] :: symbolic factorization returned by preceding call to spldlt_analyse().
  • akeep :: returns numeric factorization, to be passed unchanged to subsequent routines.
  • posdef [logical,in] :: true if matrix is positive-definite.
  • val (*) [real,in] :: non-zero values for \(A\) in same format as for the call to spldlt_analyse().
  • options [sylver_options,in] :: specifies algorithm options to be used (see sylver_options).
  • inform [sylver_inform,out] :: returns information about the execution of the routine (see sylver_inform).
Options:
  • scale (n) [real,inout] :: diagonal scaling. scale(i) contains entry \(S_{ii}\) of \(S\). Must be supplied by user if options%scaling=0 (user-supplied scaling). On exit, return scaling used.
  • ptr (n+1) [integer(long),in] :: column pointers for \(A\), only required if scaling is required (options%scaling > 0) expect in the case where matching-based ordering is done (options%scaling = 3)
  • row (ptr(n+1)-1) [integer,in] :: row indices for \(A\), only required if scaling is required (options%scaling > 0) expect in the case where matching-based ordering is done (options%scaling = 3)
subroutine spldlt_solve(akeep, fkeep, nrhs, x, ldx, options, inform[, job])

Solve (for \(nrhs\) right-hand sides) one of the following equations:

job Equation solved
0 (or absent) \(AX=B\)
1 \(PLX=SB\)
2 \(DX=B\)
3 \((PL)^TS^{-1}X=B\)
4 \(D(PL)^TS^{-1}X=B\)

Recall \(A\) has been factorized as either:

  • \(SAS = (PL)(PL)^T~\) (positive-definite case); or
  • \(SAS = (PL)D(PL)^T\) (indefinite case).
Parameters:
  • akeep [spldlt_akeep,in] :: symbolic factorization returned by preceding call to spldlt_analyse()
  • fkeep [spldlt_fkeep,in] :: numeric factorization returned by preceding call to spldlt_factor().
  • nrhs [integer,in] :: number of right-hand sides.
  • x (ldx,nrhs) [real,inout] :: right-hand sides \(B\) on entry, solutions \(X\) on exit.
  • ldx [integer,in] :: leading dimension of x.
  • options [sylver_options,in] :: specifies algorithm options to be used (see sylver_options).
  • inform [sylver_inform,out] :: returns information about the execution of the routine (see sylver_inform).
Options:

job [integer,in] :: specifies equation to solve, as per above table.

Data types

Options

type sylver_options

The derived data type sylver_options is used to specify the options used within SyLVER. The components, that are automatically given default values in the definition of the type, are:

Type fields:
  • % print_level [integer,default=0] ::

    the level of printing. The different levels are:

    < 0 No printing.
    = 0 Error and warning messages only.
    = 1 As 0, plus basic diagnostic printing.
    > 1 As 1, plus some additional diagnostic printing.
  • % unit_diagnostics [integer,default=6] :: Fortran unit number for diagnostics printing. Printing is suppressed if <0.
  • % unit_error [integer,default=6] :: Fortran unit number for printing of error messages. Printing is suppressed if <0.
  • % unit_warning [integer,default=6] :: Fortran unit number for printing of warning messages. Printing is suppressed if <0.
  • % ordering [integer,default=1] ::

    Ordering method to use in analyse phase:

    0 User-supplied ordering is used (order argument to spldlt_analyse() or splu_analyse()).
    1 (default) METIS ordering with default settings.
    2 Matching-based elimination ordering is computed (the Hungarian algorithm is used to identify large off-diagonal entries. A restricted METIS ordering is then used that forces these on to the subdiagonal).

    Note: This option should only be chosen for indefinite systems. A scaling is also computed that may be used in spldlt_factor() or splu_factor() (see %scaling below).

  • % nemin [integer,default=32] :: supernode amalgamation threshold. Two neighbours in the elimination tree are merged if they both involve fewer than nemin eliminations. The default is used if nemin<1.
  • % use_gpu [logical,default=true] :: Use an NVIDIA GPU if present.
  • % scaling [integer,default=0] ::

    scaling algorithm to use:

    <=0 (default) No scaling (if scale(:) is not present on call to spldlt_factor() or splu_factor(), or user-supplied scaling (if scale(:) is present).
    =1 Compute using weighted bipartite matching via the Hungarian Algorithm (MC64 algorithm).
    =2 Compute using a weighted bipartite matching via the Auction Algorithm (may be lower quality than that computed using the Hungarian Algorithm, but can be considerably faster).
    =3 Use matching-based ordering generated during the analyse phase using options%ordering=2. The scaling will be the same as that generated with options%scaling= 1 if the matrix values have not changed. This option will generate an error if a matching-based ordering was not used during analysis.
    >=4 Compute using the norm-equilibration algorithm of Ruiz.
  • % nb [integer,default=256] :: Block size to use for parallelization of large nodes on CPU resources.
  • % pivot_method [integer,default=1] ::

    Pivot method to be used on CPU, one of:

    0 Aggressive a posteori pivoting. Cholesky-like communication pattern is used, but a single failed pivot requires restart of node factorization and potential recalculation of all uneliminated entries.
    1 (default) Block a posteori pivoting. A failed pivot only requires recalculation of entries within its own block column.
    2 Threshold partial pivoting. Not parallel.
  • % small [real,default=1d-20] :: threshold below which an entry is treated as equivalent to 0.0.
  • % u [real,default=0.01] :: relative pivot threshold used in symmetric indefinite case. Values outside of the range \([0,0.5]\) are treated as the closest value in that range.

Information

type sylver_inform

The derived data type sylver_inform is used to return information about the progress and needs of the algorithm that might be of interest for the user.

Type fields:
  • % flag [integer] :: exit status of the algorithm (see table below).
  • % cublas_error [integer] :: CUBLAS error code in the event of a CUBLAS error (0 otherwise).
  • % cuda_error [integer] :: CUDA error code in the event of a CUDA error (0 otherwise). Note that due to asynchronous execution, CUDA errors may not be reported by the call that caused them.
  • % matrix_dup [integer] :: number of duplicate entries encountered (if spldlt_analyse() or splu_analyse() called with check=true).
  • % matrix_missing_diag [integer] :: number of diagonal entries without an explicit value (if spldlt_analyse() or splu_analyse() called with check=true).
  • % matrix_outrange [integer] :: number of out-of-range entries encountered (if spldlt_analyse() or splu_analyse() called with check=true).
  • % matrix_rank [integer] :: (estimated) rank (structural after analyse phase, numerical after factorize phase).
  • % maxdepth [integer] :: maximum depth of the assembly tree.
  • % maxfront [integer] :: maximum front size (without pivoting after analyse phase, with pivoting after factorize phase).
  • % num_delay [integer] :: number of delayed pivots. That is, the total number of fully-summed variables that were passed to the father node because of stability considerations. If a variable is passed further up the tree, it will be counted again.
  • % num_factor [long] :: number of entries in \(L\) (without pivoting after analyse phase, with pivoting after factorize phase).
  • % num_flops [long] :: number of floating-point operations for Cholesky factorization (indefinte needs slightly more). Without pivoting after analyse phase, with pivoting after factorize phase.
  • % num_neg [integer] :: number of negative eigenvalues of the matrix \(D\) after factorize phase.
  • % num_sup [integer] :: number of supernodes in assembly tree.
  • % num_two [integer] ::

    number of \(2 \times 2\) pivots used by the factorization (i.e. in the matrix \(D\) in the indefinite

    case).
  • % stat [integer] :: Fortran allocation status parameter in event of allocation error (0 otherwise).
inform%flag Return status
0 Success.
-1 Error in sequence of calls (may be caused by failure of a preceding call).
-2 n<0 or ne<1.
-3 Error in ptr(:).
-4

CSC format: All variable indices in one or more columns are out-of-range.

Coordinate format: All entries are out-of-range.

-5 Matrix is singular and options%action=.false.
-6 Matrix found not to be positive definite but posdef=true.
-7 ptr(:) and/or row(:) not present although required.
-8 options%ordering out of range, or options%ordering=0 and order parameter not provided or not a valid permutation.
-9 options%ordering=-2 but val(:) was not supplied.
-10 ldx<n or nrhs<1.
-11 job is out-of-range.
-13 Called spldlt_enquire_posdef() on indefinite factorization.
-14 Called spldlt_enquire_indef() on positive-definite factorization.
-15 options%scaling=3 but a matching-based ordering was not performed during analyse phase.
-50 Allocation error. If available, the stat parameter is returned in inform%stat.
-51 CUDA error. The CUDA error return value is returned in inform%cuda_error.
-52 CUBLAS error. The CUBLAS error return value is returned in inform%cublas_error.
+1 Out-of-range variable indices found and ignored in input data. inform%matrix_outrange is set to the number of such entries.
+2 Duplicate entries found and summed in input data. inform%matrix_dup is set to the number of such entries.
+3 Combination of +1 and +2.
+4 One or more diagonal entries of \(A\) are missing.
+5 Combination of +4 and +1 or +2.
+6 Matrix is found be (structurally) singular during analyse phase. This will overwrite any of the above warning flags.
+7 Matrix is found to be singular during factorize phase.
+8 Matching-based scaling found as side-effect of matching-based ordering ignored (consider setting options%scaling=3).