API (Fortran)¶
In the below, all reals are double precision unless otherwise indicated.
General¶
-
subroutine
sylver_init
(ncpu, ngpu)¶ Initialization routine which should be called before any other routine within SyLVER. The number of CPUs and GPUs involved in the computations should be passed to this routine.
Parameters: - ncpu [integer,in] :: number of CPUs to be used in the execution of SyLVER routines.
- ngpu [integer,in] :: number of GPUs to be used in the execution of SyLVER routines. Note that if CUDA is not enabled during the compilation, this value will be ignored.
-
subroutine
sylver_finalize
()¶ SyLVER termination routine which should be called once all the desired operations have been performed.
SpLDLT¶
-
subroutine
spldlt_analyse
(akeep, n, ptr, row, options, inform[, order, val, ncpu, ngpu, check])¶ Perform the analyse (symbolic) phase of the factorization for a matrix supplied in Compressed Sparse Column (CSC) format. The resulting symbolic factors stored in
spldlt_akeep
should be passed unaltered in the subsequent calls tospldlt_factorize()
.Parameters: - akeep [spldlt_akeep,out] :: returns symbolic factorization, to be passed unchanged to subsequent routines.
- n [integer,in] :: number of columns in \(A\).
- ptr (n+1) [long,in] :: column pointers for \(A\) (see CSC format).
- row (ptr(n+1)-1) [integer,in] ::
row indices for \(A\) (see CSC format).
- options [sylver_options,in] :: specifies algorithm options to be used
(see
sylver_options
). - inform [sylver_inform,out] :: returns information about the
execution of the routine (see
sylver_inform
).
Options: - order (n) [integer,inout] :: on entry a user-supplied ordering (options%ordering=0). On return, the actual ordering used (if present).
- val (ptr(n+1)-1) [real,in] ::
non-zero values for \(A\) (see CSC format). Only used if a matching-based ordering is requested.
- ncpu [integer,in] :: Number of CPU available for the execution. If
absent, the value of ncpu passed to the
sylver_init()
routine is used instead. - ncpu :: Number of GPU available for the
execution. This value is ignored if CUDA is
not enabled during the compilation and if
absent, the value of ncpu passed to the
sylver_init()
routine is used instead. - check [logical,in] :: if true, matrix data is checked. Out-of-range entries are dropped and duplicate entries are summed.
Note
If a user-supplied ordering is used, it may be altered by this routine, with the altered version returned in order(:). This version will be equivalent to the original ordering, except that some supernodes may have been amalgamated, a topological ordering may have been applied to the assembly tree and the order of columns within a supernode may have been adjusted to improve cache locality.
-
subroutine
spldlt_factorize
(akeep, fkeep, posdef, val, options, inform[, scale, ptr, row])¶ Parameters: - akeep [spldlt_akeep,out] :: symbolic factorization returned by
preceding call to
spldlt_analyse()
. - akeep :: returns numeric factorization, to be passed unchanged to subsequent routines.
- posdef [logical,in] :: true if matrix is positive-definite.
- val (*) [real,in] :: non-zero values for \(A\) in same format
as for the call to
spldlt_analyse()
. - options [sylver_options,in] :: specifies algorithm options to be
used (see
sylver_options
). - inform [sylver_inform,out] :: returns information about the
execution of the routine (see
sylver_inform
).
Options: - scale (n) [real,inout] :: diagonal scaling. scale(i) contains entry
\(S_{ii}\) of \(S\). Must be supplied by user if
options%scaling=0
(user-supplied scaling). On exit, return scaling used. - ptr (n+1) [integer(long),in] :: column pointers for \(A\), only required if scaling is required (options%scaling > 0) expect in the case where matching-based ordering is done (options%scaling = 3)
- row (ptr(n+1)-1) [integer,in] :: row indices for \(A\), only required if scaling is required (options%scaling > 0) expect in the case where matching-based ordering is done (options%scaling = 3)
- akeep [spldlt_akeep,out] :: symbolic factorization returned by
preceding call to
-
subroutine
spldlt_solve
(akeep, fkeep, nrhs, x, ldx, options, inform[, job])¶ Solve (for \(nrhs\) right-hand sides) one of the following equations:
job Equation solved 0 (or absent) \(AX=B\) 1 \(PLX=SB\) 2 \(DX=B\) 3 \((PL)^TS^{-1}X=B\) 4 \(D(PL)^TS^{-1}X=B\) Recall \(A\) has been factorized as either:
- \(SAS = (PL)(PL)^T~\) (positive-definite case); or
- \(SAS = (PL)D(PL)^T\) (indefinite case).
Parameters: - akeep [spldlt_akeep,in] :: symbolic factorization returned by preceding
call to
spldlt_analyse()
- fkeep [spldlt_fkeep,in] :: numeric factorization returned by preceding
call to
spldlt_factor()
. - nrhs [integer,in] :: number of right-hand sides.
- x (ldx,nrhs) [real,inout] :: right-hand sides \(B\) on entry, solutions \(X\) on exit.
- ldx [integer,in] :: leading dimension of
x
. - options [sylver_options,in] :: specifies algorithm options to be used
(see
sylver_options
). - inform [sylver_inform,out] :: returns information about the
execution of the routine (see
sylver_inform
).
Options: job [integer,in] :: specifies equation to solve, as per above table.
Data types¶
Options¶
-
type
sylver_options
¶ The derived data type
sylver_options
is used to specify the options used withinSyLVER
. The components, that are automatically given default values in the definition of the type, are:Type fields: %
print_level
[integer,default=0] ::the level of printing. The different levels are:
< 0 No printing. = 0 Error and warning messages only. = 1 As 0, plus basic diagnostic printing. > 1 As 1, plus some additional diagnostic printing. %
unit_diagnostics
[integer,default=6] :: Fortran unit number for diagnostics printing. Printing is suppressed if <0.%
unit_error
[integer,default=6] :: Fortran unit number for printing of error messages. Printing is suppressed if <0.%
unit_warning
[integer,default=6] :: Fortran unit number for printing of warning messages. Printing is suppressed if <0.%
ordering
[integer,default=1] ::Ordering method to use in analyse phase:
0 User-supplied ordering is used (order argument to spldlt_analyse()
orsplu_analyse()
).1 (default) METIS ordering with default settings. 2 Matching-based elimination ordering is computed (the Hungarian algorithm is used to identify large off-diagonal entries. A restricted METIS ordering is then used that forces these on to the subdiagonal). Note: This option should only be chosen for indefinite systems. A scaling is also computed that may be used in
spldlt_factor()
orsplu_factor()
(see %scaling below).%
nemin
[integer,default=32] :: supernode amalgamation threshold. Two neighbours in the elimination tree are merged if they both involve fewer than nemin eliminations. The default is used if nemin<1.%
use_gpu
[logical,default=true] :: Use an NVIDIA GPU if present.%
scaling
[integer,default=0] ::scaling algorithm to use:
<=0 (default) No scaling (if scale(:)
is not present on call tospldlt_factor()
orsplu_factor()
, or user-supplied scaling (ifscale(:)
is present).=1 Compute using weighted bipartite matching via the Hungarian Algorithm ( MC64
algorithm).=2 Compute using a weighted bipartite matching via the Auction Algorithm (may be lower quality than that computed using the Hungarian Algorithm, but can be considerably faster). =3 Use matching-based ordering generated during the analyse phase using options%ordering=2. The scaling will be the same as that generated with options%scaling= 1 if the matrix values have not changed. This option will generate an error if a matching-based ordering was not used during analysis. >=4 Compute using the norm-equilibration algorithm of Ruiz. %
nb
[integer,default=256] :: Block size to use for parallelization of large nodes on CPU resources.%
pivot_method
[integer,default=1] ::Pivot method to be used on CPU, one of:
0 Aggressive a posteori pivoting. Cholesky-like communication pattern is used, but a single failed pivot requires restart of node factorization and potential recalculation of all uneliminated entries. 1 (default) Block a posteori pivoting. A failed pivot only requires recalculation of entries within its own block column. 2 Threshold partial pivoting. Not parallel. %
small
[real,default=1d-20] :: threshold below which an entry is treated as equivalent to 0.0.%
u
[real,default=0.01] :: relative pivot threshold used in symmetric indefinite case. Values outside of the range \([0,0.5]\) are treated as the closest value in that range.
Information¶
-
type
sylver_inform
¶ The derived data type
sylver_inform
is used to return information about the progress and needs of the algorithm that might be of interest for the user.Type fields: %
flag
[integer] :: exit status of the algorithm (see table below).%
cublas_error
[integer] :: CUBLAS error code in the event of a CUBLAS error (0 otherwise).%
cuda_error
[integer] :: CUDA error code in the event of a CUDA error (0 otherwise). Note that due to asynchronous execution, CUDA errors may not be reported by the call that caused them.%
matrix_dup
[integer] :: number of duplicate entries encountered (ifspldlt_analyse()
orsplu_analyse()
called with check=true).%
matrix_missing_diag
[integer] :: number of diagonal entries without an explicit value (ifspldlt_analyse()
orsplu_analyse()
called with check=true).%
matrix_outrange
[integer] :: number of out-of-range entries encountered (ifspldlt_analyse()
orsplu_analyse()
called with check=true).%
matrix_rank
[integer] :: (estimated) rank (structural after analyse phase, numerical after factorize phase).%
maxdepth
[integer] :: maximum depth of the assembly tree.%
maxfront
[integer] :: maximum front size (without pivoting after analyse phase, with pivoting after factorize phase).%
num_delay
[integer] :: number of delayed pivots. That is, the total number of fully-summed variables that were passed to the father node because of stability considerations. If a variable is passed further up the tree, it will be counted again.%
num_factor
[long] :: number of entries in \(L\) (without pivoting after analyse phase, with pivoting after factorize phase).%
num_flops
[long] :: number of floating-point operations for Cholesky factorization (indefinte needs slightly more). Without pivoting after analyse phase, with pivoting after factorize phase.%
num_neg
[integer] :: number of negative eigenvalues of the matrix \(D\) after factorize phase.%
num_sup
[integer] :: number of supernodes in assembly tree.%
num_two
[integer] ::number of \(2 \times 2\) pivots used by the factorization (i.e. in the matrix \(D\) in the indefinite
case).%
stat
[integer] :: Fortran allocation status parameter in event of allocation error (0 otherwise).
inform%flag Return status 0 Success. -1 Error in sequence of calls (may be caused by failure of a preceding call). -2 n<0 or ne<1. -3 Error in ptr(:). -4 CSC format: All variable indices in one or more columns are out-of-range.
Coordinate format: All entries are out-of-range.
-5 Matrix is singular and options%action=.false. -6 Matrix found not to be positive definite but posdef=true. -7 ptr(:) and/or row(:) not present although required. -8 options%ordering out of range, or options%ordering=0 and order parameter not provided or not a valid permutation. -9 options%ordering=-2 but val(:) was not supplied. -10 ldx<n or nrhs<1. -11 job is out-of-range. -13 Called spldlt_enquire_posdef()
on indefinite factorization.-14 Called spldlt_enquire_indef()
on positive-definite factorization.-15 options%scaling=3 but a matching-based ordering was not performed during analyse phase. -50 Allocation error. If available, the stat parameter is returned in inform%stat. -51 CUDA error. The CUDA error return value is returned in inform%cuda_error. -52 CUBLAS error. The CUBLAS error return value is returned in inform%cublas_error. +1 Out-of-range variable indices found and ignored in input data. inform%matrix_outrange is set to the number of such entries. +2 Duplicate entries found and summed in input data. inform%matrix_dup is set to the number of such entries. +3 Combination of +1 and +2. +4 One or more diagonal entries of \(A\) are missing. +5 Combination of +4 and +1 or +2. +6 Matrix is found be (structurally) singular during analyse phase. This will overwrite any of the above warning flags. +7 Matrix is found to be singular during factorize phase. +8 Matching-based scaling found as side-effect of matching-based ordering ignored (consider setting options%scaling=3).