|
*gesvd, and *bdsdc).
Items are labelled (a) through (m), omitting (l).
Several are not bugs, just suggestions.
Most bugs are in *gesdd.
There's one bug (g) in *bdsdc. This is the underlying cause of LAPACK bug #111.
There's one bug (m) in [cz]gesvd. I also added an INT() cast in these
assignments to silence compiler warnings. Changed:
LWORK_ZGEQRF=CDUM(1)
to:
LWORK_ZGEQRF = INT( CDUM(1) )
Where possible, I ran a test showing the wrong behavior, then a test showing the
corrected behavior. These use a modified version of the MAGMA SVD tester
(calling LAPACK), because I could adjust the lwork as needed. The last 3 columns
are the lwork type, the lwork size, and the lwork formula. The lwork types are:
doc_old as documented in LAPACK 3.6.
doc as in the attached, updated documentation.
min_old minwrk, as computed in LAPACK 3.6.
min minwrk, as computed in the attached, updated code.
min-1 minimum - 1; this should cause gesdd to return an error.
opt optimal size.
max the maximum size LAPACK will take advantage of;
some cases, the optimal is n*n + work, while the max is m*n + work.
query what gesdd returns for an lwork query; should equal opt or max.
After the lwork, occasionally there is a ! or ? error code indicating:
Error codes: ! error: lwork < min. For (min-1), this ought to appear.
? compatability issue: lwork < min_old, will fail for lapack <= 3.6.
I also tested the update routines on a wide variety of sizes and jobz, with
various lwork.
Besides fixing the bugs below, I made two significant changes.
1) Changed *gesdd from computing each routine's workspace using, e.g.:
N*ilaenv(...)
to querying each routine for its LWORK, e.g.:
CALL ZGEBRD( M, N, CDUM(1), M, CDUM(1), DUM(1), CDUM(1),
$ CDUM(1), CDUM(1), -1, IERR )
LWORK_ZGEBRD_MN = INT( CDUM(1) )
This matches how *gesvd was changed in LAPACK 3.4.0.
2) Changed the Workspace: comments, which were incredibly deceptive.
For instance, in Path 2 before dbdsdc, it said
Workspace: need N + N*N + BDSPAC
since dbdsdc needs the [e] vector, [U] matrix, and bdspac.
However, that ignores that the [tauq, taup] vectors and [R] matrix
are also already allocated, though dbdsdc doesn't need them.
So the workspace needed at that point in the code is actually
Workspace: need N*N [R] + 3*N [e, tauq, taup] + N*N [U] + BDSPAC
For clarity, I added in [brackets] what matrices or vectors were allocated,
and the order reflects their order in memory.
I may do a similar change for *gesvd eventually. The workspace comments in
MAGMA's *gesvd have already been updated as above.
================================================================================
a) Throughout, to simplify equations, let:
mn = min( M, N )
mx = max( M, N )
================================================================================
b) [sdcz]gesdd Path 4 (m >> n, job="all") has wrong minwrk formula in code:
minwrk = bdspac + mn*mn + 2*mn + mx
= 4*mn*mn + 6*mn + mx
This is an overestimate, needlessly rejecting the documented formula:
doc = 4*mn*mn + 7*mn
In complex, the correct min fails, but the doc matches the wrong minwrk.
Solution: fix code to:
minwrk = mn*mn + max( 3*mn + bdspac, mn + mx )
= mn*mn + max( 3*mn*mn + 7*mn, mn + mx )
Test cases:
m=40, ..., 100; n=20; jobz='A'
See bug (d) showing documentation is also wrong.
Also, see bug (i), complex [cz]gesdd should return -12 instead of -13.
================================================================================
bt) transposed case
[sd]gesdd Path 4t (n >> m, job="all") has a different wrong minwrk; see bug (c).
[cz]gesdd Path 4t exhibits same bug as Path 4.
Test cases:
m=20; n=40, ..., 100; jobz='A'
================================================================================
c) [sd]gesdd Path 4t (n >> m, job="all") has wrong minwrk formula in code,
different than bug (b):
minwrk = bdspac + m*m + 3*m
= 4*mn*mn + 7*mn
This formula lacks any dependence on N, so the code will fail (without
setting info; orglq calls xerbla) if N is too large, N > 3*M*M + 6*M.
Bug does not occur in complex.
Test cases:
m=20; n = 1320; jobz='A' ok with documented lwork
m=20; n > 1320; jobz='A' fails with documented lwork
Solution: as in bug (b), fix code to:
minwrk = mn*mn + max( 3*mn + bdspac, mn + mx )
= mn*mn + max( 3*mn*mn + 7*mn, mn + mx )
See bug (d) showing documentation is also wrong.
================================================================================
d) [sd]gesdd documentation lists the same minimum size for jobz='S' and 'A':
If JOBZ = 'S' or 'A', LWORK >= min(M,N)*(7 + 4*min(M,N))
However, jobz='A' actually also depends on max(M,N):
minwrk = mn*mn + max( 3*mn*mn + 7*mn, mn + mx )
This causes the formula to fail for mx > 3*mn*mn + 6*mn.
Test cases:
m > 1320; n = 20; jobz='A' fails with document lwork, even after fixing bugs (b) and (c).
m = 20; n > 1320; jobz='A' fails also.
Solution: in docs, split these two cases. This fix uses an overestimate,
so that codes using it will be backwards compatible with LAPACK <= 3.6.
If JOBZ = 'S', LWORK >= 4*mn*mn + 7*mn.
If JOBZ = 'A', LWORK >= 4*mn*mn + 6*mn + mx.
================================================================================
e) [sd]gesdd, Path 5, jobz='A' has wrong maxwrk formula in the code:
MAXWRK = MAX( MAXWRK, BDSPAC + 3*N )
Should be:
MAXWRK = MAX( WRKBL, BDSPAC + 3*N )
This causes the lwork query to ignore WRKBL, and return the minimum
workspace size, BDSPAC + 3*N, instead of the optimal workspace size.
However, it only affects the result for small sizes where min(M,N) < NB.
Path 5t has the correct maxwrk formula.
Complex is correct for both Path 5 and 5t.
Test case:
Compare lwork query with
M = 30, N = 20, jobz='A', lwork query is 1340
M = 20, N = 30, jobz='A', lwork query is 3260
These should be the same.
Solution: fix code as above.
================================================================================
f) Not a bug, just a suggestion.
The lwork minimum sizes are not actually minimums, and can be larger than
the queried lwork size.
Solution: add a comment:
These are not tight minimums in all cases; see comments inside code.
================================================================================
g) [sd]bdsdc segfaults due to too small workspace size. Its documentation claims:
If COMPQ = 'N' then LWORK >= (4 * N).
Based on this, in zgesdd, the rwork size >= 5*min(M,N).
However, LAPACK bug 111 found that rwork size >= 7*min(M,N) was required.
In dbdsdc, if uplo='L', then it rotates lower bidiagonal to upper bidiagonal,
and saves 2 vectors of Givens rotations in work. It shifts WSTART from
1 to 2*N-1. Then it calls dlasdq( ..., work( wstart ), info ).
As dlasdq requires 4*N, dbdsdc would now require 6*N in this case.
This caused zgesdd to require rwork size >= 7*min(M,N) when N > M and jobz='N'.
My preferred solution is to change WSTART to 1 in the DLASDQ call inside dbdsdc:
IF( ICOMPQ.EQ.0 ) THEN
CALL DLASDQ( 'U', 0, N, 0, 0, 0, D, E, VT, LDVT, U, LDU, U,
$ LDU, WORK( WSTART ), INFO )
GO TO 40
END IF
to:
IF( ICOMPQ.EQ.0 ) THEN
* Ignores WSTART, which is needed only for ICOMPQ = 1 or 2;
* using WSTART would change required workspace to 6*N for uplo='L'.
CALL DLASDQ( 'U', 0, N, 0, 0, 0, D, E, VT, LDVT, U, LDU, U,
$ LDU, WORK( 1 ), INFO )
GO TO 40
END IF
The [cz]gesdd documentation, which was changed to 7*min(M,N) in LAPACK 3.6,
may be reverted to 5*min(M,N), if desired.
================================================================================
h) [sd]gesdd for jobz='N' requires bdspac = 7*n for the dbdsdc workspace.
However, dbdsdc requires only 4*n, or 6*n before fixing bug (g).
For backwards compatability, I did not change the code, but added a comment
for clarification.
================================================================================
i) [cz]gesdd returns info = -13 instead of info = -12 for lwork errors.
================================================================================
j) In zgesdd, for computing maxwrk, these paths:
Path 6, jobz=A
Path 6t, jobz=S
Path 6t, jobz=A
query ilaenv( 1, zungbr, ... )
when the code actually calls zunmbr (twice). I corrected it.
================================================================================
k) In zgesdd documentation, currently
lrwork >= max( 5*mn*mn + 7*mn, 2*mx*mn + 2*mn*mn + mn )
It doesn't need that much, particularly for (mx >> mn) case.
If (mx >> mn), lrwork >= 5*mn*mn + 5*mn;
else, lrwork >= max( 5*mn*mn + 5*mn,
2*mx*mn + 2*mn*mn + mn ).
I changed this in the documentation. Feel free to revert if you prefer.
================================================================================
m) [cz]gesvd, Path 10 and 10t, have minwrk inside the wrong conditional:
IF( .NOT.WNTVN ) THEN
MAXWRK = MAX( MAXWRK, 2*N+LWORK_ZUNGBR_P )
MINWRK = 2*N + M
END IF
So Path 10 with jobvt='N', and Path 10t with jobu='N', have minwrk = 1,
so an invalid lwork is not correctly rejected.
================================================================================
mt) transposed case
broken: with old routine, Path 10t with jobu='N' doesn't enforce minwrk
|