Start of the new and improved version 0.3 series. The build system has been simplified by removing less useful configuration options, including support for using the Windows SDK compiler instead of GCC (this option hasn't worked for some time anyway). Instead, there is an experimental option to compile with ATLAS instead of the reference BLAS library, which substantially improves LAPACK performance.
This is the version that should be used for compiling from source, as the older versions no longer compile due to some changes in the CLAPACK distribution. For precompiled binaries, see version 0.2.1.
ATLAS also provides a couple of optimized LAPACK routines, as well as a tuned version of ilaenv() which picks runtime parameters for all the other routines. This adds another speedup. For instance, I benchmarked .qml.minv (using the optimized routine) at 12 times faster and .qml.mqr (relying on tuned parameters) at 9 times faster than in QML 0.2.1.
The core of ATLAS is written in C, but it provides a Fortran interface as well. Even with the --nof77 configure flag, this interface gets compiled, because gcc is happy to invoke the gfortran frontend for *.f files. We need this interface to override the routines in plain CLAPACK, even though that's compiled completely in C.
Now, it is possible to convert the Fortran parts to C with F2C and get the whole thing to build without a Fortran compiler, but I think we'll eventually switch to Fortran LAPACK instead of CLAPACK, so we might as well just rely on gfortran from now on.
Systems of linear equations and linear least-squares problems can be solved in various ways using the basic matrix functions in LAPACK, but it is more efficient to use the driver routines and avoid multiple conversions between q form and LAPACK form. For instance, the LUP method involves constructing two triangular matrices, but the LAPACK driver can keep them in a single matrix (L has implied ones on the diagonal).
QML 0.3.5 adds an interface to driver routines for these two problems, with one faster and one more accurate method per problem.
I fixed a small bug that could occur when a user function supplied to one of the CONMAX routines returned a null value. There might be a few variations of this, but in the clause I found, an array index is chosen based on some floating-point comparisons using arithmetic (three-way) Fortran IFs. These transfer control depending on whether their argument is negative, zero, or positive, but what happens when it is none of these (NaN) is not specified (or, on another reading of the standard, execution falls through). The F2C-converted version in QML 0.2.1 happened to take a safe path in this case, but in the Fortran-compiled version used since QML 0.3.4 the index ends up uninitialized.
The fix simply replaces any null value returned by a user function with infinity (negative infinity for “≥” constraints so that the corresponding parameter is not considered feasible).
“‘|’ is an ordinary character and there is no equivalent for its functionality.” – re_format(7). That applies to basic regexes in the BSD implementation of sed, which is used on Darwin. GNU sed allows ‘|’ as the alternation operator.
Also, according to POSIX, an empty subgroup or an empty alternative is either invalid or gives undefined results.
In search of a robust build on 64-bit Windows, I updated from ATLAS 3.8.4 to the bleeding-edge 3.9.46. There might be a performance benefit somewhere in there as well.
There's a new make check target to invoke the internal tests of individual libraries, which currently includes just ATLAS. The venerable test.q program (invoked, as usual, by make test) is intended mainly to check the interface code in QML itself, although it has decent coverage and flagged one bug in ATLAS 3.9.45 while I was working on that version.
The current status on 64-bit Windows is that ATLAS might or might not build.
I'm considering refocusing QML to make broader use of system libraries, in particular for BLAS and LAPACK. ATLAS and Netlib LAPACK take a long time to build, so it's better to use independently installed versions if available. Linking to other high-performance implementations like the free software Eigen and the vendor libraries ACML and MKL may also be of interest.
QML 0.2.1 is still the stable version. It does not have the matrix performance of ATLAS, but it is easier to compile and includes binaries for 32-bit platforms.
Since it was released some of the files on the Netlib site changed, and QML 0.2.1 no longer compiles with them. I'm hosting the original files here. Place them in the download subdirectory before running make.
clapack.tgz (6.89 MB), f2c.tar.gz (240.00 KB)
I am working on a new version of QML.
The interface will be the same as in version 0.3, including the recently added mm, ms, mls and mlsq functions, while the configuration will be more like version 0.2, with BLAS used by default and precompiled binaries provided.
It will be compatible with kdb+ 3.0, testing will be more thorough than ever, and there will be an option to link against a different implementation of BLAS, such as ATLAS.
In the meantime, check out this patch from Kim Tang for kdb+ 3.0 support.
Also, if you want to build QML 0.3.10, the particular development snapshot of ATLAS that it uses is no longer available for download, so I'm hosting it here.
atlas-3.9.46.tar.bz2 (5.31 MB)
Just verified that it still builds on 32-bit Debian.
Welcome qml 0.4. Development continues on GitHub.
The latest qml 0.5 wraps up some work I started over three years ago, right after releasing qml 0.2.1. The idea is for the optimization routines .qml.min and .qml.conmin to become a single uniform interface to different underlying libraries such as CONMAX and NLopt.
NLopt provides qml with derivative-free algorithms for unconstrained and constrained optimization. While CONMAX, the only algorithm available previously, also works without explicitly specified gradient functions, it has to approximate them with central differences.
The main function of the LAPACK wrappers in qml is to convert matrices between q format (list of row vectors) and Fortran format (column-major array). This conversion can be viewed as a combination of a memory copy and a matrix transpose, which is slower than just a linear memory copy and can be tricky to optimize. A typical optimization is to use a blocked algorithm (4 nested loops instead of 2), but the optimal block size depends on the system architecture and the algorithm introduces extra complexity and corner cases.
This conversion doesn't account for most of the runtime of qml functions – the actual LAPACK operation does. For example, in .qml.mm the transpose is O(n²) while the multiplication is O(n³). But it does add substantial overhead.
However, it turns out that in many cases the transpose is unnecessary. We still need to convert between q format (non-contiguous vectors) and Fortran format (contiguous array), but we can do just a memory copy without a transpose and fold the transpose into the actual LAPACK call:
- in .qml.mdet, the transpose doesn't matter, as |A| = |Aᵀ|;
- in .qml.minv, the input and output transposes can be omitted together, as A⁻¹ = ((Aᵀ)⁻¹)ᵀ;
- in .qml.mm, the input and output transposes can be omitted if we swap the arguments, as AB = (BᵀAᵀ)ᵀ.
Here's a graph showing the effectiveness of this optimization (ratio of qml 0.7 run time to qml 0.6 run time) at different matrix sizes when linked against OpenBLAS:
Matrix multiplication is the fastest of these operations and involves the largest number of matrices (2 inputs and 1 output), and OpenBLAS is faster than Netlib BLAS, so reducing the overhead makes the most difference to .qml.mm using OpenBLAS: it's 10-60% faster in qml 0.7.