Is there an overcomplete equivalent of generating a random orthonormal basis set?

Question

620 Views Asked by Bumbble Comm At 27 Mar 2026 - 10:02

I want to initialize a non-square matrix whose rows are basis vectors that are random but as different from one another as possible in the input space, giving a random over-complete basis set.

If the matrix were square I could generate a random orthonormal basis set. However, I have more rows than columns, so I would like to initialize this matrix randomly to get a nice spread of basis vectors that are not all orthogonal but that generally maximize the angles between them.
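
For reference, the square case mentioned above can be sketched with a QR decomposition of a Gaussian matrix. This is one common approach, not necessarily the one the asker had in mind; the function name `random_orthonormal_basis` and the `seed` parameter are made up for illustration.

```python
import numpy as np

def random_orthonormal_basis(n, seed=None):
    """Return an n x n matrix whose rows form a random orthonormal basis."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # QR is only unique up to the signs of the diagonal of r; fixing those
    # signs removes the bias that the sign convention would introduce.
    d = np.sign(np.diag(r))
    d[d == 0] = 1.0
    return q * d  # scales column j of q by d[j]; q stays orthogonal

basis = random_orthonormal_basis(4, seed=0)
gram = basis @ basis.T  # identity matrix up to rounding error
```

The over-complete case cannot be handled this way because more than `n` vectors in `n` dimensions cannot all be mutually orthogonal.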

For example with a 2D input space and a 2D output space ($2\times 2$ matrix) one might have a basis vector along the $x$ axis and one along the $y$ axis, but if three bases were needed for a $3\times 2$ matrix then the vectors could be arranged at 120 degrees from each other so that the output values were as uncorrelated as possible given the over-complete representation.
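
The 120-degree arrangement can be checked numerically. The sketch below builds the three unit vectors as rows of a $3\times 2$ matrix and computes their pairwise dot products; the variable names are illustrative only.

```python
import numpy as np

# Three unit vectors in the plane, spaced 120 degrees apart (rows of a 3x2 matrix).
angles = np.deg2rad([0.0, 120.0, 240.0])
frame = np.column_stack([np.cos(angles), np.sin(angles)])

# Pairwise dot products: 1 on the diagonal, cos(120 deg) = -0.5 elsewhere.
gram = frame @ frame.T
print(np.round(gram, 3))
```

Every pair of distinct vectors has the same dot product, $-0.5$, so no pair is closer together than any other.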

I guess one could imagine adding basis vectors one by one and repeatedly adjusting the set so that they spread as far apart as possible. I'm not sure how to do this, though. It seems like a tricky iterative optimization problem with some kind of energy function that measures how close the basis vectors are to each other. Ideas?

What's interesting is that the optimal solution is probably a fixed geometric structure for a given matrix size, and the random differences would just be rotations in the input space.
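
A small sketch of that idea, assuming the fixed structure is already known (here the three-vector planar arrangement from the example above): applying a random orthogonal transformation changes the individual vectors but leaves every pairwise angle unchanged. The names `fixed` and `rotated` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed structure: three unit vectors in 2D, 120 degrees apart.
angles = 2.0 * np.pi * np.arange(3) / 3.0
fixed = np.column_stack([np.cos(angles), np.sin(angles)])

# Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
# It may be a rotation or a rotation combined with a reflection.
q, _ = np.linalg.qr(rng.standard_normal((2, 2)))

# Transform every row; lengths and pairwise dot products are preserved.
rotated = fixed @ q.T
print(np.allclose(rotated @ rotated.T, fixed @ fixed.T))
```

This only shows that the randomness can be confined to the orientation; it does not say what the optimal structure is for larger matrix sizes.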

**Bumbble Comm** · Answer 1 · 2016-08-04 23:09:56

I figured out how to do it. The method I use is to do stochastic gradient descent on the loss given by the maximum dot product between all pairs of basis vectors, thus maximizing the angle between all of them.

Here's the code:

import numpy
import math

def generate_basis_set(rows, cols, thresh=-0.999, maxiters=0):
    """
    Generate a random basis set matrix with size (rows,cols) where all rows are unit length.
    The number of rows and columns can have any value. The row basis vectors are optimized so that
    they are maximally spread out in the input space so as to minimize the maximum dot product
    across all pairs of basis vectors. Note that this algorithm produces pairs of bases that are
    the same except for a change of sign (180 degrees apart), which is typically what is desired
    when initializing a RELU neural net layer since this gives positive and negative contrast units.
    Parameter "thresh" is a cosine-similarity threshold, cos(theta) = thresh, that provides an early exit:
    the loop stops once the largest pairwise dot product falls below it. For example, to exit as soon as
    no two vectors are closer than 60 degrees apart, set thresh=cos(pi/3).
    Parameter "maxiters" is the maximum number of allowed iterations; maxiters=0 means no limit, so the
    loop runs until the learning rate alpha has decayed below 0.001.
    The return value is a tuple consisting of the weight matrix and the maximum pairwise cosine
    similarity (range -1.0 to 1.0).
    """

    print("Generating random matrix")
    m = numpy.random.randn(rows,cols)

    print("Normalizing")
    rowmag = numpy.sqrt((m**2).sum(axis=1))
    m /= rowmag.reshape(rows,1)

    print("Forming %d x %d dot product matrix" % (rows, rows))
    c = numpy.empty((rows,rows))
    for i in range(rows):
        for j in range(i,rows):
            if i!=j:
                c[i,j] = numpy.dot(m[i,:],m[j,:])
                c[j,i] = -2.0
            else:
                c[i,j] = -2.0

    alpha = 0.1
    avemaxslow = 1.0
    avemaxfast = 1.0
    kslow = 0.01
    kfast = 0.1
    iters = 0
    change_iters = 0

    while alpha>0.001:
        # get location of max value of cos distance
        ind1, ind2 = numpy.unravel_index(c.argmax(), c.shape)
        maxval = c[ind1,ind2]
        if maxval<thresh:
            break

        print("%d: alpha=%f\tind1=%4d ind2=%4d\tcos_dist = %f" % (iters,alpha,ind1,ind2,maxval))

        # update weights
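        # copy the rows first: plain slices are views, so w1 would otherwise change when m[ind1,:] is updated
        w1 = m[ind1,:].copy()
        w2 = m[ind2,:].copy()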
        m[ind1,:] -= alpha * w2
        m[ind2,:] -= alpha * w1

        # renormalize - divisor should never be zero if alpha is <<1
        norm = math.sqrt((m[ind1,:]**2).sum())
        m[ind1,:] /= norm
        norm = math.sqrt((m[ind2,:]**2).sum())
        m[ind2,:] /= norm

        # update cos distances
        for i in range(rows):
            if i<ind1:
                c[i,ind1] = numpy.dot(m[i,:],m[ind1,:])
            elif i>ind1:
                c[ind1,i] = numpy.dot(m[ind1,:],m[i,:])
            if i<ind2:
                c[i,ind2] = numpy.dot(m[i,:],m[ind2,:])
            elif i>ind2:
                c[ind2,i] = numpy.dot(m[ind2,:],m[i,:])

        # update moving averages
        if iters==0:
            avemaxslow = maxval
            avemaxfast = maxval
        else:
            avemaxslow += kslow * (maxval - avemaxslow)
            avemaxfast += kfast * (maxval - avemaxfast)
        deltamax = avemaxslow - avemaxfast

        # update learning rate
        if iters>change_iters+100 and deltamax < 0.0001:
            alpha *= 0.9
            change_iters = iters

        iters += 1
        if maxiters>0 and iters>=maxiters:
            break

    return m, c.max()

rows = 100
cols = 16

weights, maxcosdist = generate_basis_set(rows,cols)

print(weights, maxcosdist)
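
As a cross-check on the value returned above, the largest pairwise dot product of any row-normalized matrix can also be computed in vectorized form. This is a sketch that is independent of the function above; the helper name `max_pairwise_cosine` is made up for illustration, and it is applied here to an unoptimized random matrix of the same shape.

```python
import numpy as np

def max_pairwise_cosine(m):
    """Largest dot product between two distinct rows of m (rows assumed unit length)."""
    gram = m @ m.T                    # all pairwise dot products at once
    np.fill_diagonal(gram, -np.inf)   # ignore each row's product with itself
    return gram.max()

rng = np.random.default_rng(0)
m = rng.standard_normal((100, 16))
m /= np.linalg.norm(m, axis=1, keepdims=True)  # normalize each row

print(max_pairwise_cosine(m))
```

Comparing this value before and after optimization gives a quick indication of how much the closest pair has been pushed apart.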

**Bumbble Comm** · Answer 2 · 2016-08-04 23:33:03

Here is a new try.

$$\min_{\bf M_1, M_2}\{\|{\bf M_1}^T{\bf M_2}-{\bf I}\|_? + \epsilon\|\bf M_1-M_2\|_2\}$$

We initialize them to random matrices and then solve alternately for $\bf M_1$ and $\bf M_2$. The scalar products in ${\bf M_1}^T {\bf M_2}$ should be cos-squares.
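
A minimal sketch of the alternating scheme follows. The answer leaves the norm open, so this sketch assumes the squared Frobenius norm for both terms, which makes each half-step a linear least-squares problem with a closed-form solution. The function name `alternating_frame`, its parameters and the default `eps=0.5` are all assumptions made for illustration, not the answerer's code.

```python
import numpy as np

def alternating_frame(dim, count, eps=0.5, passes=50, seed=None):
    """Alternately minimize ||M1^T M2 - I||_F^2 + eps * ||M1 - M2||_F^2.

    Vectors are stored as columns, so M1 and M2 have shape (dim, count).
    """
    rng = np.random.default_rng(seed)
    m1 = rng.standard_normal((dim, count))
    m2 = rng.standard_normal((dim, count))
    eye = np.eye(dim)
    for _ in range(passes):
        # With m2 fixed, setting the gradient with respect to m1 to zero gives
        # (m2 m2^T + eps I) m1 = (1 + eps) m2.
        m1 = (1.0 + eps) * np.linalg.solve(m2 @ m2.T + eps * eye, m2)
        # Same update with the roles of m1 and m2 swapped.
        m2 = (1.0 + eps) * np.linalg.solve(m1 @ m1.T + eps * eye, m1)
    return m1, m2

m1, m2 = alternating_frame(2, 5, seed=0)
print(np.round(m1 @ m1.T, 6))
```

Under these assumptions nothing forces the columns to have unit length, so an extra normalization or constraint may be needed to get evenly spread unit vectors.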

EDIT: For the 2-norm it sometimes seems to do something along the lines of what I think you want, but it is by no means robust. Anything below 6 points in 2D seems rather stable, like the equal-sided pentagon we get for 5 points using 6 passes of solving for ${\bf M}_1, {\bf M}_2$. Above 5 points we will probably need to add something to get it working: maybe aim for other norms, allow restarts of the iterations, or something else.

Old solution (probably not working as far as I can tell)

I have not taken a look at your solution, but you could start with Gram-Schmidt until the vectors span the space, stuff the vectors as columns into $\bf M$, and then do

$${\bf v}_{new} = \arg\min_{\bf v}\left\{\|{\bf v}^T{\bf M}\|_2 + \epsilon \|{\bf v}_1-k\|_2\right\}$$

where the second term is only there to avoid the zero-vector solution. Then expand ${\bf M} = [{\bf M},{\bf v}_{new}]$ after normalizing ${\bf v}_{new}$ (of course).

This will (hopefully) add a vector that is least similar, in the scalar-product sense, to the previous vectors. Of course, you could insert a scalar-product matrix in between if you want, giving ${\bf v}^T {\bf G M}$, with $\bf G$ being a Gram matrix.
