Adaptive feedforward cancellation (AFC) and least mean squares (LMS) for periodic disturbance cancellation


I want to implement adaptive feedforward cancellation (AFC) to cancel the impact of periodic input disturbances on the output of a multiple-input single-output system. The filter weights are adapted using an LMS algorithm.

Unfortunately, I have trouble understanding the details of the algorithm, and I could not find any documents that give concrete answers to my questions. To ask my questions here, I will briefly explain what I know and what I have done so far:

In my setup there are two control inputs and one output. Furthermore, there are two sinusoidal disturbances with known frequency but unknown amplitude and phase shift, one acting on each input. The basic structure of the control is shown here: [Figure: block diagram of the control including the LMS]

$S$ is the secondary path (plant), which is an LTI system and which has the complex transfer function matrix $$ \underline{\boldsymbol{S}}(j \omega) = \begin{bmatrix} \underline{\boldsymbol{S}}_1(j \omega) & \underline{\boldsymbol{S}}_2(j \omega) \\ \end{bmatrix} $$

Rewritten as real-value matrix, this is

$$ \boldsymbol{S} = \begin{bmatrix} \text{Re}\{ \underline{\boldsymbol{S}}_1(j \omega) \} & \text{Im}\{ \underline{\boldsymbol{S}}_1(j \omega) \} & \text{Re}\{ \underline{\boldsymbol{S}}_2(j \omega) \} & \text{Im}\{ \underline{\boldsymbol{S}}_2(j \omega) \} \\ -\text{Im}\{ \underline{\boldsymbol{S}}_1(j \omega) \} & \text{Re}\{ \underline{\boldsymbol{S}}_1(j \omega) \} & -\text{Im}\{ \underline{\boldsymbol{S}}_2(j \omega) \} & \text{Re}\{ \underline{\boldsymbol{S}}_2(j \omega) \} \\ \end{bmatrix} $$
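For reference, here is a small numerical sketch of this real-valued representation (NumPy; the values for $\underline{\boldsymbol{S}}_1$ and $\underline{\boldsymbol{S}}_2$ are arbitrary placeholders, not from an actual plant):

```python
import numpy as np

def real_plant_matrix(S1, S2):
    """Real-valued 2x4 representation of the complex 1x2 transfer matrix
    evaluated at the disturbance frequency."""
    return np.array([
        [ S1.real, S1.imag,  S2.real, S2.imag],
        [-S1.imag, S1.real, -S2.imag, S2.real],
    ])

# Arbitrary example values (placeholders)
S1, S2 = 0.8 - 0.3j, 0.5 + 0.6j
S = real_plant_matrix(S1, S2)

# A useful property of this structure, used further below:
# S S^T = (|S1|^2 + |S2|^2) * I_2
print(S @ S.T)
```

The diagonal structure of $\boldsymbol{S}\,\boldsymbol{S}^\top$ is what later makes the pseudoinverse so simple.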

$\boldsymbol{d}$ are the periodic input disturbances with $$ \boldsymbol{d}(t) = \begin{bmatrix} w_{d,1,c} \, \cos(\omega t) + w_{d,1,s} \, \sin(\omega t)\\ w_{d,2,c} \, \cos(\omega t) + w_{d,2,s} \, \sin(\omega t)\\ \end{bmatrix} = \boldsymbol{X} \, \boldsymbol{w}_d $$

with

$$ \boldsymbol{x} = \begin{bmatrix} \cos(\omega t) & \sin(\omega t)\\ \end{bmatrix}^\top $$

and

$$ \boldsymbol{X} = \boldsymbol{I}_{\text{2x2}} \otimes \boldsymbol{x}^\top $$

and with the vector $\boldsymbol{w}_d$ containing the unknown coefficients of the harmonic disturbances:

$$ \boldsymbol{w}_d = \begin{bmatrix} w_{d,1,c} & w_{d,1,s} & w_{d,2,c} & w_{d,2,s} \end{bmatrix}^\top $$
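A short sketch of this signal model (NumPy; the frequency and the coefficients in $\boldsymbol{w}_d$ are arbitrary placeholders):

```python
import numpy as np

omega = 2 * np.pi * 5.0                 # known disturbance frequency (placeholder)
w_d = np.array([1.0, -0.5, 0.3, 0.7])   # placeholder "unknown" coefficients

def regressor(t, omega):
    """x(t) = [cos(omega t), sin(omega t)]^T."""
    return np.array([np.cos(omega * t), np.sin(omega * t)])

def disturbance(t, omega, w_d):
    """d(t) = X w_d with X = I_2 (Kronecker) x^T, one sinusoid per input."""
    X = np.kron(np.eye(2), regressor(t, omega))   # shape (2, 4)
    return X @ w_d

# The Kronecker form reproduces the component-wise definition of d(t)
t = 0.123
x = regressor(t, omega)
d = disturbance(t, omega, w_d)
```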

As said, the target is to cancel the impact of $\boldsymbol{d}$ on the output $\alpha$ of the plant. Considering only the steady state, the disturbance contribution to the output is, in phasor notation,

$$ \underline{\alpha}_d = \underline{\boldsymbol{S}} \, \underline{\boldsymbol{d}} $$

or, in the real-valued notation,

$$ \alpha_d = \boldsymbol{x}^\top \, \boldsymbol{S} \, \boldsymbol{w}_d $$

Since the disturbances contain only a single frequency component, a narrow-band filter with input signals $\boldsymbol{x}$, weights $\boldsymbol{w}$ and output $(\boldsymbol{X} \, \boldsymbol{w})$ is used for cancelling $\boldsymbol{d}$.

With compensation, the output of the plant becomes

$$ \alpha = \boldsymbol{x}^\top \, \boldsymbol{S} \, ( \boldsymbol{w}_d + \boldsymbol{w} ) $$

The optimal filter weights are found when the cost function $$ C = E\{\alpha^2(t)\} \approx \alpha^2(t) $$ is minimized.

To achieve this, the filter weights are updated in direction of the negative gradient (steepest descent): $$ - \nabla C = - \frac{\partial \, C}{\partial \, \boldsymbol{w}} $$

where $\nabla C$ is

$$ \begin{aligned} \nabla C &= 2 \, \alpha \, ( \nabla \alpha ) \\ &= 2 \, \alpha \, \boldsymbol{x}^\top \, \boldsymbol{S} \\ \end{aligned} $$

Since the gradient $\nabla C$ is a row vector while $\boldsymbol{w}$ is a column vector, the weights are updated using

$$ \dot{\boldsymbol{w}}^\top = - \mu \, \nabla C \\ $$

A constant factor $\mu$ is introduced to adjust the step size. By rearranging, this becomes

$$ \begin{aligned} \dot{\boldsymbol{w}} &= - \mu \, (\nabla C)^\top \\ &= - 2 \, \mu \, ( \alpha \, \boldsymbol{x}^\top \, \boldsymbol{S} )^\top \\ &= - 2 \, \mu \, \boldsymbol{S}^\top \, \boldsymbol{x} \, \alpha \end{aligned} $$
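To make the update law concrete, here is a minimal discrete-time simulation sketch (forward-Euler integration of $\dot{\boldsymbol{w}} = -2\mu\,\boldsymbol{S}^\top \boldsymbol{x}\,\alpha$; the plant values, disturbance coefficients, frequency, and step size are arbitrary placeholders, and the plant is modeled only by its steady-state response at the disturbance frequency):

```python
import numpy as np

# Placeholder plant responses at the disturbance frequency
S1, S2 = 0.8 - 0.3j, 0.5 + 0.6j
S = np.array([[ S1.real, S1.imag,  S2.real, S2.imag],
              [-S1.imag, S1.real, -S2.imag, S2.real]])

omega = 2 * np.pi * 5.0                     # known disturbance frequency
w_d = np.array([1.0, -0.5, 0.3, 0.7])       # "unknown" true coefficients
w = np.zeros(4)                             # adaptive filter weights
mu, dt = 0.1, 1e-3                          # step size and integration step

for k in range(100_000):
    x = np.array([np.cos(omega * k * dt), np.sin(omega * k * dt)])
    alpha = x @ (S @ (w_d + w))             # residual output (steady-state model)
    w -= dt * 2 * mu * (S.T @ x) * alpha    # forward Euler on the update law

# After adaptation the compensation cancels the disturbance at the output,
# i.e. S (w_d + w) is numerically zero.
```

Note that only the component of $\boldsymbol{w}$ in the row space of $\boldsymbol{S}$ is determined by the adaptation; any null-space component has no effect on $\alpha$.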

First question: There is the so-called filtered-x LMS algorithm (FXLMS). Using this algorithm, one filters the input signal $\boldsymbol{x}$ using a model of $\boldsymbol{S}$. In my algorithm, $\boldsymbol{x}$ is filtered by $\boldsymbol{S}^\top$. Otherwise, they are the same, I think. The FXLMS is discussed very often in the literature, whereas the algorithm I am referring to appears only rarely. Why? At the moment I can't see any advantages or disadvantages when I compare them.

Further on, I want to discuss the convergence rate of the algorithm. The previously explained algorithm shows a frequency-dependent convergence rate, as can be seen in the following plot from my exemplary implementation: [Plot: convergence behavior]

However, there are documents claiming that the convergence rate is frequency-independent if the derivative of the weights is calculated as

$$ \dot{\boldsymbol{w}} = - 2 \, \mu \, \boldsymbol{S}^{-1} \, \boldsymbol{x} \, \alpha $$

or as

$$ \dot{\boldsymbol{w}} = - 2 \, \mu \, \boldsymbol{S}^+ \, \boldsymbol{x} \, \alpha $$

Again, I want to provide a plot which shows that this is true, at least for my test: [Plot: convergence behavior]
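As a numerical illustration that normalizing with the pseudoinverse makes the convergence rate independent of the plant gain (a minimal sketch; plant values, frequency, disturbance coefficients, and step size are arbitrary placeholders):

```python
import numpy as np

def real_S(S1, S2):
    """Real 2x4 representation of the complex 1x2 plant at one frequency."""
    return np.array([[ S1.real, S1.imag,  S2.real, S2.imag],
                     [-S1.imag, S1.real, -S2.imag, S2.real]])

def residual_ratio(S, mu=0.05, dt=1e-3, steps=60_000, omega=2*np.pi*5.0):
    """Simulate w' = -2 mu S^+ x alpha and return how far the disturbance
    envelope |S (w_d + w)| has decayed relative to its initial value."""
    w_d = np.array([1.0, -0.5, 0.3, 0.7])   # placeholder true coefficients
    G = np.linalg.pinv(S)                   # S^+ instead of S^T
    w = np.zeros(4)
    for k in range(steps):
        x = np.array([np.cos(omega * k * dt), np.sin(omega * k * dt)])
        alpha = x @ (S @ (w_d + w))         # steady-state residual output
        w -= dt * 2 * mu * (G @ x) * alpha
    return np.linalg.norm(S @ (w_d + w)) / np.linalg.norm(S @ w_d)

# Plants whose gains differ by roughly two orders of magnitude decay at
# (nearly) the same normalized rate when S^+ is used in the update.
r_small = residual_ratio(real_S(0.10 - 0.05j, 0.08 + 0.02j))
r_large = residual_ratio(real_S(8.0 - 3.0j, 5.0 + 6.0j))
```

With the plain $\boldsymbol{S}^\top$ update, the effective adaptation gain scales with the squared plant magnitude, so the same comparison would show very different decay rates.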

Unfortunately, I could only find a few documents describing this algorithm, and none of them answered the following question (or at least I could not find the answer):

Second question: As far as I understand it, the calculation rule using the inverse or Moore-Penrose pseudoinverse results from a normalization of $\boldsymbol{S}$. Is this correct? Why, and in which cases, is it useful to use $\boldsymbol{S}^{-1}$ or $\boldsymbol{S}^+$ here? How can I derive the rule for calculating $\dot{\boldsymbol{w}}$ using $\boldsymbol{S}^{-1}$ or $\boldsymbol{S}^+$?

Update: I think I made some progress regarding the second question.

If one substitutes $\alpha$ in

$$ \dot{\boldsymbol{w}} = - 2 \, \mu \, \boldsymbol{S}^\top \, \boldsymbol{x} \, \alpha $$

this becomes

$$ \begin{aligned} \dot{\boldsymbol{w}} &= - \underbrace{ 2 \, \mu \, \boldsymbol{S}^\top \, \boldsymbol{x} \, \boldsymbol{x}^\top \, \boldsymbol{S} }_{\boldsymbol{K}} \, (\boldsymbol{w}_d + \boldsymbol{w}) \\ &= - \boldsymbol{K} \, \boldsymbol{w}_d - \boldsymbol{K} \, \boldsymbol{w} \end{aligned} $$

This shows that $\boldsymbol{K}$ can be interpreted as the gain of the adaptation. The adaptation has a frequency-independent convergence rate if the Euclidean norm of the product $(\boldsymbol{S}^\top \, \boldsymbol{x} \, \boldsymbol{x}^\top \, \boldsymbol{S})$ is equal to 1.

This is achieved by dividing by the actual norm. Since $\boldsymbol{S}^\top \, \boldsymbol{x} \, \boldsymbol{x}^\top \, \boldsymbol{S}$ has rank one and $\boldsymbol{x}^\top \boldsymbol{x} = 1$, its spectral norm is

$$ || \boldsymbol{S}^\top \, \boldsymbol{x} \, \boldsymbol{x}^\top \, \boldsymbol{S} ||_2 = \boldsymbol{x}^\top \, \boldsymbol{S} \, \boldsymbol{S}^\top \, \boldsymbol{x} = \text{Re}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Re}\{ \underline{\boldsymbol{S}}_2 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_2 \}^2 $$

where the last step uses $\boldsymbol{S} \, \boldsymbol{S}^\top = \left( \text{Re}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Re}\{ \underline{\boldsymbol{S}}_2 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_2 \}^2 \right) \boldsymbol{I}_2$.

Furthermore, since $\boldsymbol{S}$ has full row rank, $\boldsymbol{S}^+ = \boldsymbol{S}^\top (\boldsymbol{S} \, \boldsymbol{S}^\top)^{-1}$, which here reduces to

$$ \frac{1}{ \text{Re}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Re}\{ \underline{\boldsymbol{S}}_2 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_2 \}^2 } \boldsymbol{S}^\top = \boldsymbol{S}^+ $$

With this, it follows that

$$ \begin{aligned} \dot{\boldsymbol{w}} &= - 2 \, \mu \, \frac{1}{ \text{Re}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_1 \}^2 + \text{Re}\{ \underline{\boldsymbol{S}}_2 \}^2 + \text{Im}\{ \underline{\boldsymbol{S}}_2 \}^2 } \, \boldsymbol{S}^\top \, \boldsymbol{x} \, \boldsymbol{x}^\top \, \boldsymbol{S} \, (\boldsymbol{w}_d + \boldsymbol{w}) \\ &= - 2 \, \mu \, \boldsymbol{S}^+ \, \boldsymbol{x} \, \boldsymbol{x}^\top \, \boldsymbol{S} \, (\boldsymbol{w}_d + \boldsymbol{w}) \end{aligned} $$
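A quick numerical sanity check of the identity between $\boldsymbol{S}^+$ and the scaled $\boldsymbol{S}^\top$ (NumPy; placeholder plant values):

```python
import numpy as np

S1, S2 = 0.8 - 0.3j, 0.5 + 0.6j            # placeholder plant values
S = np.array([[ S1.real, S1.imag,  S2.real, S2.imag],
              [-S1.imag, S1.real, -S2.imag, S2.real]])

g = abs(S1)**2 + abs(S2)**2                # Re{S1}^2 + Im{S1}^2 + Re{S2}^2 + Im{S2}^2

# S has full row rank and S S^T = g * I_2, so the pseudoinverse
# S^+ = S^T (S S^T)^{-1} indeed reduces to S^T / g:
print(np.allclose(np.linalg.pinv(S), S.T / g))   # prints: True
```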

However, I don't trust myself... so I would be grateful if anyone could tell me whether all the stuff I wrote is (mathematically) correct.

Furthermore, I did not fully understand how to show that a frequency-independent convergence rate is achieved by dividing by the Euclidean norm.

And finally, I'm still missing an answer to my first question.

Note: I asked the same question on Engineering Stackexchange.