Dynamics of Loss in Homogeneous, Non-Smooth Models Using Clarke Subdifferential

tl;dr: I'm seeking insight into using the Clarke subdifferential to analyze an optimization differential inclusion with a smooth objective and a homogeneous model. I'm mainly interested in whether the approach is valid when the descent directions are non-standard. This question was orchestrated by GPT4.

I'm studying how the loss evolves over time in a class of homogeneous, non-smooth models, using the Clarke subdifferential. The approach is appealing because it extends the gradient to such non-smooth functions. Below are the key concepts and my questions:

Clarke Subdifferential Definition: The Clarke subdifferential of a locally Lipschitz function $\mathcal{L} : \mathbb{R}^d \to \mathbb{R}$ at a point $\theta$ is given by: $$ \partial^{\circ} \mathcal{L}(\theta) := \text{conv} \left \lbrace \lim_{i \to \infty} \nabla \mathcal{L}(\theta_i) : \mathcal{L} \text{ differentiable at } \theta_i \land \lim_{i \to \infty} \theta_i = \theta \right \rbrace . $$ If $\mathcal{L}$ is continuously differentiable at $\theta$, then $\partial^{\circ} \mathcal{L}(\theta) = \lbrace \nabla \mathcal{L}(\theta) \rbrace$.
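To make the definition concrete, here is a minimal numerical sketch for the one-dimensional case, where the convex hull is just the interval between the smallest and largest limiting gradient. The helper `clarke_interval_1d` and the test function $|\theta|$ are illustrative choices of mine, and sampling nearby points only approximates the limits in the definition.

```python
import numpy as np

def clarke_interval_1d(grad, theta, radius=1e-6, n_samples=1000, seed=0):
    """Approximate the Clarke subdifferential of a 1-D function at theta.

    Gradients are sampled at random points within `radius` of theta; in 1-D
    the convex hull of the sampled gradients is the interval [min, max].
    This is a sampling approximation, not an exact computation.
    """
    rng = np.random.default_rng(seed)
    points = theta + rng.uniform(-radius, radius, size=n_samples)
    grads = np.array([grad(p) for p in points])
    return float(grads.min()), float(grads.max())

def abs_grad(x):
    # Gradient of |x| wherever it is differentiable (x != 0).
    return np.sign(x)

kink = clarke_interval_1d(abs_grad, 0.0)    # at the kink of |x|
smooth = clarke_interval_1d(abs_grad, 1.0)  # at a smooth point
print(kink, smooth)
```

At the kink the sampled interval should cover both slopes of $|\theta|$, while at a smooth point it collapses to the single gradient value, matching the last sentence of the definition.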

Flow Dynamics and Differential Inclusion: When the optimization process is modeled as a flow, the parameter trajectory $\lbrace \theta^{(t)} | t \ge 0 \rbrace$ follows a differential inclusion given by a set $D$. Here $D$ is built from a generalized descent direction $\tilde \nabla$, where the tilde signals that it is related to, but not necessarily equal to, the gradient. The inclusion holds for almost all $t \ge 0$: $$ \dot \theta^{(t)} \in - D . $$
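For intuition only, the inclusion can be discretized with forward Euler steps. This sketch assumes $D$ depends only on the current parameters and that one element of it can be selected at each step; both the function `euler_inclusion` and the $\ell_1$ example are hypothetical and not part of the setup above.

```python
import numpy as np

def euler_inclusion(select_direction, loss, theta0, step=0.01, n_steps=200):
    """Forward-Euler discretization of d(theta)/dt in -D(theta).

    `select_direction(theta)` returns one element of D(theta). Which element
    is picked at non-smooth points is an assumption of this sketch.
    """
    theta = np.asarray(theta0, dtype=float)
    losses = [loss(theta)]
    for _ in range(n_steps):
        theta = theta - step * select_direction(theta)
        losses.append(loss(theta))
    return theta, np.array(losses)

def l1_loss(theta):
    # Non-smooth example loss: the l1 norm.
    return float(np.abs(theta).sum())

def l1_direction(theta):
    # One element of the Clarke subdifferential of the l1 norm.
    return np.sign(theta)

theta_final, losses = euler_inclusion(l1_direction, l1_loss, [1.0, -0.5])
print(theta_final, losses[0], losses[-1])
```

The discrete iterates chatter around the kink with amplitude on the order of the step size, which is one reason the continuous-time statement is only required to hold for almost all $t$.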

Challenges with Non-Standard Descent Directions: Non-standard descent directions complicate the analysis of the loss dynamics in these models. To handle this, I aim to bound the rate of loss decrease using the function $\alpha(t)$, defined as: $$ \begin{aligned} \alpha(t) &:= \max \, \partial^{\circ} (\mathcal{L} \circ \theta)(t) \\ &= -\min \left \lbrace \lim_{i \to \infty} \langle \nabla \mathcal{L}(\theta_i), \tilde \nabla \mathcal{L}(\theta_i) \rangle : \mathcal{L} \text{ differentiable at } \theta_i \land \lim_{i \to \infty} \theta_i = \theta^{(t)} \right \rbrace . \end{aligned} $$
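As a sanity check of what the second expression for $\alpha(t)$ computes, here is a rough sampling sketch. It replaces the limits by evaluations at random points near the current parameters, so it is only an approximation. The loss ($\ell_1$ norm) and the coordinate-scaled direction `scaled_direction` are hypothetical stand-ins for $\mathcal{L}$ and $\tilde \nabla \mathcal{L}$, not the models I actually care about.

```python
import numpy as np

def alpha_estimate(grad, tilde_grad, theta, radius=1e-6, n_samples=2000, seed=0):
    """Estimate -min <grad L(theta_i), tilde_grad L(theta_i)> over theta_i near theta.

    The limits in the definition are approximated by sampling points within
    `radius` of theta; the same point is used in both arguments of the inner
    product, exactly as in the formula.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    points = theta + rng.uniform(-radius, radius, size=(n_samples, theta.size))
    inner = np.array([np.dot(grad(p), tilde_grad(p)) for p in points])
    return -float(inner.min())

def l1_grad(p):
    # Gradient of the l1 norm where it is differentiable.
    return np.sign(p)

def scaled_direction(p):
    # Hypothetical non-standard direction: coordinate-wise rescaled gradient.
    return np.array([1.0, 3.0]) * np.sign(p)

alpha_flow = alpha_estimate(l1_grad, l1_grad, [0.0, 1.0])            # plain gradient flow
alpha_scaled = alpha_estimate(l1_grad, scaled_direction, [0.0, 1.0])  # rescaled direction
print(alpha_flow, alpha_scaled)
```

In this toy case every sampled inner product is $2$ for gradient flow and $4$ for the rescaled direction, so the estimates should be $-2$ and $-4$. Whether pairing $\nabla \mathcal{L}$ and $\tilde \nabla \mathcal{L}$ at the same $\theta_i$ is justified in general is exactly what I'm asking below.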

My Questions:

  1. Is this a legitimate way to bound the Clarke subdifferential? If it helps, I can assume that the sets on which $\nabla \mathcal{L}$ and $\tilde \nabla \mathcal{L}$ are well-defined overlap in whatever way is convenient.
  2. Is there any generalized minimum-norm path principle that simplifies the subdifferential to a "small" set as is the case with gradient flow?