I am trying to understand how to extremize a functional $S: C^\infty(\mathbb{R}\to\mathbb{R}^n)\to\mathbb{R}$, which takes a function $x(t)$ as input and returns a scalar, subject to a constraint $g:\mathbb{R}^n\to\mathbb{R}^m$ on $x(t)$; the constraints restrict the admissible $x(t)$ and thus reduce the domain of $S$. However, I am not sure whether $S$ really maps from a submanifold, since I was not able to express the domain as the kernel of a function.
I tried setting $\frac d{d\epsilon}S\stackrel{!}=0$, where $x=x_0+\epsilon\eta$ and $x_0$ is the function at which $S$ is extremal, so that $\frac d {d\epsilon}S$ indeed vanishes at $\epsilon=0$. But I don't know how to incorporate the constraints on $x(t)$ into this method.
Since the constraints affect $x(t)$ and not $S$ directly, I refrained from directly using the method of Lagrange multipliers: $$\nabla S=\displaystyle\sum_i\lambda_i\nabla g_i,$$
which I think is not possible anyway, since $S$ depends on only one variable: the left-hand side would be a scalar, whereas the right-hand side is a vector.
Any help is much appreciated!
In general, Lagrange multipliers should lie in the dual space of whatever the codomain of the constraint is, which leads to the Lagrangian $$L(x,\lambda) = S(x) + (\lambda, g(x)),$$ where $(~,~)$ denotes the duality pairing. Often this pairing is an inner product, such as in the case where $g$ takes values in $\mathbb{R}^m$. Then $\lambda$ can be interpreted as an $m$-vector $\boldsymbol{\lambda} = (\lambda_1,\dots,\lambda_m)^\top$ and the objective can be written as $L(x,\boldsymbol{\lambda}) = S(x) + \boldsymbol{\lambda}^\top g(x).$

Your formulation of the constraint is a bit odd, as it is unclear how it relates to $x$: $g$ is a mapping *from* $\mathbb{R}^n$ while $x$ is a mapping *to* $\mathbb{R}^n$, so does $g$ act pointwise on a single value $x(t)$?

If we instead interpret $g$ as a mapping from the function space to $\mathbb{R}^m$, then the gradient of the Lagrangian w.r.t. $x$ is well-posed, as we would have $$\nabla_x L = \nabla_x S + \boldsymbol{\lambda}^\top\nabla_xg.$$ The key here is that $\left.\nabla_x g\right|_{x}$ is a linear map from the function space to $\mathbb{R}^m$, so $\boldsymbol{\lambda}^\top\nabla_x g$ is an operator from the function space to $\mathbb{R}$, just like $\nabla_x S$, so we are good.
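If instead $g$ does act pointwise, i.e. the constraint is $g(x(t)) = 0$ for all $t$ (the holonomic-constraint case familiar from mechanics), then the constraint codomain is itself a space of curves, and the multiplier dual to it is a *function* $\lambda:\mathbb{R}\to\mathbb{R}^m$, with the duality pairing given by an integral. A sketch of how the standard formulas then look, assuming $S(x)=\int\mathcal{L}(x,\dot x,t)\,\mathrm{d}t$:

```latex
% Pointwise constraint g(x(t)) = 0 for all t: the multiplier is a
% function \lambda : \mathbb{R} \to \mathbb{R}^m, paired by an integral:
L(x,\lambda) = S(x) + \int \lambda(t)^\top g(x(t))\,\mathrm{d}t .
% Stationarity in x yields the constrained Euler--Lagrange equations
\frac{\partial \mathcal{L}}{\partial x_i}
  - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial \mathcal{L}}{\partial \dot x_i}
  + \sum_{j=1}^{m} \lambda_j(t)\,\frac{\partial g_j}{\partial x_i} = 0 ,
  \qquad i = 1,\dots,n .
```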
The key points are:

- the multiplier lives in the dual of the constraint's codomain, so its "shape" is dictated by $g$, not by $S$;
- if $g$ acts pointwise on $x(t)$, that codomain is itself a space of curves, and $\lambda$ becomes a function $\lambda(t)$;
- once the pairing is chosen, $\nabla_x L = 0$ is an equation between operators on the function space, so the scalar-versus-vector mismatch you worried about disappears.
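To make this concrete, here is a small numerical sketch of a toy problem of my own choosing (not from the question): extremize the discretized Dirichlet energy $S(x)=\int_0^1 \dot x^2/2\,dt$ with $x(0)=0$, $x(1)=1$, subject to one scalar constraint $g(x)=\int_0^1 x\,dt - c = 0$. Here $g$ maps into $\mathbb{R}$, so $m=1$ and $\lambda$ is a single number; the stationarity system $\nabla_x L = 0$, $g(x)=0$ is linear and can be solved directly. (The continuum solution is $\ddot x = \lambda$ with $\lambda = 6-12c$.)

```python
# Toy discretization of: extremize S(x) = ∫ ẋ²/2 dt with x(0)=0, x(1)=1,
# subject to g(x) = ∫ x dt − c = 0.  One scalar constraint ⇒ λ ∈ R.

def solve_linear(A, b):
    """Plain Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

N = 100            # grid intervals on [0, 1]
h = 1.0 / N
c = 0.6            # prescribed value of ∫ x dt
n = N - 1          # unknowns: interior values x_1..x_{N-1}, plus λ
A = [[0.0] * (n + 1) for _ in range(n + 1)]
rhs = [0.0] * (n + 1)
for i in range(n):
    # Stationarity ∂L/∂x: (2x_i − x_{i−1} − x_{i+1})/h + λ·h = 0,
    # with the boundary values x_0 = 0, x_N = 1 moved to the RHS.
    A[i][i] = 2.0 / h
    if i > 0:
        A[i][i - 1] = -1.0 / h
    if i < n - 1:
        A[i][i + 1] = -1.0 / h
    A[i][n] = h        # coefficient of λ (trapezoid weight of x_i in g)
rhs[n - 1] = 1.0 / h   # contribution of the fixed endpoint x_N = 1
# Constraint row: trapezoid rule h(x_0/2 + Σ x_i + x_N/2) = c.
for i in range(n):
    A[n][i] = h
rhs[n] = c - 0.5 * h   # endpoint contribution x_N/2 (x_0 = 0)

sol = solve_linear(A, rhs)
lam = sol[n]
print(lam)             # ≈ 6 − 12c = −1.2 (continuum multiplier)
```

Note that the multiplier appears as just one extra unknown appended to the discretized curve, which is exactly the "dual of the constraint codomain" picture: a pointwise constraint would instead append one multiplier per grid point, i.e. a discretized $\lambda(t)$.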