Consider the following model. There is a DAG $D_0$ whose $p$ nodes correspond to random variables $X_1,\dots,X_p$: assume that
$$X_1,...,X_p \sim N_p (0, \Sigma_0) \text{ with density } f_{ \Sigma_0} (\cdot),$$
$$N_p (0, \Sigma_0) \text{ is Markovian with respect to } D_0,$$
where the Markov property can be understood as the factorization of the joint Gaussian density, $f_{\Sigma_0}(x_1,\dots,x_p) = \prod_{j=1}^p f_{\Sigma_0}(x_j \mid x_{\text{pa}(j)})$, with $\text{pa}(j)$ denoting the set of parents of node $j$ in $D_0$.
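The factorization can be checked numerically on a toy two-node DAG $X_1 \to X_2$, where $\text{pa}(1) = \emptyset$ and $\text{pa}(2) = \{1\}$; the covariance matrix and evaluation point below are made up for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Hypothetical covariance for the 2-node DAG X1 -> X2.
Sigma0 = np.array([[1.0, 0.6],
                   [0.6, 1.0]])
x = np.array([0.3, -1.2])  # arbitrary evaluation point

# Joint density f_{Sigma0}(x1, x2).
joint = multivariate_normal(mean=np.zeros(2), cov=Sigma0).pdf(x)

# Factorized form f(x1) * f(x2 | x1): for a bivariate Gaussian,
# X2 | X1 = x1  ~  N(s12/s11 * x1, s22 - s12^2/s11).
s11, s12, s22 = Sigma0[0, 0], Sigma0[0, 1], Sigma0[1, 1]
f1 = norm(0, np.sqrt(s11)).pdf(x[0])
f2_given_1 = norm(s12 / s11 * x[0], np.sqrt(s22 - s12**2 / s11)).pdf(x[1])

print(np.isclose(joint, f1 * f2_given_1))  # True
```

For Gaussians the identity is exact, so the two sides agree up to floating-point error.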
The Gaussian DAG model above can always be represented equivalently as a linear structural equation model,
$$X_j = \sum\limits_{k \in \text{pa}(j)} \beta_{kj} X_k + \epsilon_j \qquad (j=1,\dots,p),$$
where $\epsilon_1,\dots,\epsilon_p$ are independent, $\epsilon_j \sim N(0, \omega_j^2)$, and $\epsilon_j$ is independent of $\{X_k : k \in \text{pa}(j)\}$; note that $\text{pa}(j) = \text{pa}_{D_0}(j)$ depends on the true DAG $D_0$. Let
$$\Sigma_n := X^T X/n$$
be the empirical covariance matrix based on the $n \times p$ data matrix $X$, whose rows are $n$ i.i.d. observations of $(X_1,\dots,X_p)$. Given a $p \times p$ nonsingular covariance matrix $\Sigma$, with inverse $\Theta := \Sigma^{-1}$, the negative log-likelihood is proportional to
$$l_n (\Theta) := \text{trace}(\Theta \Sigma_n) - \log \det (\Theta).$$
Note that $\Theta_0 := \Sigma_0^{-1}$ is the overall minimizer of the population version $l(\Theta) := \text{trace}(\Theta \Sigma_0) - \log \det (\Theta)$.
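A small numerical sketch ties these pieces together (the DAG, edge weights $\beta_{kj}$, and error variances below are made up for illustration): writing the structural equations in vector form as $X = B^T X + \epsilon$ with $B_{kj} = \beta_{kj}$ gives $X = (I - B^T)^{-1}\epsilon$, hence $\Sigma_0 = (I - B^T)^{-1}\,\text{diag}(\omega_j^2)\,(I - B^T)^{-T}$; the code simulates this model, forms $\Sigma_n = X^T X / n$, and evaluates $l_n(\Theta)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-node DAG in topological order: 1 -> 2, 1 -> 3, 2 -> 3.
# B[k, j] holds the edge weight beta_{kj} from parent k to child j.
p = 3
B = np.zeros((p, p))
B[0, 1], B[0, 2], B[1, 2] = 0.8, -0.5, 0.7
omega2 = np.array([1.0, 0.5, 0.3])       # error variances omega_j^2

# X = B^T X + eps  =>  X = (I - B^T)^{-1} eps, so
# Sigma_0 = (I - B^T)^{-1} diag(omega^2) (I - B^T)^{-T}.
A = np.linalg.inv(np.eye(p) - B.T)
Sigma0 = A @ np.diag(omega2) @ A.T
Theta0 = np.linalg.inv(Sigma0)

# n i.i.d. observations, stacked as rows of the n x p data matrix X.
n = 100_000
eps = rng.normal(size=(n, p)) * np.sqrt(omega2)
X = eps @ A.T
Sigma_n = X.T @ X / n                    # empirical covariance

def l_n(Theta, S):
    """Negative log-likelihood up to constants: trace(Theta S) - log det(Theta)."""
    return np.trace(Theta @ S) - np.linalg.slogdet(Theta)[1]

# Sigma_n is close to Sigma_0, and Theta_0 nearly minimizes l_n for large n:
print(np.max(np.abs(Sigma_n - Sigma0)))              # small
print(l_n(Theta0, Sigma_n), l_n(np.eye(p), Sigma_n))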
Why is this the minimizer?
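A standard convexity argument, not spelled out in the source, answers this. For $l(\Theta) := \text{trace}(\Theta \Sigma_0) - \log \det (\Theta)$ and a symmetric direction $\Delta$,
$$\frac{d}{dt}\, l(\Theta + t\Delta)\Big|_{t=0} = \text{trace}(\Delta \Sigma_0) - \text{trace}(\Theta^{-1}\Delta) = \text{trace}\big((\Sigma_0 - \Theta^{-1})\Delta\big),$$
using $\frac{d}{dt}\log\det(\Theta + t\Delta)\big|_{t=0} = \text{trace}(\Theta^{-1}\Delta)$. The derivative vanishes for every symmetric $\Delta$ if and only if $\Theta^{-1} = \Sigma_0$, i.e. $\Theta = \Theta_0$. Since $-\log\det$ is strictly convex on the cone of positive definite matrices and the trace term is linear, $l$ is strictly convex there, so this stationary point is the unique global minimizer.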