I am working on the following exercise, where I am not sure if the claim actually holds since I may have found a (simple) counter example. Maybe I am missing something?
Let $T \colon X \to X$ be a continuous transformation on a compact metric space $(X, d)$. For any open cover $\beta$, let $N(\beta)$ be the number of sets in a finite subcover of $\beta$ of minimal cardinality. Further, for any subset $A \subseteq X$ we write $N(A, \beta)$ for the minimal cardinality that any subset of $\beta$ covering $A$ can have.
Fix two open covers $\alpha$ and $\beta$ of $X$. Prove that $$ N\left(\bigvee_{i = 0}^{n-1}T^{-i}\beta\right) \leq N\left(\bigvee_{i = 0}^{n-1}T^{-i}\alpha\right) + \max_{A \in \bigvee_{i = 0}^{n-1}T^{-i}\alpha} N \left(A, \bigvee_{i = 0}^{n-1}T^{-i}\beta\right), $$ where $\bigvee_{i = 0}^{n-1}T^{-i}\alpha = \{A_0 \cap T^{-1}A_1 \cap \ldots \cap T^{-(n-1)}A_{n-1} \mid A_i \in \alpha\}$.
If I take $X = [0,1]^2, T = \operatorname{id}, n = 1$ and let $\alpha$ be an open cover obtained by cutting $X$ into 4 pieces (with some overlap, so we have open sets and a cover) and $\beta$ be an open cover obtained by cutting $X$ into 9 pieces (see drawing), then I would get
$$ 9 = N(\beta) \leq N(\alpha) + \max_{A \in \alpha}N(A, \beta) = 4+4, $$ since I can cover any set $A \in \alpha$ with 4 elements of $\beta$. Am I missing something?

You are right; there is a typo. Either the sum on the RHS should have been a product, or logarithms of the $N$'s should have been taken.
The statement is a more general counting statement. Let us first set up some notation. Let $X$ be a nonempty set, $\mathcal{P}(X)$ be its powerset, $\mathfrak{P}(X)\subseteq \mathcal{P}^2(X)=\mathcal{P}(\mathcal{P}(X))$ be the collection of all covers of $X$ that admit a finite subcover. Define the functions
\begin{align*} N(\,\bullet\,;\,\bullet\,)&:\mathcal{P}(X)\times \mathfrak{P}(X)\to\mathbb{Z}_{\geq1}, (A,\mathcal{U})\mapsto \min\left\{\#(\mathcal{U}')\,\left|\, \mathcal{U}'\subseteq \mathcal{U},\, A\subseteq \bigcup \mathcal{U}'\right\}\right.,\\ N(\,\bullet\,|\,\bullet\,)&:\mathfrak{P}(X)\times \mathfrak{P}(X)\to\mathbb{Z}_{\geq1}\cup\{\infty\}, (\mathcal{U},\mathcal{V})\mapsto \max_{V\in\mathcal{V}}N(V;\mathcal{U}),\\ N&: \mathfrak{P}(X)\to\mathbb{Z}_{\geq1}, \mathcal{U}\mapsto N(\mathcal{U}|\{X\})=N(X;\mathcal{U}). \end{align*}
(Let's also put $N(\emptyset;\mathcal{U})=1$.)
Note that the arguments flip from the first definition to the second definition. As far as I'm aware at least the second function is due to Misiurewicz (see his paper "Topological Conditional Entropy" for more). One can think of the logarithms $\log N(A;\mathcal{U})$ and $\log N(\mathcal{U}|\mathcal{V})$ of these two functions as the entropy of the cover $\mathcal{U}$ concentrated on the set $A$ and the entropy of the cover $\mathcal{U}$ conditioned on $\mathcal{V}$, respectively.
Heuristically the larger $N(V;\mathcal{U})$ is the more surprising the knowledge of the set $V$ is according to the knowledge of (or resolution provided by) $\mathcal{U}$; likewise the larger $N(\mathcal{U}|\mathcal{V})$ the more surprising the knowledge of $\mathcal{U}$ is, given the knowledge of $\mathcal{V}$. Note that the definition $N(\mathcal{U}|\mathcal{V})=\max_{V\in\mathcal{V}} N(V;\mathcal{U})$ is justified with this heuristic: how surprising the knowledge of $\mathcal{U}$ is given $\mathcal{V}$ is precisely how surprising each cell $V$ of $\mathcal{V}$ is according to the knowledge of $\mathcal{U}$. (Many algebraic inequalities regarding entropy can be interpreted this way; see also Tao's blog post "Some notes on group extensions" (https://terrytao.wordpress.com/2010/01/23/some-notes-on-group-extensions/) for a use of digital imaging analogies in another cohomological context. Of course it should be noted that the temporal tenses of these heuristics can be confusing; perhaps "becomes" or "ends up being" is better to use instead of "is".)
Using the above formalism, the (corrected version of the) claim in question is as follows:
Claim: $\forall \mathcal{U},\mathcal{V}\in\mathfrak{P}(X): N(\mathcal{U})\leq N(\mathcal{V})\,N(\mathcal{U}|\mathcal{V})$. (Heuristic: it is possibly more surprising to first acquire the knowledge of $\mathcal{V}$ and then, while retaining the knowledge of $\mathcal{V}$, the knowledge of $\mathcal{U}$, than to acquire the knowledge of $\mathcal{U}$ from the get-go.)
Proof: Let $\mathcal{V}'\subseteq \mathcal{V}$ be such that $X\subseteq \bigcup\mathcal{V}'$ and $N(\mathcal{V})=\#(\mathcal{V}')$. Then for any $V'\in\mathcal{V}'$, there is a $\mathcal{U}_{V'}\subseteq \mathcal{U}$ such that $V'\subseteq \bigcup \mathcal{U}_{V'}$ and $\#(\mathcal{U}_{V'})=N(V';\mathcal{U})\leq N(\mathcal{U}|\mathcal{V})$. Now define a new collection $\mathcal{W}$ by
$$\mathcal{W}=\{V'\cap U\,|\, V'\in\mathcal{V}', U\in\mathcal{U}_{V'}\}.$$
Then $\mathcal{W}\in\mathfrak{P}(X)$ since
$$\bigcup \mathcal{W} = \bigcup_{V'\in\mathcal{V}'}\bigcup_{U\in\mathcal{U}_{V'}}V'\cap U = \bigcup_{V'\in\mathcal{V}'} V'\cap\left(\bigcup\mathcal{U}_{V'}\right)= \bigcup_{V'\in\mathcal{V}'} V'\supseteq X,$$
and since $\mathcal{W}\subseteq \mathcal{V}\sqcap \mathcal{U}$ ($= \mathcal{V}\vee\mathcal{U}$ in your notation) it has finite subcovers. Finally
$$N(\mathcal{U})\stackrel{(\ast)}{\leq} N(\mathcal{U}\sqcap\mathcal{V})\leq \#(\mathcal{W})\leq \#(\mathcal{V}') \, \max_{V'\in\mathcal{V}'}\#(\mathcal{U}_{V'})\leq N(\mathcal{V})\, N(\mathcal{U}|\mathcal{V}).$$
($\ast$) Heuristic: $\mathcal{U}\sqcap\mathcal{V}$ possibly has more cells than $\mathcal{U}$, so it is more surprising.