I'll ask the question informally first and then try to work up to something more formal.
Consider a linear search problem. We are presented with an unsorted array of numbers and are asked to determine whether a certain number is in the array. There seems to be no better algorithm than to search linearly through the array, testing each array element in turn to see if it's equal to the test number. The point here is that if we do find it, we don't have to search any further.
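The early-exit behavior described above can be sketched in a few lines of Python (the function name and return convention are just for illustration):

```python
def linear_search(arr, target):
    """Scan the array left to right; stop as soon as the target is found."""
    for i, x in enumerate(arr):
        if x == target:
            return i  # early exit: the rest of the array is never read
    return -1  # only in this case is the whole array read

print(linear_search([7, 3, 9, 3], 9))  # finds 9 at index 2 without reading index 3
```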
The question is whether this is true more generally. For example, if we are asked to find a clique of size $k$ in a graph, we can quit once we find one, at least if we use the naive algorithm that simply checks each size-$k$ combination of vertices. Does the optimal algorithm also halt once it has read enough of the input?
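The naive algorithm with its early exit can be sketched as follows (the adjacency-matrix representation is an arbitrary choice for the sketch):

```python
from itertools import combinations

def has_k_clique(adj, k):
    """Naively test every k-subset of vertices; halt on the first clique found."""
    for subset in combinations(range(len(adj)), k):
        if all(adj[u][v] for u, v in combinations(subset, 2)):
            return True  # quit as soon as one clique is found
    return False  # only here have all combinations been checked

# A triangle on vertices 0,1,2 plus an isolated vertex 3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(has_k_clique(adj, 3))  # True
print(has_k_clique(adj, 4))  # False
```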
Let's try to formalize this. (EDIT: the model of computation we will use here is a Turing machine.) An input $i \in \{0,1\}^n$ is just $n$ bits. Let's say algorithms $A$ and $B$ (EDIT: $A$ and $B$ are fixed) compute a problem $P$, with $p$ an instance of $P$. Define $A$ as optimal if there is no $B$ with better time performance. (EDIT: we'll assume worst-case analysis until further clarifications are necessary.)
EDIT: Here's a formalization of the time performance optimality measure; it's modeled on big O but different. If $f$ and $g$ are two functions defined on the natural numbers, then we'll say $f(n) = \Phi(g(n))$ if there is a constant $C$ such that
$$ f(n) < g(n) \textrm{ for all } n > C$$
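Note that unlike big O, this relation is strict and allows no constant-factor slack. A small Python probe of the condition (it checks only up to a finite bound $N$, so it can illustrate but of course not prove the "for all $n > C$" claim):

```python
def phi_holds_up_to(f, g, C, N):
    """Empirically check f(n) < g(n) for all C < n <= N (a finite probe
    of the Phi condition; it cannot verify the unbounded claim)."""
    return all(f(n) < g(n) for n in range(C + 1, N + 1))

# f(n) = 100n is Phi(n^2) with witness C = 100, since 100n < n^2 for n > 100.
print(phi_holds_up_to(lambda n: 100 * n, lambda n: n * n, C=100, N=10_000))  # True
# No constant-factor slack: 2n is not Phi(n), for any witness C.
print(phi_holds_up_to(lambda n: 2 * n, lambda n: n, C=100, N=10_000))  # False
```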
Let's put some other constraints on $P$. Some instances $p$ can be solved by looking only at a prefix $q$ of the input $i$; that is, $i=qw$ for some non-empty $w$. Will $A$ stop as soon as it has read and computed on $q$, or must it read $w$ as well?
Here's an alternative phrasing, which describes the counterexample I'm looking for. The counterexample would exhibit specific $P, A, p, i, q$ such that $A$ is an optimal algorithm for $P$, $p$ is a specific instance described by the input $i$, and even though $p$ can be solved from the prefix $q$ alone, $A$ nonetheless necessarily reads all of $i$ to solve it.
With your clarified definitions I will still say that the answer is "vacuously yes", because optimal machines do not exist at all for any problem other than recognizing a regular language.
More precisely, if you have any machine $M$ that solves the problem, I can construct another machine that completes faster in the worst case for every sufficiently long input. Because even constant terms in the running time matter here, we adapt the linear speedup theorem so that it works without increasing the size of the tape alphabet:
My machine works just like $M$, but it remembers the contents of 10 squares of the original tape internally, as part of its state; those 10 squares are not represented on the tape. It can therefore do in a single step any part of the computation in which $M$ stays within those 10 squares. Only when $M$ moves out of the window to the left or the right does my machine need to make a step. For example, if $M$ moves out of the window to the right, my machine reads the symbol on the tape there, appends it to the right end of its internal memory, and writes the leftmost symbol from its old state to the square it just read. Then it moves in the direction where $M$ will next leave its new window.
(My machine starts knowing that the 10 squares to the immediate left of the input are blank. The infinite blank tape to the left may equally well represent itself shifted 10 squares, so I have no set-up costs).
Since $P$ is not regular, for long enough input $M$ has to do some back-and-forth movement on the tape in the worst case. And each time it changes direction, my machine gets to skip at least a few of $M$'s steps. So from a certain length onwards it is always at least a few steps quicker than $M$.
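The step-count bookkeeping in this argument can be illustrated with a toy counter in Python. This models only the step counts, not the full state construction; the window width and the zigzag trajectory below are made-up examples:

```python
def windowed_steps(head_positions, w=10):
    """Count the steps of the windowed machine: it pays one step each time
    M's head exits the current w-cell window, after which the window shifts
    one cell in that direction (as in the construction sketched above)."""
    lo, steps = -w, 0          # window initially covers the w blanks left of the input
    for pos in head_positions:
        if pos >= lo + w:      # M left the window to the right
            lo += 1
            steps += 1
        elif pos < lo:         # M left the window to the left
            lo -= 1
            steps += 1
    return steps

# A zigzagging head: out to cell 20, back to cell 0, out to cell 20 again.
zigzag = list(range(21)) + list(range(19, -1, -1)) + list(range(1, 21))
print(len(zigzag), windowed_steps(zigzag))  # M takes 61 steps, the windowed machine 43
```

Each change of direction lets the windowed machine skip the steps that $M$ spends inside the window, matching the claim above.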
This speedup comes at a cost: in general I may need 1024 times as many states in my machine as $M$ has. As mentioned in the comments, one might imagine cost models that penalize a machine for having many states, but that is certainly not the standard way to count efficiency for Turing machines, and your definition of "optimality" does not specify how one would do that.
On the other hand, for a regular $P$ we can easily construct a machine that reads the tape at full speed and stops with the right answer the instant it has read enough of the input for the answer to be known with certainty.
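For a concrete (hypothetical) instance, take the regular language of binary strings containing the substring 11. A Python sketch of the corresponding scanner, which commits the moment the answer is forced:

```python
def contains_11(bits):
    """Recognize the regular language {x : x contains "11"}, halting the
    instant the answer is certain; returns (answer, symbols read)."""
    prev = '0'
    for i, b in enumerate(bits):
        if b == '1' and prev == '1':
            return True, i + 1   # answer known; the rest of the input is never read
        prev = b
    return False, len(bits)      # rejection requires reading everything

print(contains_11("0110000000"))  # (True, 3): stops after 3 of the 10 symbols
```

(For other regular languages, such as parity, certainty never arrives before the last symbol, and the machine simply reads to the end.)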