Per the advice in this comment, I am submitting this redraft of my earlier question as a new question. The redraft not only incorporates the five edits of the original question but also adds two further changes, identified in bold font.
I'll ask the question informally first and then try to work up to something more formal.
Consider a linear search problem. We are presented with an unsorted array of numbers and are asked to determine whether a certain number is in the array. There seems to be no better algorithm than to search linearly through the array, testing each array element in turn to see if it's equal to the test number. The point here is that if we do find it, we don't have to search any further.
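To make the early-exit behavior concrete, here is a minimal sketch of that linear search in Python (the function name is mine, chosen for illustration):

```python
def linear_search(arr, target):
    """Scan the array left to right, testing each element in turn."""
    for index, value in enumerate(arr):
        if value == target:
            return index  # found it: stop without reading the rest
    return -1  # read the whole array and never found the target
```

The point of the question is the `return index` line: once a match is found, the algorithm never looks at the remaining elements.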
The question is whether this is more generally true. For example, if we are asked to find a clique of size $k$ in a graph, we can quit once we find one, at least if we use the naive algorithm which simply checks each combination of vertices of size $k$. Does the optimal algorithm also halt when it has read enough of the input?
Let's try to formalize this. Our model of computation is a Turing machine with one tape and a fixed alphabet $\Sigma = \{0,1\}$; fixing the alphabet forestalls applications of the linear speedup theorem. An input $i \in \{0,1\}^n$ is thus just $n$ bits. Suppose fixed algorithms $A$ and $B$ both compute a problem $P$, and let $p$ be an instance of $P$. Call $A$ optimal if there is no $B$ with better time performance, as defined below. We assume worst-case analysis until further clarification becomes necessary.
**New paragraph:** for each problem $P$, we allow the input to be presented in the optimal order, that is, whatever order provides the best performance as defined below. This order is fixed, however, across all algorithms implementing solutions to problem $P$.
Here's a formalization of the time performance optimality measure; it's modeled on big O but different. If $f$ and $g$ are two functions defined on the natural numbers, then we'll say $f(n) = \Phi(g(n))$ if there is a constant $C$ such that

$$ f(n) < g(n) \textrm{ for all } n > C.$$

Note the absence of a multiplicative constant: for example, $2n = O(n)$, but $2n \neq \Phi(n)$, since $2n < n$ fails for every $n > 0$.
Let's put some other constraints on $P$. Some instances $p$ can be solved by looking only at a prefix $q$ of $i$, that is, $i = qw$ for some non-empty $w$. Will the optimal $A$ stop as soon as it has read $q$ and computed on it, or must it read $w$ as well?
Here's an alternative phrasing, which describes the counterexample I'm looking for. The counterexample would exhibit specific $P, A, p, i, q$ such that $A$ is an optimal algorithm for $P$, $p$ is an instance whose encoding is $i$, and even though $p$ can be solved from the prefix $q$ alone, algorithm $A$ nonetheless necessarily reads all of $i$ to solve $p$.