A computer consists of hardware that literally hard-wires machine language instructions (operations) and operands to predetermined results. At the lowest level of interest, this hard wiring is a physical implementation of Boolean algebra (or some other discrete mathematical system), realized with electronic logic gates.
At the most basic level, a logic gate takes one or two binary inputs (signals) and returns a binary output according to one of these logical operations: AND, OR and NOT. Any truth table over arbitrary binary inputs and outputs can be reduced to a canonical Boolean expression composed of these operators. That is, any chip performing some logical operation on arbitrary binary data can be composed of these logic gates. Machine instructions are binary control bits that parameterize the chip's behavior.
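For example, XOR can be built purely from AND, OR and NOT in canonical sum-of-products form (a toy sketch of my own, not meant to be realistic hardware):

```python
# Toy sketch: composing XOR from AND, OR and NOT (sum-of-products form).
def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b

def XOR(a, b):
    # Canonical sum of products: (a AND NOT b) OR (NOT a AND b)
    return OR(AND(a, NOT(b)), AND(NOT(a), b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))   # prints the XOR truth table: 0, 1, 1, 0
```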
From these elementary logic gates, higher levels of abstraction are derived: adders, ALUs, multiplexers, registers, CPUs, memory and so on. At this level, the computer performs arithmetic operations in hardware sequentially, moving the binary results back and forth between the ALU, registers and memory, synchronized by the clock signal. These physical transformations (arithmetic) of binary data obey the mathematical axioms on which the logic gates are based.
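For instance, a half adder and a full adder, the first step toward a ripple-carry adder and ultimately an ALU, are nothing more than the gates above wired together (another toy sketch, self-contained):

```python
# Toy sketch: a half adder and full adder built from elementary gates.
def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b
def XOR(a, b): return OR(AND(a, NOT(b)), AND(NOT(a), b))

def half_adder(a, b):
    # Returns (sum bit, carry bit) of adding two 1-bit numbers.
    return XOR(a, b), AND(a, b)

def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, OR(c1, c2)

# Chaining full adders bit by bit gives an adder of any fixed width.
print(full_adder(1, 1, 1))   # (1, 1): 1 + 1 + 1 = binary 11
```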
Software is the ultimate abstraction over hardware. With it, a computer can be instructed to perform anything. Software can be defined using any syntax, and it is completely decoupled from the hardware until it is compiled. Even then, it only depends on the binary code, which is an abstract interface to the hardware. It never meets the hardware; neither knows of the existence of the other. But ultimately, any software reduces (and is limited) to a sequence of arithmetic or logical operations performed by the hardware.
My question is: what is the minimal design of computer circuitry and architecture (bit width, ALU operations, registers, instruction set etc.) that can, in theory, compile and run any software a modern PC can run (OS and applications), when performance isn't considered? And how do you mathematically prove this?
Here is the question in a nutshell: What is the minimal required instruction set for hardware to be able to emulate a modern PC?
This is quite a deep question, and it has some problems:
It seems like you want to minimize the width of your CPU registers. Unfortunately, the number of address bits grows logarithmically with how much memory you can reach: with k-bit addresses you can name at most 2^k memory cells (for example, 32-bit addresses reach at most 4 GiB of byte-addressed memory). Since a program might need any amount of memory, no fixed number of bits would be enough to run every program, even if you had an infinite amount of memory attached.
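To make that concrete, here is a toy model (my own illustration) of why a fixed register width caps the memory a machine can reach:

```python
# Toy sketch: a k-bit address register can name at most 2**K memory cells.
K = 8                                   # pretend our registers are 8 bits wide
memory = list(range(2 ** K))            # the entire addressable space

def load(address):
    # Any wider address simply wraps around (modeled here by masking).
    return memory[address & ((1 << K) - 1)]

print(load(5))            # 5  -- within range
print(load(2 ** K + 5))   # 5  -- "cell 261" is unreachable on this machine
```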
So you see, even asking the question correctly might be difficult, and the answer can vary depending on how you ask it. Automata theory, and theoretical computer science in general, deals with many of these kinds of questions. I am afraid that unless you are more specific about what you want to optimize and under which conditions, it is hard to give a formal answer.
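That said, to give a flavor of how far "minimal" can be pushed once you set those issues aside: a single instruction, subleq ("subtract and branch if less than or equal to zero"), is already Turing-complete given unbounded memory. A toy interpreter sketch (my own illustration, not a formal proof):

```python
# Toy sketch of a one-instruction (subleq) machine.
# Each instruction is three memory cells: a, b, c.
# Semantics: mem[b] -= mem[a]; if the result is <= 0, jump to c, else fall through.
def run_subleq(mem, pc=0):
    while pc >= 0:                      # a negative jump target halts the machine
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem

# Example program: clear cell 9 (subtract it from itself), then halt.
mem = [9, 9, 3,        # mem[9] -= mem[9] -> 0, which is <= 0, so jump to 3
       0, 0, -1,       # mem[0] -= mem[0] -> 0, jump to -1 => halt
       0, 0, 0, 42]
print(run_subleq(mem)[9])   # 0
```

A real machine would still need I/O and would still face the finite-address problem above, which is exactly where the formal caveats come back in.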