In the paper paper-montgomery-multiplication there are a lot of algorithm explained how to make a montgomery multiplication on bit level. But I have problems to understand that correctly.
I have written Algorithm 1 (page 2 in paper) in Verilog (just testbench).
module mmul();
parameter k = 6;
reg [7:0] a;
reg [7:0] b;
reg [7:0] s;
reg [7:0] q;
reg [7:0] n;
integer i;
initial
begin
a = 13;
b = 17;
n = 41;
s = 0;
for(i=0; i<k; i = i + 1)
begin
q[i] = (s+a[i]*b)%2;
s = (s+a[i]+b+q[i]*n)/2;
end
if(s>=n)
s = s-n;
$display("%d",s);
$finish();
end
endmodule
I expect that 13*17 % 41 results 16 because 13*17 % 41 = 221 % 17 -> 221 - floor(221/41)*41 = 16
This program results with 23. I know that this algorithm S= A×B*2^(-k) mod n. What I have to do, so that the program delivers the right result?