I have been reading the excellent paper "High-Speed Architectures for Reed–Solomon Decoders" by Dilip V. Sarwate, and Naresh R. Shanbhag. This paper presents among else a very optimized, in terms of both critical path and latency, algorithm that evaluates the coefficients of the Error Locator Polynomial for any Reed-Solomon code - named in the paper the RiBM algorithm. The RiBM algorithm has a critical path of one multiplication and one addition per cycle, for a total of 2T cycles.
The paper builds upon the "Inversionless Decoding of Binary BCH Codes" by Herbert O. Burton, which presents an algorithm (called iBM) that avoids the division in the Berlekamp-Massey iterative algorithm which produces the error locator polynomial for Binary BCH codes. The algorithm works in T cycles, but has a long critical path of multiple operations per cycle.
I need to decode a Binary BCH code, using something similar to the RiBM algorithm described in the first paper mentioned above, but with a T-cycle latency. The problem with using the RiBM algorithm as described in the paper above as is is that I would need to spend 2T cycles to calculate the ELP, whereas for Binary BCH codes the expected latency (or number of algorithm "steps") should be reducible to just T. This is because the discrepancy is calculated to be zero at even steps of the algorithm. Still though the RiBM algorithm is not directly reducible to just T steps.
The algorithm in the 2nd paper mentioned above ("Inversionless Decoding of Binary BCH Codes") does finish in just T steps, however it has a much longer critical path than the RiBM algorithm.
I understand that the RiBM is targeted only for Reed-Solomon codes. My question is, is there a modified version of this algorithm, or any other algorithm presented elsewhere, that can be used to calculate the ELP of a Binary BCH code in T cycles, with a minimal critical path in each cycle?
Thank you very much in advance.