edit results from the CPAN

edit
Alien-libsecp256k1
view release on metacpan or search on metacpan
libsecp256k1/doc/safegcd_implementation.md view on Meta::CPAN
i = N # divsteps left to do
while True:
    # Get rid of all bottom zeros at once. In the first iteration, g may be odd and the following
    # lines have no effect (until "if eta < 0").
    zeros = min(i, count_trailing_zeros(g))
    eta -= zeros
    g >>= zeros
    i -= zeros
    if i == 0:
        break
    # We know g is odd now
    if eta < 0:
        eta, f, g = -eta, g, -f
    g += f
    # g is even now, and the eta decrement and g shift will happen in the next loop.
```

We can now remove multiple bottom *0* bits from *g* at once, but still need a full iteration whenever
there is a bottom *1* bit. In what follows, we will get rid of multiple *1* bits simultaneously as
well.

Observe that as long as *&eta; &geq; 0*, the loop does not modify *f*. Instead, it cancels out bottom
bits of *g* and shifts them out, and decreases *&eta;* and *i* accordingly - interrupting only when *&eta;*
becomes negative, or when *i* reaches *0*. Combined, this is equivalent to adding a multiple of *f* to
*g* to cancel out multiple bottom bits, and then shifting them out.

It is easy to find what that multiple is: we want a number *w* such that *g+w&thinsp;f* has a few bottom
zero bits. If that number of bits is *L*, we want *g+w&thinsp;f mod 2<sup>L</sup> = 0*, or *w = -g/f mod 2<sup>L</sup>*. Since *f*
is odd, such a *w* exists for any *L*. *L* cannot be more than *i* steps (as we'd finish the loop before
doing more) or more than *&eta;+1* steps (as we'd run `eta, f, g = -eta, g, -f` at that point), but
apart from that, we're only limited by the complexity of computing *w*.

This code demonstrates how to cancel up to 4 bits per step:

```python
NEGINV16 = [15, 5, 3, 9, 7, 13, 11, 1] # NEGINV16[n//2] = (-n)^-1 mod 16, for odd n
i = N
while True:
    zeros = min(i, count_trailing_zeros(g))
    eta -= zeros
    g >>= zeros
    i -= zeros
    if i == 0:
        break
    # We know g is odd now
    if eta < 0:
        eta, f, g = -eta, g, -f
    # Compute limit on number of bits to cancel
    limit = min(min(eta + 1, i), 4)
    # Compute w = -g/f mod 2**limit, using the table value for -1/f mod 2**4. Note that f is
    # always odd, so its inverse modulo a power of two always exists.
    w = (g * NEGINV16[(f & 15) // 2]) % (2**limit)
    # As w = -g/f mod (2**limit), g+w*f mod 2**limit = 0 mod 2**limit.
    g += w * f
    assert g % (2**limit) == 0
    # The next iteration will now shift out at least limit bottom zero bits from g.
```

By using a bigger table more bits can be cancelled at once. The table can also be implemented
as a formula. Several formulas are known for computing modular inverses modulo powers of two;
some can be found in Hacker's Delight second edition by Henry S. Warren, Jr. pages 245-247.
Here we need the negated modular inverse, which is a simple transformation of those:

- Instead of a 3-bit table:
  - *-f* or *f ^ 6*
- Instead of a 4-bit table:
  - *1 - f(f + 1)*
  - *-(f + (((f + 1) & 4) << 1))*
- For larger tables the following technique can be used: if *w=-1/f mod 2<sup>L</sup>*, then *w(w&thinsp;f+2)* is
  *-1/f mod 2<sup>2L</sup>*. This allows extending the previous formulas (or tables). In particular we
  have this 6-bit function (based on the 3-bit function above):
  - *f(f<sup>2</sup> - 2)*

This loop, again extended to also handle *u*, *v*, *q*, and *r* alongside *f* and *g*, placed in
`divsteps_n_matrix`, gives a significantly faster, but non-constant time version.


## 7. Final Python version

All together we need the following functions:

- A way to compute the transition matrix in constant time, using the `divsteps_n_matrix` function
  from section 2, but with its loop replaced by a variant of the constant-time divstep from
  section 5, extended to handle *u*, *v*, *q*, *r*:

```python
def divsteps_n_matrix(zeta, f, g):
    """Compute zeta and transition matrix t after N divsteps (multiplied by 2^N)."""
    u, v, q, r = 1, 0, 0, 1 # start with identity matrix
    for _ in range(N):
        c1 = zeta >> 63
        # Compute x, y, z as conditionally-negated versions of f, u, v.
        x, y, z = (f ^ c1) - c1, (u ^ c1) - c1, (v ^ c1) - c1
        c2 = -(g & 1)
        # Conditionally add x, y, z to g, q, r.
        g, q, r = g + (x & c2), q + (y & c2), r + (z & c2)
        c1 &= c2                     # reusing c1 here for the earlier c3 variable
        zeta = (zeta ^ c1) - 1       # inlining the unconditional zeta decrement here
        # Conditionally add g, q, r to f, u, v.
        f, u, v = f + (g & c1), u + (q & c1), v + (r & c1)
        # When shifting g down, don't shift q, r, as we construct a transition matrix multiplied
        # by 2^N. Instead, shift f's coefficients u and v up.
        g, u, v = g >> 1, u << 1, v << 1
    return zeta, (u, v, q, r)
```

- The functions to update *f* and *g*, and *d* and *e*, from section 2 and section 4, with the constant-time
  changes to `update_de` from section 5:

```python
def update_fg(f, g, t):
    """Multiply matrix t/2^N with [f, g]."""
    u, v, q, r = t
    cf, cg = u*f + v*g, q*f + r*g
    return cf >> N, cg >> N

def update_de(d, e, t, M, Mi):
    """Multiply matrix t/2^N with [d, e], modulo M."""
    u, v, q, r = t
    d_sign, e_sign = d >> 257, e >> 257
    md, me = (u & d_sign) + (v & e_sign), (q & d_sign) + (r & e_sign)
( run in 0.933 second using v1.01-cache-2.11-cpan-0bb4e1dffa6 )