TY - JOUR AU - Zechmeister,, M AB - ABSTRACT In previous work, we developed the idea to solve Kepler’s equation with a CORDIC-like algorithm, which does not require any division, but still requires multiplications in each iteration. Here we overcome this major shortcoming and solve Kepler’s equation using only bitshifts, additions and one initial multiplication. We prescale the initial vector with the eccentricity and the scale correction factor. The rotation direction is decided without correction for the changing scale. We find that double CORDIC iterations are self-correcting and compensate for possible wrong rotations in subsequent iterations. The algorithm needs 75 per cent more iterations and delivers the eccentric anomaly and its sine and cosine terms times the eccentricity. The algorithm can also be adopted for the hyperbolic case. The new shift-and-add algorithm brings Kepler’s equation close to hardware and allows it to be solved with cheap and simple hardware components. methods: numerical, celestial mechanics 1 INTRODUCTION Kepler’s equation (KE) is fundamental in many fields of astrophysics. It relates mean anomaly M and eccentric anomaly E via the equation $$\begin{eqnarray*} E-e\sin E & =M(E) , \end{eqnarray*}$$(1) where |$M(t)=\tau ({t}/{P})$| with time t and orbital period P. In practice, we often need to solve the inverse problem E(M). For instance, in orbit fitting, observing times t are given and then the location or velocity of an object must be predicted, which then means it is necessary to compute E. Many methods have been proposed to solve KE, such as Newton iterations, Halley’s method, table interpolation and inverse series (Colwell 1993). In Zechmeister (2018, hereafter Ze18), we proposed to use a CORDIC-like algorithm. The coordinate rotation digital computer (CORDIC) was invented by Volder (1959) and can compute many elementary functions (e.g. cosine, sine, multiplication)1 and needs only additions and bitshifts. We briefly review the CORDIC concept in Section 2. In Ze18, the rotation directions are set accurately. However, already this step requires a multiplication in each iteration. To overcome this shortcoming, here we study the idea of simply ignoring the scale change in the direction decision, still hoping for a correct convergence (Section 3). We demonstrate that this approach is indeed suitable, given some appropriate adjustments. Finally, we discuss an implementation (Section 5) and we evaluate the performance of the algorithm (Section 6).2 2 CORDIC ALGORITHM FOR ELEMENTARY FUNCTIONS A complex number can be expressed in Cartesian coordinates $$\begin{eqnarray*} z = x+{\rm i}y , \end{eqnarray*}$$(2) as well as in polar coordinates $$\begin{eqnarray*} z = r \exp {\rm i}\varphi, \end{eqnarray*}$$(3) where x = r cos φ and y = r sin φ. When we represent this number z by a sequence of rotations with angles θn, these are simple additions in the exponent in the polar representation or complex multiplications in the Cartesian representation: $$\begin{eqnarray*} r \exp\left({\rm i}\sum \theta _{n}\right) = z = r\prod _{n}(\cos \theta _{n}+ {\rm i}\sin \theta _{n}). \end{eqnarray*}$$(4) Additions are very easy to perform for computers, while multiplications are usually more expensive, in particular with the simple hardware available years ago. Therefore, Volder (1959) sought to simplify the product in equation (4). 
He factored out the term cos θn, $$\begin{eqnarray*} z =r\prod _{n}\cos \theta _{n}\prod _{n}(1+{\rm i}\tan \theta _{n}) , \end{eqnarray*}$$(5) and allowed only angles of the form $$\begin{eqnarray*} \theta _{n}=\sigma _{n}\alpha _{n} \end{eqnarray*}$$(6) with $$\begin{eqnarray*} \tan \alpha _{n} =\frac{1}{2^{n-1}} \quad \text{and} \quad \sigma _{n}\in \lbrace -1,1\rbrace \end{eqnarray*}$$(7) for n ≥ 1. So, the first angle is α1 = 45°, the subsequent rotation angles αn are almost halved in each iteration, and each rotation can be clockwise or counterclockwise (positive or negative). Now, equation (5) can be written as $$\begin{eqnarray*} z =rK_{N}\prod _{n}\left(1+{\rm i}\frac{\sigma _{n}}{2^{n-1}}\right) , \end{eqnarray*}$$(8) where the term $$\begin{eqnarray*} K_{N} = \prod _{n}\cos \alpha _{n}=\prod _{n}\frac{1}{\sqrt{1+\tan ^{2}\alpha _{n}}} = \prod _n \frac{1}{\sqrt{1+4^{-(n-1)}}} \end{eqnarray*}$$(9) is called the scale correction. The factor KN can be pre-computed, because the cosine is symmetric and therefore independent of σn, and the absolute values of the rotation angles |θn| = αn are pre-defined ($K_{\mathrm{c}}\equiv K_{\infty }\approx 0.607\, 253$).3

Because of the angle choice in equation (7), the remaining product term can be computed efficiently. This is easier to explain when explicitly writing an adjacent rotation for the real and imaginary parts of z as $$\begin{eqnarray*} x_{n+1} = x_{n}-m\frac{\sigma _{n+1}y_{n}}{2^{k_{n+1}}} \end{eqnarray*}$$(10) $$\begin{eqnarray*} y_{n+1} = y_{n}+\frac{\sigma _{n+1}x_{n}}{2^{k_{n+1}}}, \end{eqnarray*}$$(11) where the coordinate parameter m is 1 for the circular case (−1 for the hyperbolic and 0 for the linear case). The multiplication with σn is just a negation in the case of σn = −1. The multiplication by an integer power of two ($2^{-n}$) is also very easy for a computer: it is a simple bitshift in the binary system; similarly, in a decimal system, a division by 10 is just a left shift of the decimal point. Therefore, all multiplications are eliminated. Only the one multiplication rKN in equation (8) remains. In the case of r = 1, even this multiplication can be saved (Walther 1971); the start vector is initialized with (x0, y0) = (KN, 0).

With these basic equations, the CORDIC algorithm can compute the sine and cosine functions. Given an input angle φ, we can approach it in each iteration with the condition $$\begin{eqnarray*} \sigma _{n+1}=\left\lbrace \begin{array}{@{}l@{\quad }l@{}}+1 & \sum \theta _{n}\lt \varphi \\ -1 & \text{else.} \end{array}\right. \end{eqnarray*}$$(12) The Cartesian representation is propagated simultaneously with the same rotation directions σn via equations (10) and (11). So, when ∑θn → φ, then xn → cos φ and yn → sin φ. Fig. A1 illustrates this process. The convergence range can be derived when performing only positive rotations, resulting in $\sum\nolimits _{n=1}^{N}|\theta _{n}|=\sum\nolimits _{n}\operatorname{atan}2^{-(n-1)}\rightarrow 1.7433 = 99.88^{\circ}$ for N → ∞. An initial rotation by 90°, which needs no scale correction, can extend the range to 189.88°. It is also possible to calculate $\operatorname{atan}(x,y)$: given x and y, the angle φ of this vector is sought. In this mode, called vectoring, the component yn is driven towards zero. Walther (1971) generalized the CORDIC algorithm with a linear and a hyperbolic mode, allowing the computation of multiplication, division and the functions exp, $\ln$, $\operatorname{atanh}$ and square root.
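To make the shift-and-add idea concrete, the following minimal Python sketch mirrors equations (7)–(12) for the circular rotating mode. It is an illustration only: floating-point arithmetic stands in for the bitshifts of a real fixed-point or hardware implementation, and the function name and iteration count N are our choices, not part of the paper.

import math

def cordic_sin_cos(phi, N=30):
    # illustrative circular CORDIC rotation (m = 1); assumes |phi| <~ 1.74 rad
    alpha = [math.atan(2.0**-k) for k in range(N)]       # base angles, eq. (7)
    K = 1.0
    for k in range(N):                                   # scale correction K_N, eq. (9)
        K /= math.sqrt(1.0 + 4.0**-k)
    x, y, t = K, 0.0, phi                                # start vector (K_N, 0), residual angle t
    for k in range(N):
        s = 1.0 if t > 0 else -1.0                       # direction decision, eq. (12)
        x, y = x - s*y*2.0**-k, y + s*x*2.0**-k          # shift-and-add rotation, eqs (10)-(11)
        t -= s*alpha[k]
    return x, y                                          # ~ (cos phi, sin phi)

For example, cordic_sin_cos(1.0) should agree with (cos 1, sin 1) to roughly the size of the last rotation angle, about 2^{-29}.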
Table A1 gives an overview of the different modes and Table A2 lists the required input and the corresponding output to obtain various elementary functions. This diversity demonstrates the capability of this simple algorithm. However, it must be noted that the hyperbolic mode needs specific iterations to be carried out twice for4 $$\begin{eqnarray*} k_{n}\in \lbrace 4,13,40,121,\ldots ,k,3k+1,\ldots \rbrace , \quad k=\frac{3^{m+2}-1}{2}. \end{eqnarray*}$$(13) This compensates for the accumulating problem that subsequent rotation angles are a little smaller than half, 2α_{n+1} < α_n (while the circular mode here has some redundancy, 2α_{n+1} > α_n, and the linear mode is exact, 2α_{n+1} = α_n). With this sequence, overall convergence is ensured. From now on, we use the variable name n for the iteration number and kn for the shift sequence.

3 CORDIC DOUBLE ITERATIONS FOR KEPLER’S EQUATION

To apply CORDIC to KE, we proposed in Ze18 the modified condition for the rotation direction5 $$\begin{eqnarray*} \sigma _{n+1}=\left\lbrace \begin{array}{@{}l@{\quad }l@{}} +1 & E_{n}-e\sin E_{n}< M\\ -1 & \mathrm{else.} \end{array}\right. \end{eqnarray*}$$(14) For readability and with respect to equation (1), we renamed ∑θn as En and φ as M compared with Section 2. The decision in equation (14) is exact within the working precision. However, the term sin En is not accessible in true CORDIC, because of the scale change. Also, a simultaneous compensation would require a multiplication in every iteration. When the start vector E0 = 0 is pre-scaled with KN, i.e. $$\begin{eqnarray*} x_{0} =K_{N}e\cos E_{0}=K_{N}e \end{eqnarray*}$$(15) $$\begin{eqnarray*} y_{0} =K_{N}e\sin E_{0}=0, \end{eqnarray*}$$(16) then the term yn converges towards e sin En for n → N. However, at iteration n, the relation is $$\begin{equation*} y_{n} =\frac{K_{n}}{K_{N}}e\sin E_{n}. \end{equation*}$$ Therefore, yn and e sin En deviate by the factor $K_{n}/K_{N}=\prod\nolimits _{i=n}^{N}(1+4^{-k_{i}})^{-1/2}$ (in double precision, it is negligible for kn ≥ 27). In this work, we simply propose $$\begin{eqnarray*} \sigma _{n+1}=\left\lbrace \begin{array}{@{}l@{\quad }l@{}} +1 & E_{n}-y_{n}< M\\ -1 & \mathrm{else.} \end{array}\right. \end{eqnarray*}$$(17) This totally ignores the changing scale. Still, we might hope for convergence. Fig. 1 shows what happens for the extreme case of e = 1. Many regions seem to converge, but others clearly do not. The approximation sometimes leads to rotations in the wrong direction, and the subsequent rotations do not seem able to overcome this.

Figure 1. CORDIC on KE (for e = 1) with single iterations (red, shift sequence kn = 0, 1, 2, 3, 4, ...) and with double iterations (blue, kn = 1, 1, 2, 2, 3, 3, ...).

There are other functions that can have similar issues, for instance the arcsine (Muller 2006). Baykov (1972) solved the problem with double iterations, meaning each iteration is executed twice. In Section A1, we discuss the arcsine function. We continue investigating the approach of Baykov (1972), because it does not require any modifications of the CORDIC algorithm besides the sequence for kn. We also note that the hyperbolic mode likewise needs specific iterations to be repeated. So double iteration is an established workaround. Indeed, it turns out that double iterations also work for KE (Fig. 1).
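As a concrete illustration of equations (15)–(17), the following floating-point Python sketch performs double iterations with the shift sequence kn = 0, 0, 1, 1, 2, 2, ... (the sequence later used for Fig. 4). This is not the paper's fixed-point Code B.1; the function name, the choice of kmax and the use of float arithmetic are ours.

import math

def kepler_cordic(M, e, kmax=28):
    # double-iteration CORDIC sketch for Kepler's equation (0 <= M <= pi, 0 <= e <= 1)
    ks = [k for k in range(kmax + 1) for _ in (0, 1)]    # shift sequence 0,0,1,1,2,2,...
    K = 1.0
    for k in ks:                                         # scale correction for this sequence
        K /= math.sqrt(1.0 + 4.0**-k)
    x, y, E = K*e, 0.0, 0.0                              # prescaled start vector, eqs (15)-(16)
    for k in ks:
        s = 1.0 if E - y < M else -1.0                   # approximate decision, eq. (17)
        x, y = x - s*y*2.0**-k, y + s*x*2.0**-k          # scale-free rotation, eqs (10)-(11)
        E += s*math.atan(2.0**-k)
    return E, x, y                                       # E and roughly (e cos E, e sin E)

Calling kepler_cordic(2 - math.sin(2), 1.0) should reproduce the example quoted in Appendix B, i.e. E ≈ 2 with x ≈ cos 2 and y ≈ sin 2.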
We can explain the success as follows. As already mentioned, yn is a good approximation of e sin En. A rotation in the wrong direction can occur when the intermediate angle is already close to the target value (see n = 3 in Fig. 2). Then the true Mn = En − e sin En and the approximation En − yn may lie on different sides of the target M (see Fig. 2; a positive rotation σ4 = +1 would be needed according to M3, but E3 − y3 suggests σ4 = −1). A wrong rotation moves away by at least αn and needs to be compensated by the subsequent rotations. For single rotations, the sum of all subsequent rotations, $\sum _{i\gt n}\alpha _{i}\approx 2\alpha _{n+1}\approx \alpha _{n}$, can just recover αn, but the small redundancy of single rotations (in m = 1) is generally still insufficient to compensate for the initiating departure. Double rotations, however, introduce redundancy, which is the key to overcoming the convergence problem, but also the price to be paid for the approximation.

Figure 2. Evolution of the vector (xn, yn) (colour-coded arrows) when dropping the scaling term cos αn and using double rotations. The example is for e = 0.9 and M = 2.08 − 0.9 sin 2.08 ≈ 1.294 (thus E = 2.08). With appropriate prescaling by KN, the vector approaches length e (grey arc) after n = N (black arrow) and converges towards E. The discrepancy between the exact intermediate mean anomaly Mn = En − e sin En (crosses) and the approximated mean anomaly En − yn (open circles) is colour-coded with dotted arcs on the grey unit circle. Iterations n = 1 and n = 2 (or 3 and 4) have the same angle αn.

4 HYPERBOLIC MODE

Analogous to Ze18, we study whether our double iteration algorithm is also applicable to the hyperbolic Kepler’s equation (HKE) $$\begin{eqnarray*} M=e\sinh H-H, \end{eqnarray*}$$(18) where e ≥ 1. Replacing the trigonometric terms by their hyperbolic analogues, equations (7) and (17) become $$\begin{eqnarray*} \tanh \alpha _{n} =\frac{1}{2^{k_{n}}} \end{eqnarray*}$$(19) $$\begin{eqnarray*} \sigma _{n+1} =\left\lbrace \begin{array}{@{}l@{\quad }l@{}}+1 & y_{n}-H_{n}< M\\ -1 & \mathrm{else.} \end{array}\right. \end{eqnarray*}$$(20)

The comparison in equation (17), En − yn < M, can be rewritten as M − En + yn > 0 and therefore it requires two subtractions (and one comparison). It can be advantageous to reformulate this as tn + yn > 0 with tn = M − En and t0 = M. This saves one subtraction (and one variable, i.e. memory access) in each iteration and is possible because M is a fixed input and En is needed only in the comparison during the iterations. This accumulation is common practice in CORDIC algorithms, where for e = 0 (so yn = 0) the condition tn > 0 remains. At the end, the eccentric anomaly can be recovered with EN = M + yN. Fig. 3 shows that the output M + yN differs slightly from EN, but both are within the nominal limits. Positive and negative residuals appear balanced with no visible bias in either case, a property related to the two-sided design of the proposed CORDIC algorithm (see also fig. 3 of Ze18).
In contrast, one-sided algorithms will be biased. Such a variant was outlined in Ze18 (equation 3).

Figure 3. Top panels: CORDIC output after 16 double iterations (k16 = 8; 29 fractional bits) for eccentricity e = 1 and three mean anomaly regions. Compared with the exact solution E (black), E16 is a step function (blue), while M + y16 (red) has slopes. Bottom panels: the absolute residuals are smaller than $2^{-7} = 0.0078125$.

6 ACCURACY AND PERFORMANCE STUDY

6.1 Accuracy of the fixed-point algorithm

Using equation (1), we forward calculated 1000 (M(E), E) pairs, with M sampled log-uniformly over $[10^{-26}, {\tau }/{2}]$. Here, E may be regarded as the true value. Then we injected M into our algorithms to solve the inverse problem E(M). The top panel of Fig. 4 shows the dependence of the accuracy on M and e for Code B.1. The accuracy becomes critical in the so-called corner of KE, at M = 0 for e = 1. Here the function behaves like a cube root, $E\simeq \sqrt[3]{6M}$, and the derivative becomes infinite. When using 61 bits for the binary fraction (Section 5.1), the step size is $2^{-61} = 4.3\times 10^{-19}$ rad. This is the resolution for M. The corresponding eccentric anomaly is $\Delta E=E(M=\Delta M)=\sqrt[3]{6\times 4.3\times 10^{-19}}=1.4\times 10^{-6}$. This point marks approximately the largest error and is indicated in the figure.

Figure 4. Accuracy for the fixed-point (top) and floating-point (bottom) algorithm. For visibility, zero deviations (δ = 0) were lifted to $\delta = 2\times 10^{-17}$.

The general error relation is $\mathrm{d}E= (1-e\cos E)^{-1}\,\mathrm{d}M$, which follows from equation (1). For e = 1, it becomes $\mathrm{d}E\simeq ({1}/{3})\sqrt[3]{{6}/{M^{2}}}\,\mathrm{d}M$ and, as ΔM is constant, the errors decline as ∝M^{−2/3}. The residuals match those theoretical limits and thus validate our implementation. We note that Fig. 4 is an extreme magnification of the corner. The errors decrease quite quickly with eccentricity. The increase of the error for M ⪆ 0.1 should be related to the simplified conversion between floating-point and fixed-point numbers, which was used for the preparation of the look-up table and the input data. The mantissa of 64-bit floating-point numbers holds only 53 bits.

Finally, we note the need for some spare bits. This is investigated here with a short-bit system. We divided the range from 0 to 4 rad into $2^{12} = 4096$ mean anomalies, thus having 10 fractional bits ($2^{-10}$). Then, we limited our fixed-point algorithm also to 10 fractional bits (see L12 in Code B.1) and iterated until the last bit, i.e. kN = 10 (N = 17). Still, the residuals in Fig. 5 are overall limited to $2^{-6}$ (with a slight bias towards more negative deviations) for this short-bit algorithm. So, sometimes, the last four iterations yield no improvement. Therefore, some trailing bits ($\sim \log _2 k_N$) are advisable to buffer accumulating truncation errors.

Figure 5. Residual map for a fixed-point algorithm with 10 fractional bits and iterated to the last bit (kN = k17 = 10). The colour-coding is log-symmetric.
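The corner estimate quoted above is easy to reproduce; the following short check (our own, illustrative only) evaluates $\Delta E=\sqrt[3]{6\Delta M}$ for ΔM = 2^{−61} together with the general propagation dE = dM/(1 − e cos E):

import math

dM = 2.0**-61                       # resolution of M with 61 fractional bits
print((6*dM)**(1/3))                # ~1.4e-6: expected peak error at e = 1, M -> 0

def dE(dM, e, E):
    # general error propagation following from equation (1)
    return dM / (1.0 - e*math.cos(E))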
6.2 Accuracy of the floating-point algorithm

Overall, the error behaviour of the floating-point algorithm with double iterations (Fig. 4) is very similar to that of the CORDIC-like algorithm from Ze18 (see Fig. 6), who already explained the functional forms arising for the different eccentricities. Compared with the fixed-point algorithm, the floating-point algorithm has better accuracy in the corner, where it profits from the higher precision available to represent small numbers. However, the fixed-point version has better accuracy at about $M\gtrsim 10^{-5}$. The algorithm used for Fig. 4 employed the shift sequence starting with kn = 0, 0, 1, 1, 2, 2, ... and no accumulation. For small mean anomalies, the rotation will go back and forth around zero in the first iterations, thus keeping the properties of the one-sided algorithms in Ze18. However, the two optimizations suggested in Sections 5.3 and 5.4 will worsen the accuracy to a level comparable to the fixed-point algorithm, because for floating-point numbers the order of additions matters (while in fixed-point format it does not affect the precision). Both cases lead to additions of small numbers to big numbers, and thus floating-point numbers cannot utilize their better representation of small numbers.

6.3 Performance

We have shown that KE can indeed be solved with the shift-and-add algorithm CORDIC using minor modifications. Thus, the pressing question is how fast the algorithm is. We implemented the algorithms (as well as the comparison solvers) in plain C and, using Python’s timeit feature, measured the execution times to solve equation (1) 10 000 times for 21 eccentricities between 0 and 0.999 999. The maximum shift index in the CORDIC versions was kN = 28, corresponding to a precision of $2^{-28} = 3.7\times 10^{-9}$. The mean anomalies M were distributed uniformly between 0° and 180°, which is case $\mathcal {U}(M)$. In a second test, mean anomalies were generated whose eccentric anomalies E were distributed uniformly, which is case $\mathcal {U}(E)$. This gives some more weight to small mean anomalies and probes whether there is a dependency on M, as some solvers are slower in the corner.

Fig. 6 shows the results. The CORDIC double rotation algorithms performed Ndbl = 56 iterations.6 The runtime is independent of the eccentricity and the mean anomaly. The fixed-point version is about 2.2 times faster than the floating-point version. Thus, there is a noticeable benefit. However, the gain is only a factor of about 2, even though the floating-point code needs two multiplications in each iteration. This shows that multiplications have been optimized over the last years in current computer processing units (CPUs), crowding out CORDIC algorithms from those architectures.

Figure 6. Execution time as a function of eccentricity for various algorithms. For solid and dashed curves, M and E, respectively, are distributed uniformly.
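For orientation, a measurement of this kind can be scripted as below. This is our own minimal timeit sketch (pure Python, so absolute numbers will differ greatly from the plain-C results in Fig. 6); it assumes the kepler_cordic sketch given in Section 3 is defined in the same session and uses the Newton start guess E0 = M + 0.85e mentioned in the comparison below.

import math, timeit, random

def kepler_newton(M, e, tol=3.7e-9):
    E = M + 0.85*e                    # start guess used for the comparison solver
    while True:
        dE = (E - e*math.sin(E) - M) / (1.0 - e*math.cos(E))
        E -= dE
        if abs(dE) < tol:
            return E

Ms = [random.uniform(0.0, math.pi) for _ in range(10000)]   # case U(M)
for e in (0.0, 0.5, 0.999999):
    t_newton = timeit.timeit(lambda: [kepler_newton(M, e) for M in Ms], number=1)
    t_cordic = timeit.timeit(lambda: [kepler_cordic(M, e) for M in Ms], number=1)
    print(e, t_newton, t_cordic)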
For comparison, we show the CORDIC-like version from Ze18, which is a floating-point algorithm. We used a one-sided variant, which is faster for small E. This may partly explain the performance increase for case $\mathcal {U}(E)$. Moreover, this version used an if branch, so branch prediction may also affect the results. In any case, because the CORDIC-like version needs only N = 29 iterations, it is faster than its double iteration companion. Yet, the new fixed-point version is even 1.5 times faster.

Finally, we see that the speed of the fixed-point version compares well with Newton’s method, which used the start guess E0 = M + 0.85e. The cosine and sine functions were called from the standard math.h library. As the source of their implementation can be hard to track down, self-programmed sine and cosine functions were also tested. The runtime increases with eccentricity. At e = 1, the $\mathcal {U}(E)$ case takes slightly longer than the $\mathcal {U}(M)$ case, which indicates slower convergence in the corner (at M ≈ 0).

The performed tests can only serve as an orientation; there are many aspects that can alter the outcome. Furthermore, the algorithm in Ze18 also yields as output the terms cos E and sin E, which will be needed for subsequent computations. Our CORDIC algorithm delivers e cos E and e sin E. If division by e appears disadvantageous, in particular for small eccentricities, one can recompute the terms from E or alternatively propagate them in parallel with the algorithm.

7 FURTHER DISCUSSION

7.1 Unifying the Keplerian CORDIC modes

CORDIC has three coordinate systems: linear, circular and hyperbolic. We have extended here the circular and hyperbolic modes to the elliptic and hyperbolic cases of KE. Thus, the question is whether there is an analogous extension for the linear mode. To address this, we summarize the main difference in the input and the direction decision of the various modes: $$\begin{eqnarray*} \text{rotating}\qquad t_{n} > 0, \qquad t_{0} =\varphi \end{eqnarray*}$$(29) $$\begin{eqnarray*} \text{vectoring}\qquad -y_{n} > 0, \qquad t_{0} =0 \end{eqnarray*}$$(30) $$\begin{eqnarray*} \text{arcsine}\qquad -\varphi +y_{n} > 0, \qquad t_{0} =0 \end{eqnarray*}$$(31) $$\begin{eqnarray*} \text{KE}\qquad t_{n}+y_{n} > 0, \qquad t_{0} =M \end{eqnarray*}$$(32) $$\begin{eqnarray*} \text{HKE}\qquad -t_{n}-y_{n} > 0, \qquad t_{0} =-M \end{eqnarray*}$$(33) which includes the arcsine mode for completeness. The Keplerian modes appear as a mixture of rotating and vectoring, which we call ‘keplering’. Rotating and vectoring handle the different coordinate systems via the parameter m, which appears in equation (10), but not in equations (29)–(31). Now, to unify the Keplerian modes, we suggest introducing m into equations (32) and (33) as $$\begin{eqnarray*} \text{GKE} \qquad m(t_{n}+y_{n}) > 0, \qquad t_{0} =mM. \end{eqnarray*}$$(34) For the linear mode with m = 0, this unification appears pointless, as the outcome will not depend on M at all. If we associated the linear Keplerian mode with the case of radial trajectories, then it would indeed complete the picture. KE and HKE solve for a time-dependent auxiliary angle, but in radial cases the angle is fixed and time-independent. Another possibility for unification is the use of base angles mαn (thus negative for HKE) along with the condition tn + myn > 0 and t0 = M. Then, for m = 0, all base angles would be zero.
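The conditions collected in equations (29)–(34) can be summarized in a few lines of code. The following helper is our own illustration (names are hypothetical); it returns the rotation direction σ for each mode from the current t and y channels, with c standing for the input argument of the arcsine mode:

def sigma(mode, t, y, m=1, c=0.0):
    # direction decisions of equations (29)-(34); the corresponding t0 values are
    # phi, 0, 0, M, -M and m*M, respectively
    cond = {
        'rotating':  t > 0,
        'vectoring': -y > 0,
        'arcsine':   -c + y > 0,
        'KE':        t + y > 0,
        'HKE':       -t - y > 0,
        'GKE':       m*(t + y) > 0,
    }[mode]
    return 1 if cond else -1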
7.2 Barker’s equation

We have also considered that a linear mode extension might be associated with parabolic orbits. This special case is handled with Barker’s equation (Colwell 1993), which is given by $$\begin{eqnarray*} M=D+\frac{1}{3}D^{3}, \end{eqnarray*}$$(35) where D is an auxiliary variable, similar to the anomalies E and H, and related to the true anomaly ν by $$\begin{eqnarray*} \tan \frac{\nu }{2}=D. \end{eqnarray*}$$(36) Equation (35) is a cubic equation, whose explicit solution is often given as $$\begin{eqnarray*} D=B-\frac{1}{B} \end{eqnarray*}$$(37) with $B=\sqrt[3]{W+\sqrt{W^{2}+1}}$ (Meire 1985) and $W=({3}/{2})M$. The computation of this in a CORDIC framework requires five operations: division and hypotenuse, as well as exp, div and ln for the cube root (3M is simply three additions). The equivalent solution with hyperbolic functions, $$\begin{eqnarray*} D=2\sinh \frac{\operatorname{asinh}({3M}/{2})}{3}, \end{eqnarray*}$$(38) seems to be less known (Appendix C). It requires at most five CORDIC operations: sinh, div and, for the hyperbolic arcsine (equation A3), mul, cathetus and $\operatorname{atanh}$. When using the double iteration variant for the hyperbolic arcsine, the total costs are about four CORDIC cycles. A further reduction could be achieved by optimizing the division by three (Appendix A2). Therefore, Barker’s equation can be solved with a few nested CORDIC operations, but we do not see a possibility to tackle it more directly with CORDIC.

8 SUMMARY

In this work, to the best of our knowledge, we have presented for the first time a shift-and-add algorithm to solve KE. The features of the algorithm are: use of the most basic computer operations (addition, bitshift, xor); small code size and a short lookup table; independence of math libraries; adjustable precision; and a runtime independent of mean anomaly M and eccentricity e. We require only two minor modifications of the normal CORDIC algorithm, namely the modified direction decision (equation 17) and repeating each iteration once (double iterations for $k\le ({1}/{2}) k_{N}$). The modifications constitute a new CORDIC mode, which we call the Keplerian mode or keplering. It is a mixture of rotating and vectoring and it handles the elliptic and hyperbolic cases of KE.

From the perspective of CORDIC, solving KE appears to be twice as expensive as the computation of a sine function, or about as expensive as the arcsine function; both of these are parameter-less, while KE has the parameter e. Although we could eliminate all multiplications from the iteration loop, this is hardly rewarded by current desktop computers, which nowadays have sophisticated multiplier units. Thus, CORDIC algorithms have been displaced from computer processors. While our CORDIC KE solver is very competitive with Newton’s method, its full potential should become available on architectures that favour CORDIC methods, such as old or cheap devices. Also, a wider revival of CORDIC might be possible in the future.

ACKNOWLEDGEMENTS

I thank Hanno Rein for refereeing the paper, Trifon Trifonov for reading the manuscript and Albert Chan for a helpful discussion about Barker’s equation (equation 38, Appendix C). This work is supported by the Deutsche Forschungsgemeinschaft under DFG RE 1664/12-1 and Research Unit FOR2544 ‘Blue Planets around Red Stars’ (project no. RE 1664/14-1).

DATA AVAILABILITY

No new data were generated or analysed in support of this research.
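As a quick numerical illustration of equation (38) from Section 7.2 (our own check, not part of the paper's code), the closed-form D can be verified against the cubic equation (35):

import math

def barker_D(M):
    # parabolic case: D = 2 sinh(asinh(3M/2) / 3), equation (38)
    return 2.0*math.sinh(math.asinh(1.5*M)/3.0)

M = 0.7
D = barker_D(M)
print(D + D**3/3.0 - M)   # ~0 up to rounding, i.e. equation (35) is satisfied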
Footnotes

1 We provide an online demo at https://raw.githack.com/mzechmeister/ke/master/cordic/js/cordic.html.
2 Code available at https://github.com/mzechmeister/ke/.
3 The q-Pochhammer symbol is defined as $(a;q)_{n}=\prod _{k=0}^{n-1}(1-aq^{k})$. Thus, the product series $K_{\mathrm{c}}^{-2}=\prod _{k=0}^{\infty }(1+4^{-k})$ is the special case $(-1; {1}/{4})_{\infty }\approx 2.71182$. Likewise, in hyperbolic mode there occurs $\prod\nolimits _{k=1}^{\infty }(1-4^{-k})=({1}/{4}; {1}/{4})_{\infty }\approx 0.68854$, which is also a special case of the Euler product.
4 The sequence is related to http://oeis.org/A003462.
5 For reasons of uniformity with the code implementation, we list the positive case first, in contrast to Ze18.
6 For simplicity, we just truncate the shift sequence of the kN = 61 version, which performs double iterations until k ≤ 26, thus Ndbl = 2 × 27 + (28 − 26) = 56. Already for k > 14, scale corrections are smaller than $4^{-14} = 2^{-28}$. Thus, Ndbl = 2 × 15 + (28 − 14) = 44 would also be sufficient, promising a 21 per cent shorter runtime.
7 $\tanh \ln \sqrt{\frac{a}{b}}=\frac{\exp \ln \sqrt{a/b}\ -\ \exp \left(-\ln \sqrt{a/b}\right)}{\exp \ln \sqrt{a/b}\ +\ \exp \left(-\ln \sqrt{a/b}\right)}=\frac{\sqrt{a/b}-\sqrt{b/a}}{\sqrt{a/b}+\sqrt{b/a}}=\frac{a-b}{a+b}$.
8 ${K_{\mathrm{h}}^{2}}/{4}=0.364512$. Usually, $x_{0}=a+ {1}/{4}$ and $y_{0}=a-{1}/{4}$ are proposed to obtain $\sqrt{a}$, but this requires a post-multiplication with ${1}/{K_{\mathrm{h}}}$.
9 Because Python integers have arbitrary precision, calculations beyond 64 bits would be possible by increasing this number.

REFERENCES

Baykov V., 1972, PhD thesis, Leningrad State Univ. of Electrical Eng. (http://baykov.de/CORDIC1972.htm)
Colwell P., 1993, Solving Kepler’s Equation over Three Centuries. Willmann-Bell, Richmond, VA
Fukushima T., 1997, Celest. Mech. Dyn. Astron., 68, 121
IEEE, 2008, IEEE Std 754-2008, IEEE Standard for Floating-Point Arithmetic. IEEE, New York
Lang T., Antelo E., 2000, J. VLSI Signal Process. Syst., 25, 19 (doi:10.1023/A:1008121502359)
Meire R., 1985, J. Br. Astron. Assoc., 95, 113
Muller J., 2006, The CORDIC Algorithm. Birkhäuser, Boston, MA, p. 133 (doi:10.1007/0-8176-4408-3_7)
Payne M. H., Hanek R. N., 1983, SIGNUM Newsl., 18, 18 (doi:10.1145/1057605.1057606)
Takagi N., Asada T., Yajima S., 1991, IEEE Trans. Comput., 40, 989 (doi:10.1109/12.83660)
Volder J. E., 1959, IRE Trans. Electron. Comput., EC-8, 330 (doi:10.1109/TEC.1959.5222693)
Walther J. S., 1971, in Proc. Spring Joint Computer Conference (AFIPS ’71). ACM, New York, NY, p. 379 (doi:10.1145/1478786.1478840)
Zechmeister M., 2018, A&A, 619, A128 (Ze18) (doi:10.1051/0004-6361/201833162)

APPENDIX A: NOTES ON THE CORDIC METHOD

Table A1 summarizes the general output of CORDIC for the three coordinate systems m and the two operation modes. From this, we can derive elementary functions using specific inputs, as given in Table A2.
For instance, the exp function is obtained with x0 = Kh and y0 = Kh in hyperbolic rotation (which is even simpler than adding the outputs for x0 = Kh and y0 = 0: exp x = cosh x + sinh x).

Figure A1. Examples of CORDIC rotations for circular (blue, m = 1), linear (green, m = 0) and hyperbolic (red, m = −1) modes. The target angle φ (dashed line) is approached through rotations with $\alpha _{n}= \operatorname{atan}2^{-k}$. The vectors change their length with each iteration.

Table A1. General input and output triples for various CORDIC modes (Walther 1971).

Table A2. Input and output triples for some elementary CORDIC functions (m: 1 circular, 0 linear, −1 hyperbolic).

 m   x0            y0            t0   ⇒       xN                    yN       tN
 1   Kc            0             φ    t → 0   cos φ                 sin φ    0
−1   Kh            0             φ    t → 0   cosh φ                sinh φ   0
−1   Kh            Kh            φ    t → 0   exp φ                 exp φ    0
 1   1             b             0    y → 0   (1/Kc)·√(1 + b²)      0        atan b
−1   1             b             0    y → 0   (1/Kh)·√(1 − b²)      0        atanh b
−1   a + b         a − b         0    y → 0   (2/Kh)·√(ab)          0        (1/2) ln(a/b)
−1   a + 1         a − 1         0    y → 0   (2/Kh)·√a             0        (1/2) ln a
−1   a + Kh²/4     a − Kh²/4     0    y → 0   √a                    0        (1/2) ln(4a/Kh²)
 0   a             0             φ    t → 0   a                     φ·a      0
 0   a             b             0    y → 0   a                     0        b/a

The logarithm is obtained from the atanh function via the identity7 $$\begin{equation*} \ln \frac{a}{b}=2\operatorname{atanh}\frac{a-b}{a+b}. \end{equation*}$$ So, to compute ln a, we have to set b = 1, and thus x0 = a + 1 and y0 = a − 1. The final multiplication by two is again a bitshift. The same mode simultaneously provides the square root $\sqrt{a}$. With $x_{0}=a+({K_{\mathrm{h}}^{2}}/{4})$ and $y_{0}=a-({K_{\mathrm{h}}^{2}}/{4})$, the output is directly available in xN.8 The square root is independent of t, so this channel can be omitted.

For completeness, we list further important functions that can be derived with additional subsequent CORDIC calls: $$\begin{eqnarray*} \tan x =\frac{\sin x}{\cos x} \qquad \tanh x =\frac{\sinh x}{\cosh x} \end{eqnarray*}$$(A1) $$\begin{eqnarray*} \cot x =\frac{\cos x}{\sin x} \qquad \coth x =\frac{\cosh x}{\sinh x} \end{eqnarray*}$$(A2) $$\begin{eqnarray*} \operatorname{asin}x =\operatorname{atan}(x,\sqrt{1-x^{2}}) \qquad \operatorname{asinh}x =\operatorname{atanh}(x,\sqrt{x^{2}+1}) \end{eqnarray*}$$(A3) $$\begin{eqnarray*} \operatorname{acos}x =\operatorname{atan}(\sqrt{1-x^{2}},x) \qquad \operatorname{acosh}x =\operatorname{atanh}(\sqrt{x^{2}-1},x). \end{eqnarray*}$$(A4) For instance, tan x requires one division as a second call, whereas sin x and cos x come simultaneously from circular rotation. The arcsine is discussed in the next section.

A1 The arcsine function

From equation (A3) and Table A1, it can be seen that $\operatorname{asin}x$ can be computed with hyperbolic vectoring, which provides the cathetus $\sqrt{1-x^{2}}$, and circular vectoring, which executes the $\operatorname{atan}$ function with two arguments and thereby saves the division. Note that the cathetus needs a multiplicative scale correction (but see Appendix A2), implying three CORDIC calls in total.

A direct way to compute the arcsine is to change the direction decision as in equation (31) and to drive the yn component of the vector towards the input argument. However, as explained in Muller (2006), the missing scale correction can lead to a wrong decision and erroneous output, as illustrated in Fig. A2. Baykov (1972) solved the problem with double iterations, meaning each iteration is executed twice. However, just using the sequence kn = 0, 0, 1, 1, ... converges only for |x| < 0.73, because after the third rotation the vector cannot recover from an excursion into an adjacent quadrant. The sequences kn = 0, 1, 1, 2, 2, ... or kn = 1, 1, 2, 2, ... have a larger convergence range (related to a smaller total scale correction). However, only an additional quadrant check (xn > 0) in each iteration gives full convergence, including the arcsine corner. We remark that KE is bijective and thus does not suffer from quadrant confusion.

Figure A2. Output of the arcsine mode. Single rotations (red) have overall convergence problems.
The three double rotation sequences (green, cyan, blue) converge in a limited range. Full convergence is achieved by three chained CORDIC operations or by double iterations with quadrant check or scale correction (black, barely visible in the right corner).

Takagi, Asada & Yajima (1991) developed an alternative method, which is also called double-CORDIC iteration and employs an auxiliary variable that is scale-corrected on-the-fly. Lang & Antelo (2000) also use an on-the-fly correction, but do not require double rotations. In summary, there are different ways to compute the arcsine. They may require modifications, and the total costs correspond to about two to three normal CORDIC cycles.

A2 Multiplication and division with a constant

The standard CORDIC algorithm requires a scale correction. The scale correction does not matter in the output tN, where the factor cancels out in the ratio y/x in all modes. However, a pre- or post-scaling is needed when using the output xN or yN in the modes m = 1 or −1.

CORDIC provides multiplication and division, and their execution requires a full CORDIC cycle. However, if the multiplicator is known in advance and often needed, the effort can be reduced. Let us consider first a division by 3, which occurs in equation (38). The binary representation of 1/3 is $0.0101010101_2\ldots$, thus the multiplication can be done as $(2^{-2} + 2^{-4} + 2^{-6} + 2^{-8} + \ldots)$. The corresponding shift-and-add algorithm (essentially a binary multiplier) can be seen as a CORDIC simplification, which needs only half the iterations, because the iterations with odd shift values can be omitted, as the linear mode is scale-free. Moreover, it does not need direction decisions and the t channel, because the multiplicator is encoded in the shift sequence. The bit shifts can be done in parallel. Multiplications with other important constants, in particular the scale corrections $K_{\mathrm{c}}\approx 0.100\, 110\, 110\, 111\, 010_{2}$ and $K_{\mathrm{h}}\approx 1.001\, 101\, 010\, 001\, 111_{2}$, could be implemented in a similar way.

APPENDIX B: CODE ILLUSTRATION

In the following, we document a fixed-point implementation of the CORDIC double iteration. The pure Python code illustrates the functionality and low complexity. Further variants (floating-point) and other programming languages, in particular the more performant, low-level C, are maintained online (code available at https://github.com/mzechmeister/ke/).

Code B.1: Fixed-point algorithm with CORDIC double iteration for KE.
Code B.2: Fixed-point algorithm with CORDIC double iteration for HKE.

Code B.1 implements the algorithm described in Section 3. In this snippet, the largest shift is hardcoded in line L3 and thus sets the number of iterations. The precision depends furthermore on the location of the binary point (Section 5.1)9 defined in line L4.
The lookup table (L6, L9) stores the tuples of the base angles and the shift sequence ($\alpha _{n}=\operatorname{atan} ({1}/{2^{k_{n}}})$, kn). If the correction term $4^{-k}$ in equation (9) is numerically still larger than $2^{-k_N}$ (L10), i.e. k ≤ kN/2, the required scale correction is accumulated (L11) and the last table entry is repeated (L12). Then the table and the scale factor are converted to fixed point (64-bit integers, L14–L15).

The inputs for the algorithm are floating-point numbers. Thus, after a simple range reduction in L18 (note that there are more accurate range reduction algorithms; Payne & Hanek 1983) and pre-scaling with K, the start vector is converted to fixed point (L19–L20). The multiplication e*K (L20) is the only true multiplication (but see Appendix A2). (The multiplication with R and the integer conversion can be done with mantissa extraction and bit shift.)

The CORDIC iterations start in L21. The comparison in equation (17) is done by an addition and a subsequent arithmetic right bit shift, which extracts the sign bit (L22). The variable s is either 0 or −1. The bitwise operation s^(s+x), alternatively (x^s)-s, modifies the sign of x in a branchless fashion. For s=0 (σn = 1), the variable x is not modified. For s=-1 (σn = −1), the result is an arithmetic negation (-x; see also Table 1). This can be directly implemented in hardware (adder–subtractor). Line L23 accumulates the angles as described in Section 5.4. Finally, the fixed-point numbers are converted back to floating point (L26). As an example, calling the function with i64_Ecs(2-sin(2), 1) should return the triple (2.0, -0.41614683654714246, 0.9092974268256817).

The code for the HKE (Section 4, Code B.2) is very similar. It requires the hyperbolic arctangent (L10). The range extension is done in L19–L22.

APPENDIX C: BARKER’S EQUATION

The equivalence of equations (37) and (38) follows from $$\begin{eqnarray*} B-\frac{1}{B} & =&e^{\ln B}-e^{-\ln B}=2\sinh \ln B \nonumber\\ &=&2\sinh \ln \sqrt[3]{W+\sqrt{W^{2}+1}}=2\sinh \frac{\operatorname{asinh}W}{3}. \end{eqnarray*}$$(C1) The last step employed the identity (e.g. Fukushima 1997, equation 73) $$\begin{eqnarray*} \operatorname{asinh}x=\ln \left(x+\sqrt{x^{2}+1}\right), \end{eqnarray*}$$(C2) which itself can be verified with the substitution x = sinh t: $$\begin{eqnarray*} t & =&\ln \left(\sinh t+\sqrt{\sinh ^{2}t+1}\right)=\ln \left(\sinh t+\sqrt{\cosh ^{2}t}\right) \nonumber\\ & =&\ln \left(\sinh t+\cosh t\right)=\ln e^{t}=t. \end{eqnarray*}$$(C3)

© 2020 The Author(s). Published by Oxford University Press on behalf of the Royal Astronomical Society. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

TI - Solving Kepler’s equation with CORDIC double iterations JO - Monthly Notices of the Royal Astronomical Society DO - 10.1093/mnras/staa2441 DA - 2020-11-10 UR - https://www.deepdyve.com/lp/oxford-university-press/solving-kepler-s-equation-with-cordic-double-iterations-tHf3eyeWRQ SP - 109 EP - 117 VL - 500 IS - 1 DP - DeepDyve ER -