Aerobus v1.2
Aerobus is a C++20 header-only library for general algebra on polynomials, discrete rings and associated structures.
Everything in Aerobus is expressed as types.
We say that again, as it is the most fundamental characteristic of Aerobus: everything is expressed as types.
The library serves two main purposes :
It is designed to be 'quite easily' extensible.
Because these functions are "generated" at compile time and do not rely on inline assembly, they are platform independent and yield exactly the same results on processors with the same capabilities (such as Fused-Multiply-Add instructions).
aerobus.h
#include "aerobus.h"
Aerobus provides definitions of Conway polynomials for primes up to 997 and low degrees. To use them, define AEROBUS_CONWAY_IMPORTS before including aerobus.h.
Install CMake
Install a recent compiler (supporting C++20), such as MSVC, G++ or Clang++
Move to the top directory, then:
The terminal should print:
Alternatively:
From the top directory.
Benchmarks are written for Intel CPUs with the AVX512f and AVX512vl flags; they work only on Linux with g++.
In addition to CMake and a compiler, install OpenMP and Google's Benchmark library. Then move to the top directory:
Aerobus predefines several simple Euclidean domains, such as:
aerobus::i32: integers (32 bits)
aerobus::i64: integers (64 bits)
aerobus::zpz<p>: integers modulo p (prime number) on 32 bits
All these types represent the ring, meaning the algebraic structure. They have a nested type val<i>, where i is a native scalar value (int32_t or int64_t), to represent actual values in the ring. They have the following "operations", required by the IsEuclideanDomain concept:
add_t: a type (specialization of val) representing the addition of two values
sub_t: a type (specialization of val) representing the subtraction of two values
mul_t: a type (specialization of val) representing the multiplication of two values
div_t: a type (specialization of val) representing the division of two values
mod_t: a type (specialization of val) representing the modulus of two values
and the following "elements":
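The ring/value split described above can be pictured with a small self-contained sketch. It does not include aerobus.h and is not the library's implementation: `toy_i32` and its members are hypothetical names chosen to mirror the description (a ring type with a nested `val<i>` and operations that are themselves types).

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical toy ring, NOT the Aerobus implementation.
struct toy_i32 {
    // A value of the ring is a type: val<3> and val<4> are distinct types.
    template <std::int32_t i>
    struct val {
        static constexpr std::int32_t v = i;
    };

    // Each operation maps two value types to a new value type.
    template <typename A, typename B> using add_t = val<A::v + B::v>;
    template <typename A, typename B> using sub_t = val<A::v - B::v>;
    template <typename A, typename B> using mul_t = val<A::v * B::v>;
    template <typename A, typename B> using div_t = val<A::v / B::v>;
    template <typename A, typename B> using mod_t = val<A::v % B::v>;
};

// 3 + 4, computed entirely by the type system.
using seven = toy_i32::add_t<toy_i32::val<3>, toy_i32::val<4>>;
```

In this sketch the result of an operation can be compared with `std::is_same`, since equal values map to the same type.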
Aerobus defines polynomials as a variadic template structure, with coefficients in an arbitrary discrete Euclidean domain. Like i32 or i64, they are given the same operations and elements, which makes them a Euclidean domain themselves. Similarly, aerobus::polynomial represents the algebraic structure; actual values are in aerobus::polynomial::val.
In addition, values have an evaluation function, which can be used at compile time (constexpr evaluation) or at runtime.
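As an illustration of an evaluation function usable both at compile time and at runtime, here is a minimal sketch. `toy_poly` is a hypothetical stand-in with plain integer coefficients as template arguments; the real library stores coefficients as ring value types, and its member names may differ.

```cpp
#include <cstdint>

// Hypothetical stand-in for a polynomial value: coefficients are template
// arguments, highest degree first. NOT the Aerobus type.
template <std::int64_t... coeffs>
struct toy_poly {
    // Horner scheme written as a fold expression:
    // acc = (...((0 * x + c_n) * x + c_{n-1}) * x + ...) + c_0
    template <typename T>
    static constexpr T eval(T x) {
        T acc = T(0);
        ((acc = acc * x + static_cast<T>(coeffs)), ...);
        return acc;
    }
};

// 2x^2 + 3x + 1
using p = toy_poly<2, 3, 1>;
```

Because `eval` is constexpr, `p::eval(2)` can appear in a `static_assert`, while `p::eval(x)` with a runtime `x` compiles to ordinary arithmetic.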
Aerobus predefines some well-known families of polynomials, such as Hermite or Bernstein:
Their coefficients are either in aerobus::i64 or aerobus::q64. The complete list, which is meant to be extended, is:
chebyshev_T
chebyshev_U
laguerre
hermite_prob
hermite_phys
bernstein
legendre
bernoulli
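To show how such a family can be produced at compile time, the sketch below builds the integer coefficients of the Chebyshev polynomials of the first kind from the recurrence \(T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)\), with \(T_0 = 1\) and \(T_1 = x\). The function name `chebyshev_T_coeffs` is hypothetical and the result is a plain `std::array`, not an Aerobus polynomial type.

```cpp
#include <array>
#include <cstddef>

// Coefficients of T_N, lowest degree first (index i holds the x^i coefficient).
// Hypothetical helper, NOT part of Aerobus.
template <std::size_t N>
constexpr std::array<long long, N + 1> chebyshev_T_coeffs() {
    std::array<long long, N + 1> prev{};  // T_{k-1}
    std::array<long long, N + 1> curr{};  // T_k
    prev[0] = 1;                          // T_0 = 1
    if constexpr (N == 0) {
        return prev;
    } else {
        curr[1] = 1;                      // T_1 = x
        for (std::size_t k = 1; k < N; ++k) {
            std::array<long long, N + 1> next{};
            for (std::size_t i = 0; i <= k; ++i) {
                next[i + 1] += 2 * curr[i];   // 2x * T_k
            }
            for (std::size_t i = 0; i <= N; ++i) {
                next[i] -= prev[i];           // - T_{k-1}
            }
            prev = curr;
            curr = next;
        }
        return curr;
    }
}
```

For instance, `chebyshev_T_coeffs<3>()` holds the coefficients of \(4x^3 - 3x\).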
When the tag AEROBUS_CONWAY_IMPORTS is defined at compile time (-DAEROBUS_CONWAY_IMPORTS), Aerobus provides definitions for all Conway polynomials CP(p, n) for p up to 997 and low values of n (usually less than 10).
They can be used to construct finite fields of order \(p^n\) (\(\mathbb{F}_{p^n}\)):
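As a worked instance of this construction, consider the smallest non-prime field, \(\mathbb{F}_4 = \mathbb{F}_2[x]/(x^2+x+1)\), where \(x^2+x+1\) is the Conway polynomial for \(p = 2\), \(n = 2\). The sketch below is an assumption-laden illustration: it encodes elements as 2-bit integers (bit i is the coefficient of \(x^i\)) rather than as Aerobus types, and `gf4_mul` is a hypothetical name.

```cpp
// Multiplication in F_4 = F_2[x] / (x^2 + x + 1).
// Elements: 0 -> 0, 1 -> 1, 2 -> x, 3 -> x + 1. Hypothetical helper, NOT Aerobus.
constexpr unsigned gf4_mul(unsigned a, unsigned b) {
    // Carry-less product of two polynomials of degree < 2 over F_2.
    unsigned prod = 0;
    for (unsigned i = 0; i < 2; ++i) {
        if ((b >> i) & 1u) {
            prod ^= a << i;
        }
    }
    // The product has degree at most 2, so a single reduction step
    // by x^2 + x + 1 (binary 111) is enough.
    if (prod & 0b100u) {
        prod ^= 0b111u;
    }
    return prod;
}
```

Addition in this encoding is simply XOR. Every non-zero element has an inverse, for example \(x \cdot (x+1) = 1\).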
Aerobus provides definitions for Taylor expansions of known functions. They are all templates with two parameters: the degree of expansion (size_t) and Integers (typename). Coefficients then live in FractionField<Integers>.
They can be used and evaluated:
Exposed functions are:
exp
expm1: \(e^x - 1\)
lnp1: \(\ln(x+1)\)
geom: \(\frac{1}{1-x}\)
sin
cos
tan
sh
cosh
tanh
asin
acos
acosh
asinh
atanh
Being able to specify the degree is important, because users may use formats other than float64 or float32, which require a higher or lower degree to achieve correct or acceptable precision.
It is possible to define a Taylor expansion by implementing a coeff_at structure, which must meet the following requirements:
it takes two template parameters, Integers (typename) and index (size_t);
it exposes a nested type named type, some specialization of FractionField<Integers>::val.
For example, to define the series \(1+x+x^2+x^3+\ldots\), users may write:
On x86-64 and CUDA platforms at least, with the proper compiler directives, these functions yield very efficient assembly, similar to or better than the standard library implementation in fast-math mode. For example, this code:
yields this assembly (clang 17, -mavx2 -O3), where we can see a pile of Fused-Multiply-Add vector instructions, generated because the Horner evaluation loop is completely unrolled:
Given a set (type) that satisfies the IsEuclideanDomain concept, Aerobus lets you define its field of fractions.
This new type is again a Euclidean domain, and more specifically a field, so we can define polynomials over it.
For example, the integers modulo p do not form a field when p is not prime. We can then define their field of fractions and polynomials over it this way:
The same construction works for any set that users implement in place of ZmZ.
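To illustrate what a field of fractions expressed as types can look like, here is a minimal sketch over plain integers. `toy_frac`, `frac_add_t` and `frac_mul_t` are hypothetical names; the library's FractionField is generic over any Euclidean domain, whereas this sketch hard-codes `long long` and relies on `std::gcd`.

```cpp
#include <numeric>      // std::gcd
#include <type_traits>

// Hypothetical compile-time fraction N/D over long long. NOT the Aerobus type.
template <long long N, long long D>
struct toy_frac {
    static_assert(D != 0, "denominator must be non-zero");
    static constexpr long long g = std::gcd(N, D);
    static constexpr long long sign = D < 0 ? -1 : 1;
    // Numerator and denominator in lowest terms, with a positive denominator.
    static constexpr long long num = sign * N / g;
    static constexpr long long den = sign * D / g;
    // Canonical type for this value, so that equal fractions are the same type.
    using reduced = toy_frac<num, den>;
};

// a/b + c/d = (a*d + c*b) / (b*d), then reduced.
template <typename A, typename B>
using frac_add_t =
    typename toy_frac<A::num * B::den + B::num * A::den, A::den * B::den>::reduced;

// a/b * c/d = (a*c) / (b*d), then reduced.
template <typename A, typename B>
using frac_mul_t = typename toy_frac<A::num * B::num, A::den * B::den>::reduced;
```

Reducing every result to a canonical type is a design choice of this sketch: it lets `std::is_same` act as equality on fractions.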
For example, we can easily define rational functions by taking the field of fractions of polynomials:
These also have an evaluation function, as polynomials do.
Given a ring R, Aerobus provides an automatic implementation of the quotient ring \(R/X\), where X is a principal ideal generated by some element. Such an ideal is two-sided as long as R is commutative (and we assume it is).
For example, if we want R to be \(\mathbb{Z}\) represented as aerobus::i64, we can express arithmetic modulo 17 using:
The same could be achieved using zpz<17>.
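The idea of arithmetic modulo 17 as a quotient of the integers can be sketched as follows. `toy_quotient` is a hypothetical name and the sketch only handles the integers; the library's construction is generic in the ring and in the generator of the ideal.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical quotient of the integers by the ideal generated by M.
// NOT the Aerobus implementation.
template <std::int64_t M>
struct toy_quotient {
    static_assert(M > 0, "modulus must be positive");

    template <std::int64_t i>
    struct val {
        // Canonical representative of the class of i, in [0, M).
        static constexpr std::int64_t v = ((i % M) + M) % M;
    };

    // Operations act on representatives, then reduce again.
    template <typename A, typename B> using add_t = val<(A::v + B::v) % M>;
    template <typename A, typename B> using mul_t = val<(A::v * B::v) % M>;
};

using Z17 = toy_quotient<17>;
```

In this sketch `Z17::val<27>` and `Z17::val<10>` are different types carrying the same representative, so values are best compared through `::v` or after passing through an operation.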
This is mainly used to define finite fields of order \(p^n\) using Conway polynomials but may have other applications.
Aerobus provides an implementation of continued fractions. It can be used this way:
As practical examples, Aerobus provides the continued fractions of \(\pi\), \(e\), \(\sqrt{2}\) and \(\sqrt{3}\):
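A continued fraction \([a_0; a_1, a_2, \ldots]\) stands for \(a_0 + \frac{1}{a_1 + \frac{1}{a_2 + \ldots}}\). The sketch below shows one way to encode it as a variadic template; `toy_continued_fraction` is a hypothetical name and the approximation is stored as a `double`, which may differ from what the library exposes.

```cpp
#include <cstdint>

// Hypothetical variadic encoding of [a0; a1, a2, ...]. NOT the Aerobus type.
template <std::int64_t... values>
struct toy_continued_fraction;

// Last term: the fraction is just a0.
template <std::int64_t a0>
struct toy_continued_fraction<a0> {
    static constexpr double val = static_cast<double>(a0);
};

// General case: a0 + 1 / [a1; a2, ...].
template <std::int64_t a0, std::int64_t... rest>
struct toy_continued_fraction<a0, rest...> {
    static constexpr double val =
        static_cast<double>(a0) + 1.0 / toy_continued_fraction<rest...>::val;
};

// First four terms of pi: [3; 7, 15, 1] = 355/113.
using pi_approx = toy_continued_fraction<3, 7, 15, 1>;
```

Four terms already give \(355/113 \approx 3.1415929\), which is why truncated continued fractions are a convenient source of rational approximations.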
When compiled with nvcc and the flag WITH_CUDA_FP16, Aerobus provides some support for 16-bit integers and floats (aka __half).
Unfortunately, NVIDIA did not put enough constexpr in its cuda_fp16.h header, so we had to implement our own constexpr static_cast from int16_t to __half to make integer polynomials work with __half. See this bug.
Moreover, at this time it is not easily possible to make it work for __half2 because of another bug.
A workaround is to modify cuda_fp16.h and add a constexpr modifier to line 5039. This works, but has only been tested on Linux with CUDA 16.1.
Once done, nvcc generates splendid assembly, the same as for double or float:
Please push NVIDIA to get these bugs fixed.