zs3.me

zEratosthenes Prime Number Generator Benchmark

Revision 1
© 2026 by Zack Smith. All rights reserved.

Results
Download

TL;DR

This is my benchmark that generates prime numbers without using unsigned integer division instructions and it instead uses the Eratosthenes algorithm, which is based on a bit array.

It maintains a bit array in RAM in which each bit indicates the presence or absence of a prime number.

Eratosthenes works by setting all multiples of every known prime throughout the bit array, then finding the first 0 bit after the last discovered prime number, which always will be a prime.

For every new prime that is discovered, I add it to the output buffer and then set all bits in the array that represent its multiples. Then repeat... until I reach the end of the array.

In my code the bit array represents only odd numbers, for efficiency's sake, because even numbers are never prime except the value 2. Thus:

Bit 3 in the first byte of the bit array represents the number 7, which is prime.
Bit 7 in the first byte of the bit array represents the number 15, which is not prime but is a multiple of the primes 3 and 5.

Because the entire bit array has to be updated for every new prime number that is discovered, this algorithm is susceptible to slowdown due to paging when the bit array's size is large enough to use all of the available RAM.

To ensure the bit array is located and stays entirely in RAM, there are a few measures that one can take: ,,

by first ensuring the bit array is as small as possible, and second by turning off swapping.

Findings

Even implementing this in C, it is much faster than my unsigned division algorithm.
Eratosthenes involves some overhead: When seeking X million primes, the bit array should be about 1.4X megabytes in size, where a megabyte is 1,048,576 bytes.
The run is always slower at the beginning because it has to set more bits for lower primes than
As the run proceeds, the lower parts of the bit array can be ignored because
My assembly language optimizations are only very slightly more effective than GCC's optimized code.

Results

I generated all of these results on 64-bit systems.

The program can also run on 32-bit systems; I developed that mode in a container using podman.

100 million primes

I used a bit array of size 122 MB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	0.1 minutes
3.1 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C 0.9	0.4 minutes
2.2 GHz	Snapdragon 750G, Android Samsung A52 5G, 6GB	C 0.9	0.4 minutes
0.8 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C+asm 0.10	0.9 minutes
1.5 GHz	Pine64 Star64 RISC-V StarFive JH-7110, Linux, 8GB LPDDR4-1867	C 0.9	4.4 minutes

The resulting primes.dat file can be found here.

1 billion primes

I used a bit array of size 1.328 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	1.1 minutes
3.1 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C 0.9	3.6 minutes
2.2 GHz	Snapdragon 750G, Android Samsung A52 5G, 6GB	C 0.9	7.1 minutes
0.8 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C+asm 0.10	8.8 minutes
1.5 GHz	Pine64 Star64 RISC-V StarFive JH-7110, Linux, 8GB LPDDR4-1867	C 0.9	51.1 minutes

The resulting primes.dat file can be found here.

2 billion primes

I used a bit array of size 2.74 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	2.6 minutes
3.1 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C 0.7	8.6 minutes
2.2 GHz	Snapdragon 750G, Android Samsung A52 5G, 6GB	C 0.9	18.8 minutes
1.5 GHz	Pine64 Star64 RISC-V StarFive JH-7110, Linux, 8GB LPDDR4-1867	C 0.9	113.5 minutes

The resulting primes.dat file can be found here.

3 billion primes

I used a bit array of size 4.19 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C+asm 0.10	3.9 minutes
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	4.0 minutes
3.1 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C 0.7	12.4 minutes
1.5 GHz	Pine64 Star64 RISC-V StarFive JH-7110, Linux, 8GB LPDDR4-1867	C 0.9	178.9 minutes

The resulting primes.dat file can be found here.

4 billion primes

I used a bit array of size 5.65 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	5.5 minutes
3.1 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C	18.8 minutes
1.5 GHz	Pine64 Star64 RISC-V StarFive JH-7110, Linux, 8GB LPDDR4-1867	C 0.9	244.4 minutes

The resulting primes.dat file can be found here.

5 billion primes

I used a bit array of size 7.13 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C+asm 0.10	6.8 minutes
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	6.9 minutes
3.1 GHz	Intel Core i5-4278U, Linux, 8GB DDR3L-1600	C (2 arrays)	24.0 minutes
1.5 GHz	Pine64 Star64 RISC-V StarFive JH-7110, Linux, 8GB LPDDR4-1867	C 0.9	309.5 minutes

* The i5-4278U required the RAM pressure workaround, described below.

The resulting primes.dat file can be found here.

6 billion primes

I used a bit array of size 8.619 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	8.4 minutes

The resulting primes.dat file can be found here.

8 billion primes

I used a bit array of size 11.64 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C+asm 0.10	11.5 minutes

The resulting primes.dat file can be found here.

10 billion primes

I used a bit array of size 14.675 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C+asm 0.10	14.6 minutes
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	14.8 minutes

The resulting primes.dat file can be found here.

12 billion primes

I used a bit array of size 17.741671 GB.

CPU frequency	CPU name and RAM type	Implementation	Duration
4.464 GHz	Apple MBA-13 M5, MacOS, 24GB LPDDR5	C 0.9	19.1 minutes

The resulting primes.dat file can be found here.

Download

The source code:

zEratosthenes 0.12

RAM pressure workaround

When testing a Macbook Pro with Intel Core i5-4278U and 8GB of RAM, which was running Debian Linux, I had to use two bit arrays totalling 7.13 GB, with includes a lower bit array of 1024 MB that was deallocated as early as possible during the run, because I found that Linux was eventually killing the program due to excessive RAM pressure, even in multi-user (non-graphical) mode.

The root cause was that Debian still uses enough of the available 8GB RAM by itself to prevent running zEratosthenes for more than a few minutes without it being killed. This may be attributable to systemd, because if I compare this to the Star64 RISC-V single-board computer, which in contrast uses a non-systemd distro, I find that the Star64 has no trouble providing enough of that board's 8GB RAM to run to zEratosthenes to completion without the two-array hack.

The RAM pressure mitigations that I used on the i5-4278U machine included:

Boot Linux into multi-user (non-GUI) mode so that there is no GUI, which frees up RAM.
- On Linux, sudo systemctl set-default multi-user
Tell systemd to suspend unnecessary services like CUPS, Docker and pipewire.
- On Linux, e.g. sudo systemctl stop docker
Reduce the output array size to only a few megabytes. This negatively impacts performance but is necessary.
Use the two-array option to relieve memory pressure during the run.

To ensure that no paging is performed, which significantly slows down the program, it is helpful to turn off swap partitions/files.

On Linux, sudo swapoff /dev/sdX

Useful links

1639936831