Faster way to find primes?

ags1 · March 14, 2016, 11:44am

I need to generate a big table of primes (well, I could just download them I suppose - but do I trust some random Internet person to get it right?).

I wrote the following naive code:

EDIT: Riven suggestion highlighted.


public class FindPrimes {

    public static void main(String[] args) {
++        List<Long> primes = new ArrayList<>(1000_000);
++        primes.add(2L);
++        long candidate = 3;
        while (primes.size() < 1000_000) {
            boolean isPrime = true;
            for (long prime : primes) {
                if (candidate % prime == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes.add(candidate);
                System.out.println(primes.size() + "\t: " + candidate);
            }
++            candidate += 2;
        }

    }
}

I’ve looked up https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes, but I don’t see how that is going to be faster…

EDIT: I should of course set capacity of the ArrayList to 1000_000…

Riven · March 14, 2016, 11:51am


public class FindPrimes {

    public static void main(String[] args) {
        List<Long> primes = new ArrayList<>();
        long candidate = 2;
        while (primes.size() < 1000_000) {
            boolean isPrime = true;
            for (long prime : primes) {
                if (candidate % prime == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes.add(candidate);
                System.out.println(primes.size() + "\t: " + candidate);
            }
-           candidate++;
+           candidate+=2;
        }

    }
}

You’re welcome, twice as fast

ags1 · March 14, 2016, 11:58am

It will be much faster, because the loop starts at 2, and with your change will then test 4, 6, 8, etc for primeness???

But even with that change I don’t think it will go significantly faster. The prime test loop exits on item 0 in this case, so that’s not causing the slowdown.

ags1 · March 14, 2016, 12:21pm

This goes faster:


    public static void main(String[] args) {
        long[] primes = new long[1000_000];
        primes[0] = 2;
        int primeCount = 1;
        long candidate = 3;
        while (primeCount < 1000_000) {
            boolean isPrime = true;
            for (long prime : primes) {
                if (prime == 0) { break; }
                if (candidate % prime == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes[primeCount++] = candidate;
                System.out.println(primeCount + "\t: " + candidate);
            }
            candidate += 2;
        }

    }

KaiHH · March 14, 2016, 3:09pm

The sieve still is many many orders of magnitude faster.
One factor being: It does not need the modulus operator. The other factor being, it is O(n log log n) instead of n².
And it only requires 122 KB of memory instead of 7.6 MB for your 1,000,000 numbers.
(the following is literally converted from the Wikipedia article to Java)


import java.util.BitSet;
public class Sieve {
   public static void sieve() {
      int n = 1000000;
      BitSet candidates = new BitSet(n);
      candidates.set(2, n - 1, true);
      int sqrtN = (int) Math.sqrt(n);
      // Sieve out the numbers
      for (int i = 2; i <= sqrtN; i++)
         if (candidates.get(i))
            for (int j = i * i; j <= n; j += i)
               candidates.set(j, false);
      // Print out all primes (or put them in a list, or send them to a callback, etc...)
      for (int i = 2; i < n; i++)
         if (candidates.get(i))
            System.out.println(i); // <- this is a prime
   }
}

Just counting the number of primes between 0 and 100,000,000 (onehundred million) with this algorithm, by not sysout but instead incrementing a counter, takes just 886 Milliseconds and outputs 5,761,455.

Riven · March 14, 2016, 3:59pm

@KaiHH: a noticable difference is that ags1’s code returns the first 1_000_000 primes, and your code returns all primes in the range [0, 1_000_000]

You also mention that you don’t do a modulo, but that is effectively done by flagging all multiples of a number (above a threshold) [icode]false[/icode]. This is a linear operation, while a modulo is a constant operation. Trying to find an undefined number of primes, and this inner loop will be the bottleneck (as it keeps on flagging endless amounts of ‘slots’ to false). I know this is a theoretic complaint, but still

BTW: You can equally use a BitSet in ags1’s algorithm, so save memory, at the cost of performance.

KaiHH · March 14, 2016, 4:09pm

No, it does not, because it would need to have pi^-1(x) large of a long array, which for 1,000,000 is something between 10,000,000 and 100,000,000.
EDIT: Sorry, I was wrong! Did not really see that the while loop went for as long as it took to really find N primes, and did not got over the size of the array. Sorry.

Riven · March 14, 2016, 4:11pm

Can you explain what you mean? I’m just observing that you both have a ‘collection’ of size 1_000_000 in the end, but ags1’s collection has 1_000_000 primes, while your collection holds 1_000_000 bits describing whether each slot holds a prime.

KaiHH · March 14, 2016, 4:13pm

Edited my post.

KaiHH · March 14, 2016, 4:30pm

Here is a version which only computes/outputs the first ‘count’ primes:


public static void firstCountPrimes(int count) {
   // Upper bound of the pi^-1(x) function
   // See: https://en.wikipedia.org/wiki/Prime_number_theorem#Approximations_for_the_nth_prime_number
   int n = (int) (count * (Math.log(count) + Math.log(Math.log(count)) + 1));
   BitSet notPrime = new BitSet(n);
   int sqrtN = (int) Math.sqrt(n);
   for (int i = 2; i <= sqrtN; i++)
      if (!notPrime.get(i))
         for (int j = i * i; j <= n; j += i)
            notPrime.set(j, true); // <- true = this is NOT prime
   int primeCount = 1;
   for (int i = 2; primeCount <= count; i++)
      if (!notPrime.get(i)) {
         System.out.println(primeCount + " = " + i);
         primeCount++;
      }
}

HeroesGraveDev · March 14, 2016, 6:55pm

You can skip over all multiples of 2/3 by incrementing in steps of 6 (starting at 0) and checking the number above and the number below.

KaiHH · March 14, 2016, 7:20pm

Also look at Melissa O’Neill’s “The Genuine Sieve of Eratosthenes.”

ags1 · March 14, 2016, 7:30pm

I got home and benchmarked my primitive efforts versus KaiHH BitSet boss code:

http://pastebin.java-gaming.org/f83ef6614431d

I only did it to 100,000 primes because my code is rather slow.

Run 1:

Ags1 v1 took 87706ms and found highest prime: 1299709
Ags1 v2 took 12452ms and found highest prime: 1299709
KaiHH method took 27ms and found highest prime: 1299709

Run 2:

Ags1 v1 took 85349ms and found highest prime: 1299709
Ags1 v2 took 12133ms and found highest prime: 1299709
KaiHH method took 29ms and found highest prime: 1299709

Run 3:

Ags1 v1 took 84787ms and found highest prime: 1299709
Ags1 v2 took 12146ms and found highest prime: 1299709
KaiHH method took 26ms and found highest prime: 1299709

KaiHH · March 14, 2016, 7:37pm

I want to say, this is not my code (well code yes, but not algorithm). I just saw your post about finding the first N primes and then read Wikipedia.
When it comes to such mathematical algorithms I never try to come up with an own algorithm. There usually were far more clever people back in the B.C. ages who found the most efficient “algorithms” by carving them into stone.

EDIT: Also, I updated my last code. It’s a bit faster, by inverting the meaning of the bits and therefore not having to initialize all bits to true at first. Might want to benchmark it again.

ags1 · March 14, 2016, 7:47pm

I also updated my v2 code to use an int[] and not a long[] - primes are more common than I thought.

I needed to think about your code but now I get it

It is also interesting to me that cleaning up my quick and dirty v1 code (using a primitive array and getting rid of any and all autoboxing) gave a 7x speedup.

gouessej · March 15, 2016, 1:17am

I often have to write my own algorithms even though some more clever people are able to write more efficient ones than mine but because they are unable or they don’t want to explain them precisely to the masses unlike me. For example, I had to write my own algorithm to solve this kind of problem:

I have found no robust implementation in Java that supports jagged arrays. When I don’t understand a source code, I prefer not using it as if something breaks, I won’t be able to repair it.

Thank you for the code

delt0r · March 15, 2016, 4:30am

I had some fast code for this and even larger primes using some fancy math crap. But that was back when I was a MSc student and computers were around 1000x slower than today. Used Ecliptic curves and stuff. I don’t remember any of that stuff now.

First the first million primes, so pretty small stuff really. The Sieve is by far the best trade off in time to code and time to run.

Roquen · March 15, 2016, 9:27am

I have to ask: why primes? I can’t think of a single time I’ve used a prime number. Squarefree numbers and/or relatively prime numbers…that’s a different story.

That aside, note that finding X primes and finding the first X primes are different problems.

EDIT: should have provided an example - https://en.wikipedia.org/wiki/Sieve_of_Atkin#Explanation

princec · March 15, 2016, 9:41am

[quote=“Roquen,post:18,topic:56625”]
Increasingly it looks like… because they’re there to be found! This seems to be quite an entertaining problem.

Cas

KaiHH · March 15, 2016, 10:04am

No one is disagreeing with you there, or even saying that both problems were the same.
If you are trying to say, that you should use different algorithms to solve those two different problems, then your reference to the Sieve of Atkin does not quite support that, because it also is used to search for the primes in [0…N] and not for enumerating the first N primes, as is the Sieve of Eratosthenes. Or maybe I misunderstood that Atkin algorithm. But thanks for the algorithm reference!
Btw.: Do you know a good algorithm to actually find the first N primes faster than abusing the algorithms to find primes within [0…pi^-1(N)]?