C vs. Java in number crunching
Introduction
Java being the most popular programming language of application programming is usually thought to be a good (usually the best) choice for any task. Six or seven months ago one of my instructor Dr. Hoermann, who is a real expert in random number generation techniques, ask me to reimplement one of his earlier C projects in Java by using the merits of object oriented design. I won’t dwell into the details of the project but shortly it was an object oriented implementation of a set of universal random number generators.
Initially, I did not suspect about the performance of Java because I previously use some data mining tools, simulators implemented by Java especially by some universities. Such that Weka – one of the most popular Java data mining applications—implemented by New Zealand University . So I start to design a model. You know those things. Interfaces, classes, so and so for…
Finally, design and implementation were completed and it seems that our new generator was solely doing pretty well. I use the generators to estimate some stock options and to do some
Since the generator I implemented requires too much theory of statistics and probability theory, I won’t deal with it. But rather I will be implementing a pretty simple PI estimator using
Some rough idea on Monte Carlo
O! many a shaft at random sent
Finds mark at the archer little meant!
And many a word, at random spoken.
May soothe or wound a heart that’s broken!
Sir Walter Scott, the Lord of the Isles
A
Believe or not this will be in proportion with the square of your umbrellas’ radius. Are the umbrellas magic? What if some guy squirt water on you by using garden trunk on you while your umbrellas are in your hand? This time the ratio of droplets will not be in proportion with r square. Why? Umbrellas, water, kid, parent are all the same. So what is the problem? The key word in here is uniformity. Think some you will realize what the problem isJ
So this much mix-up in your mind about
Performance
If you look close enough, you will see that as we multiply the number of replications by 4, the CI shrinks only by half.
All those in mind, let me show you the performance of Java and C with respect to number of replications (rainy days):
From now on I will not talk any more but one thing. Notice that the response time graphics of Java and C are both linear. This is good that our algorithm is scaleable. But isn’t it nice that C’s response time slope is one eighth of the Java. J You may ask or better may search for the reasons of this result: Intermediate machine code, virtual machine, overhead of class hierarchy, garbage collector…
That’s all for today... See you in the future!
Source Codes
Java
import java.util.Random;
import java.math.*;
public class MonteCarlo {
static final int iter = 100000;
static final int max_rep = 5000;
/*I aware of the fact that such public variable declarations are bad for OO design.But for the sake of performance…*/
public static int in_region[];
public static Random r;
public static void EstimatePI(int rep) {
double x, y;
int i, j;
double mean = 0, var = 0, hl;
for (j = 0; j < rep; j++) {
in_region[j]=0;
for (i = 0; i < iter; i++) {
x = 2 * r.nextDouble() - 1;
y = 2 * r.nextDouble() - 1;
if (x * x + y * y <= 1) {
in_region[j]++;
}
}
mean += in_region[j] * 1.0 / rep;
}
if (rep > 1) {
for (i = 0; i < rep; i++) {
var += (in_region[i] - mean) * (in_region[i] - mean) / (rep - 1);
}
}
mean = 4 * mean / iter;
hl = 7.84 * Math.sqrt(var/rep) / iter;
System.out.println("PI estimation is " + mean + " with half-length " + hl);
}
public static void main(String[] args) {
MonteCarlo mc = new MonteCarlo();
mc.r=new Random();
mc.in_region = new int[mc.max_rep];
long start;
for (int r = 1; r <= mc.max_rep; r *= 4) {
start = System.currentTimeMillis();
MonteCarlo.EstimatePI(r);
System.out.println("Response time is " + (System.currentTimeMillis() - start)/1000.0 + " sec.");
System.out.println();
}
}
}
C
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#define ITER 100000
#define MAX_REP 5000
int in_region[MAX_REP];
void EstimatePI(int rep){
int i,j;
double mean=0,var=0,hl,x,y;
srand(time(NULL));
for (j = 0; j < rep; j++) {
in_region[j]=0;
for (i = 0; i < ITER; i++) {
x = 2.0 * rand()/RAND_MAX - 1;
y = 2.0 * rand()/RAND_MAX - 1;
if (x * x + y * y <= 1) {
in_region[j]++;
}
}
mean += in_region[j] * 1.0 / rep;
}
if (rep > 1) {
for (i = 0; i < rep; i++) {
var += (in_region[i] - mean) * (in_region[i] - mean) / (rep - 1);
}
}
mean = 4 * mean / ITER;
hl = 7.84 * sqrt(var/rep) / ITER;
printf("PI estimation is %7.5lf with half-length %7.5lf\n",mean,hl);
}
int main(int argc, char* argv[]){
int r;
unsigned int start;
srand(time(NULL));
for (r = 1; r <= MAX_REP; r *= 4) {
start = time(NULL);
EstimatePI(r);
printf("Response time is %d sec.\n\n",time(NULL) - start);
}
return 0;
}