"The Great" World

C vs. Java in number crunching

Introduction

Java being the most popular programming language of application programming is usually thought to be a good (usually the best) choice for any task. Six or seven months ago one of my instructor Dr. Hoermann, who is a real expert in random number generation techniques, ask me to reimplement one of his earlier C projects in Java by using the merits of object oriented design. I won’t dwell into the details of the project but shortly it was an object oriented implementation of a set of universal random number generators.

Initially, I did not suspect about the performance of Java because I previously use some data mining tools, simulators implemented by Java especially by some universities. Such that Weka – one of the most popular Java data mining applications—implemented by New Zealand University . So I start to design a model. You know those things. Interfaces, classes, so and so for…

Finally, design and implementation were completed and it seems that our new generator was solely doing pretty well. I use the generators to estimate some stock options and to do some Monte Carlo simulations. Everything was fine but one thing. What was the speed of previously implemented project in C by Hoermann? As I check the result, it was disgusting enough to make me irritate Java. I don’t tell you to irritate Java but just see below to not to see any problem as a nail.

Since the generator I implemented requires too much theory of statistics and probability theory, I won’t deal with it. But rather I will be implementing a pretty simple PI estimator using Monte Carlo technique in both Java and C.

Some rough idea on Monte Carlo

O! many a shaft at random sent

Finds mark at the archer little meant!

And many a word, at random spoken.

May soothe or wound a heart that’s broken!

Sir Walter Scott, the Lord of the Isles

A Monte Carlo method is an algorithm that solves a problem through the use of statistical sampling. Just as I said I won’t bother doing some probabilistic proof (it’s the job of http://www.systemsimulation.blogspot.com/). But to make you believe, think that in a raining day you and your son are walking through the street with your own umbrellas. And since he is just a kid, his umbrella is smaller than yours in terms of it is radius (think that we are looking from the top). So how can you compare the number of droplets hitting to your umbrella and his umbrella.

Believe or not this will be in proportion with the square of your umbrellas’ radius. Are the umbrellas magic? What if some guy squirt water on you by using garden trunk on you while your umbrellas are in your hand? This time the ratio of droplets will not be in proportion with r square. Why? Umbrellas, water, kid, parent are all the same. So what is the problem? The key word in here is uniformity. Think some you will realize what the problem isJ

So this much mix-up in your mind about Monte Carlo is enough. Let me finish this section by showing you how well a Monte Carlo estimator performs. Blue dots are our estimates for different numbers of replications(go on reading for the meaning of replication) in the chart and the dashed line is the exact value of PI (It is a bad saying that PI’s exact value. Why?).

Performance

Monte Carlo simulation we use for estimating PI requires many uniform number tuples (Imagine that we will control the rain. But before start we calculate all the droplets’ hit coordinates(X,Y) on the earth such that they will spread uniformly on the ground). Moreover method requires some multiplications and comparisons simply. Finally it requires building up a confidence interval in order to see how satisfactory its result is. It’s the statistical way of saying “I am sure that with 95% probability real PI value is some where between A and B”. The length of the confidence interval is determined fundamentally by the number of replications (This is the successive numbers of days we use our power to rain exactly 100000 droplets on the earth). The more replications we did the less distant A and B are (The more day we rain, the more we learn how to rain). Let me show the shrinking of CI (confidence interval), as the number of replications increase:

If you look close enough, you will see that as we multiply the number of replications by 4, the CI shrinks only by half.

All those in mind, let me show you the performance of Java and C with respect to number of replications (rainy days):

From now on I will not talk any more but one thing. Notice that the response time graphics of Java and C are both linear. This is good that our algorithm is scaleable. But isn’t it nice that C’s response time slope is one eighth of the Java. J You may ask or better may search for the reasons of this result: Intermediate machine code, virtual machine, overhead of class hierarchy, garbage collector…

That’s all for today... See you in the future!

Source Codes

Java

import java.util.Random;

import java.math.*;

public class MonteCarlo {

static final int iter = 100000;

static final int max_rep = 5000;

/*I aware of the fact that such public variable declarations are bad for OO design.But for the sake of performance…*/

public static int in_region[];

public static Random r;

public static void EstimatePI(int rep) {

double x, y;

int i, j;

double mean = 0, var = 0, hl;

for (j = 0; j < rep; j++) {

in_region[j]=0;

for (i = 0; i < iter; i++) {

x = 2 * r.nextDouble() - 1;

y = 2 * r.nextDouble() - 1;

if (x * x + y * y <= 1) {

in_region[j]++;

}

mean += in_region[j] * 1.0 / rep;

}

if (rep > 1) {

for (i = 0; i < rep; i++) {

var += (in_region[i] - mean) * (in_region[i] - mean) / (rep - 1);

}

mean = 4 * mean / iter;

hl = 7.84 * Math.sqrt(var/rep) / iter;

System.out.println("PI estimation is " + mean + " with half-length " + hl);

}

public static void main(String[] args) {

MonteCarlo mc = new MonteCarlo();

mc.r=new Random();

mc.in_region = new int[mc.max_rep];

long start;

for (int r = 1; r <= mc.max_rep; r *= 4) {

start = System.currentTimeMillis();

MonteCarlo.EstimatePI(r);

System.out.println("Response time is " + (System.currentTimeMillis() - start)/1000.0 + " sec.");

System.out.println();

}

C

#include <stdio.h>

#include <stdlib.h>

#include <time.h>

#include <math.h>

#define ITER 100000

#define MAX_REP 5000

int in_region[MAX_REP];

void EstimatePI(int rep){

int i,j;

double mean=0,var=0,hl,x,y;

srand(time(NULL));

for (j = 0; j < rep; j++) {

in_region[j]=0;

for (i = 0; i < ITER; i++) {

x = 2.0 * rand()/RAND_MAX - 1;

y = 2.0 * rand()/RAND_MAX - 1;

if (x * x + y * y <= 1) {

in_region[j]++;

}

mean += in_region[j] * 1.0 / rep;

}

if (rep > 1) {

for (i = 0; i < rep; i++) {

var += (in_region[i] - mean) * (in_region[i] - mean) / (rep - 1);

}

mean = 4 * mean / ITER;

hl = 7.84 * sqrt(var/rep) / ITER;

printf("PI estimation is %7.5lf with half-length %7.5lf\n",mean,hl);

}

int main(int argc, char* argv[]){

int r;

unsigned int start;

srand(time(NULL));

for (r = 1; r <= MAX_REP; r *= 4) {

start = time(NULL);

EstimatePI(r);

printf("Response time is %d sec.\n\n",time(NULL) - start);

}

return 0;

}

The World of Great gains power with its new commenters...

Just as I said. This world is a completely dangerous site for non-technics. I am proud of introducing site commentors:

Code Name: Yalan
Academic Habitat: Cornell University
Expertship: Operations Research, Simulation Systems, and Hamzation.

Code Name: HACI
Academic Habitat: Bogazici University
Expertship: Simulation Systems(especially on ARENA), OOP, any programming language and datamining.

Most Dangerous One!!!
Code Name: Capacman
Academic Habitat: Bogazici University
Expertship: Anything tightly/loosely related to computers. But especially,C/C++/Java/Python Languages, Linux/Unix Operating Systems, Network Programming and Compilers

Look very carefully to all those guys. Those are the ones that are really experts of what they do.You will be feeling this fact. Just before leaving a poem for OpenMP lovers:

Not what we give, but what we share
For the gift without the giver is bare;
Who gives himself with his alms feeds three
Himself,his hungering negihbor,and me.

James Russell Lowell, The Vision of Sir Launfal

Legend is coming...

Target day is approaching. The Great will open up his BLOG on the 2nd Monday of June. Wait for one of the most technical Blog you have ever seen. Moreover, if you are not an engineer it's better for you to keep your self away from this Blog.We will be doing some:

Oracle
Pl/SQL
Java
C
Mach. Learning/Datamining/CRM
Simulation
Statistics
Mathmatics
Programming Languages Concept
And more...

Hammer will be back soon...

Academic, Commercial and Institutional character HOG will be back soon. He is in death-lock currently. You know multi processing is a though issue...

"The Great" World

About Me

Sunday, June 11, 2006