Java
Java is a programming language originally developed by Sun Microsystems and now maintained by Oracle, which provides a Java runtime and software development kit for many computer platforms.
CentOS Linux comes with Java, which we recommend trying first. CentOS 6 comes with Java 1.7, CentOS 7 with Java 1.8. Additionally, a possibly more recent Java development kit (JDK) from Oracle is installed as a jdk module. To see what versions are available, issue module spider jdk.
Users can install and use GUI-based Java development tools such as NetBeans or Eclipse to compile and run Java programs on our clusters. For simplicity, however, we recommend developing Java code locally on your laptop or desktop and using the command line tools on CHPC systems. The command line tools are necessary for batch job submission.
The basic steps of deploying a Java program are compilation of the .java source code into .class bytecode, followed by a launch in the Java virtual machine, as described at this Oracle page. Alternatively, a whole Java program may be supplied in a Java archive (jar) file.
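As a minimal sketch of these two steps, the hypothetical source file below can be compiled with javac MyJavaCode.java, which produces MyJavaCode.class, and then launched with java MyJavaCode. The class name follows the examples on this page; its contents are an assumption made purely for illustration.

```java
// MyJavaCode.java - hypothetical minimal program used only for illustration.
// Compile:  javac MyJavaCode.java   (produces MyJavaCode.class)
// Run:      java MyJavaCode         (launches the class in the Java virtual machine)
class MyJavaCode {
    // Builds the message separately from main so it can be reused.
    static String message() {
        return "Running on Java " + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(message());
    }
}
```

Printing the java.version property is a quick way to confirm which Java installation (system Java or a jdk module) is actually being used.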
Java source code compilation
To compile Java source, use the javac command. This command takes a number of options, several of which are especially useful. In particular, the -cp option defines a search path for additional Java libraries that the code may use (often packaged in a Java archive, or jar, file), and the -d option specifies the directory in which to put the generated class files. For example:
javac -cp /some_path_to_Java_library/lib/otherlib.jar -d bin src/MyJavaCode.java
will build a bin/MyJavaCode.class file from the MyJavaCode.java source located in the src directory, with otherlib.jar on the class path.
Java code execution
Once the Java code is compiled into class file(s), it is executed with the java command. As with javac, the -cp flag defines the search path for class files and libraries. Java programs sometimes call libraries written in other languages, such as C/C++ or Fortran; the search path for those is defined by the -Djava.library.path flag. For example:
java -cp /some_path_to_Java_library/lib/otherlib.jar:bin -Djava.library.path=/path_to_dynamic_libraries/x86-64_linux MyJavaCode
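When a class or native library is not found at run time, it can help to print the search paths the JVM actually received. The sketch below is a hypothetical helper (PathInfo is not part of any library) that lists the entries of the standard java.class.path and java.library.path system properties.

```java
import java.io.File;
import java.util.regex.Pattern;

// PathInfo.java - hypothetical helper for inspecting the search paths seen by the JVM.
class PathInfo {
    // Splits a path list on the platform separator (":" on Linux) into its entries.
    static String[] entries(String pathList) {
        if (pathList == null || pathList.isEmpty()) {
            return new String[0];
        }
        return pathList.split(Pattern.quote(File.pathSeparator));
    }

    public static void main(String[] args) {
        // java.class.path reflects -cp; java.library.path reflects -Djava.library.path
        // (or a platform default when that flag is absent).
        for (String name : new String[] {"java.class.path", "java.library.path"}) {
            System.out.println(name + ":");
            for (String entry : entries(System.getProperty(name))) {
                System.out.println("  " + entry);
            }
        }
    }
}
```

Running it with the same -cp and -Djava.library.path flags as the real program shows exactly which directories and jar files will be searched.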
Running Java in a batch script
Running Java in a batch script can be as simple as calling the java command above. Note, however, that Java natively supports only thread-based parallelization, so it is not efficient to run on more than one node (unless one uses a distributed-parallel approach such as JPPF).
As all our cluster nodes contain multiple CPU cores, it is important to assess the parallelization of the Java code to be run. There are a number of ways Java can control and limit thread creation. If you have not written the Java code, the easiest approach is to run the code for a short period of time on a cluster interactive node and monitor its thread parallelism level with the top command. The %CPU column shows how much CPU is used by the java process. For multi-threaded programs we want to see that number larger than 100% (100% = one fully used CPU core) and approaching the node core count times 100 (e.g. up to 1600% on a 16 core node).
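If the Java code uses thread pools, we can also try to limit the number of threads using the -Dthread.pool.size runtime option, and assess the parallel scaling by changing the pool size. Note that -D only sets a Java system property, so this option has an effect only if the code (or a library it uses) reads that property.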
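For code you control, a pool size set this way can be read as a system property. The following is a minimal sketch, not a definitive implementation: the PoolScaling class, its busy-work loop, and the task counts are all hypothetical, and thread.pool.size is honored here only because the code reads it explicitly with Integer.getInteger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// PoolScaling.java - hypothetical example for measuring thread pool scaling.
class PoolScaling {
    // CPU-bound busy work: keeps one core occupied for the given number of iterations.
    static long work(long iterations) {
        long sum = 0;
        for (long i = 1; i <= iterations; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Runs `tasks` identical work items on a fixed pool of `threads` threads
    // and returns the combined result.
    static long runTasks(int threads, int tasks, long iterations)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> work(iterations);
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // waits for the task to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Reads the property set with -Dthread.pool.size=N; defaults to the core count.
        int threads = Integer.getInteger("thread.pool.size",
                Runtime.getRuntime().availableProcessors());
        long start = System.nanoTime();
        long total = runTasks(threads, 32, 100_000_000L); // arbitrary workload size
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("threads=%d result=%d time=%.2f s%n", threads, total, seconds);
    }
}
```

Running it several times, e.g. java -Dthread.pool.size=1 PoolScaling and then with 2, 4 and 8, and comparing the reported times gives a rough picture of the parallel scaling, while top shows the corresponding %CPU.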
Once we have verified that the Java code runs reasonably in parallel, we can launch it in a SLURM script, e.g.
#!/bin/tcsh
#SBATCH --time=1:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=1 # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH -e slurm-%j.err-%N # name of the stderr, using job and first node values
#SBATCH --ntasks=1 # number of SLURM tasks, abbreviated by -n
# additional information for allocated clusters
#SBATCH --account=owner-guest
#SBATCH --partition=ember-guest
cd directory_with_Java_code
ml jdk/1.8.0_112
java -cp /some_path_to_Java_library/lib/otherlib.jar:bin -Djava.library.path=/path_to_dynamic_libraries/x86-64_linux MyJavaCode
Note that in this example we request one task on one node but without node sharing, so the whole node is allocated to this job. The Java program then internally utilizes all of the node's CPU cores.
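A program that sizes its own thread pool may want to match the core count of the allocated node. The sketch below assumes the job runs under SLURM, which sets the SLURM_CPUS_ON_NODE environment variable inside the job; the CoreCount class and its fromSlurm method are hypothetical names, and the fallback to the JVM's own processor count is a design choice made for this illustration.

```java
// CoreCount.java - hypothetical helper that picks a thread count for the job.
class CoreCount {
    // Parses the value of SLURM_CPUS_ON_NODE; returns the fallback when the
    // value is missing, malformed, or not positive.
    static int fromSlurm(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        try {
            int n = Integer.parseInt(value.trim());
            return n > 0 ? n : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int jvmCores = Runtime.getRuntime().availableProcessors();
        // Outside of a SLURM job the variable is unset and the JVM value is used.
        int threads = fromSlurm(System.getenv("SLURM_CPUS_ON_NODE"), jvmCores);
        System.out.println("JVM sees " + jvmCores + " cores; using " + threads + " threads");
    }
}
```

Falling back to availableProcessors keeps the same code usable on a laptop or an interactive node, where the SLURM variable is not defined.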