Software engineer with a focus on game development and scalable backend development
I have been building a couple of prototypes recently and came across the need to call native code in one of them, specifically on the backend. After debating which tech to use, I decided to run a quick test and see whether the performance differences could sway my vote between my usual go-to languages, Scala and Go.
To begin with, I need some native code to call. It doesn’t need to do anything complex, but it does need to cover the basics of what I will be doing from the caller side: calling a function with parameters and getting a value back.
I decided to go as bare-bones as possible and wrote four simple C functions for addition, subtraction, multiplication and division.
float sum(float a, float b)
{
    return a + b;
}

/* "min" is short for "minus" here: it subtracts b from a, it is not a minimum. */
float min(float a, float b)
{
    return a - b;
}

float mult(float a, float b)
{
    return a * b;
}

float div(float a, float b)
{
    return a / b;
}
This is compiled to a shared library, to be loaded at runtime, using GCC.
gcc -c -Wall -Werror -fpic cexample.c
gcc -shared -o libcexample.so cexample.o
The tests themselves simply call these functions a number of times and record the total and average time for each of the different techs.
To give the JVM-based tests a chance to warm up, I make a number of calls to the native code before the timed tests. This simulates the service having already handled some requests before the ones being timed. To keep things consistent, I do the same in the Go implementation.
The Go implementation uses the built-in cgo feature to call native libraries. Because Go itself compiles to native code, the library is linked much as it would be in C or C++. That also means Go has to know beforehand about the functions you are going to call, which is done through a specially formatted comment placed directly above the import "C" line.
package main

/*
#cgo LDFLAGS: -L${SRCDIR}/ -lcexample
float sum(float, float);
float min(float, float);
float mult(float, float);
float div(float, float);
*/
import "C"

import (
	"fmt"
	"os"
	"time"
)

func main() {
	var a C.float = 3.0
	var b C.float = 4.0
	iters := [7]int{1, 10, 100, 1000, 10000, 100000, 1000000}

	// Warm up. This shouldn't matter for Go, but it keeps all implementations consistent.
	for i := 0; i < 1000; i++ {
		C.sum(a, b)
		C.min(a, b)
		C.mult(a, b)
		C.div(a, b)
	}

	// Open the results file once, outside the loop. The file must already
	// exist, because os.O_CREATE is not set.
	f, err := os.OpenFile("../results.csv", os.O_APPEND|os.O_WRONLY, 0600)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	for _, iter := range iters {
		start := time.Now().UnixNano()
		for i := 0; i < iter; i++ {
			var c C.float = C.sum(a, b)
			var d C.float = C.min(c, 2.0)
			var e C.float = C.mult(d, 2.0)
			_ = C.div(e, 4.0)
		}
		end := time.Now().UnixNano()
		totalTime := end - start
		average := totalTime / int64(iter)
		fmt.Fprintf(f, "GO, %d, %d, %d\n", iter, totalTime, average)
	}
}
cgo works by exposing the functions declared in that comment through a pseudo-package named C. This package also provides its own set of basic types, such as C.float, to be used when passing parameters to native code or receiving its return values.
I won’t run through all of the code, as it should be pretty self-explanatory. It simply makes the native calls in a timed loop, and repeats this for several different iteration counts to get a good range from which to take averages.
The one bit of code I do want to mention is the following line in the comment. It is a cgo directive that passes flags to the linker, telling it where to find the library (-L) and which library to link against (-l).
#cgo LDFLAGS: -L${SRCDIR}/ -lcexample
Java Native Interface, JNI for short, is the JVM’s built-in mechanism for interfacing with native code; it lets you call native code and be called from it. JNA is a library built on top of JNI that reduces the amount of boilerplate and setup code JNI requires.
With JNA you create an interface, or in Scala’s case a trait, that extends JNA’s Library interface and declares all the functions your native code provides. You then ask JNA to create an implementation of that interface backed by your native library.
import com.sun.jna._
import java.io._

trait LibCexample extends Library {
  def sum(a:Float, b:Float):Float
  def min(a:Float, b:Float):Float
  def mult(a:Float, b:Float):Float
  def div(a:Float, b:Float):Float
}

object Main {
  def main(args: Array[String]): Unit = {
    System.setProperty("jna.library.path", new java.io.File(".").getCanonicalPath);
    val LibCexample = Native.loadLibrary("cexample", classOf[LibCexample]).asInstanceOf[LibCexample]
    val a:Float = 3.0f
    val b:Float = 4.0f

    //Warm up by making calls pre-timer
    for(i <- 1 to 1000) {
      LibCexample.sum(a, b)
      LibCexample.min(a, b)
      LibCexample.mult(a, b)
      LibCexample.div(a, b)
    }

    val iterations = List(1, 10, 100, 1000, 10000, 100000, 1000000)
    for (iter <- iterations) {
      val t0 = System.nanoTime()
      for(i <- 1 to iter) {
        val c = LibCexample.sum(a, b)
        val d = LibCexample.min(c, 2.0f)
        val e = LibCexample.mult(d, 2.0f)
        val f = LibCexample.div(e, 4.0f)
      }
      val t1 = System.nanoTime()
      val totalTime = t1 - t0
      val avgTime = totalTime / iter
      val write = new PrintWriter(new FileOutputStream(new File("../results.csv"),true))
      write.write(s"JNA, $iter, $totalTime, $avgTime"+ scala.util.Properties.lineSeparator)
      write.close()
    }
  }
}
Unlike Go, where the library is named at build time, JNA loads the native library at runtime, so you need to tell it where to find it. This can be done in a number of ways; I have opted to set the jna.library.path system property, which tells JNA where to search for libraries.
System.setProperty("jna.library.path", new java.io.File(".").getCanonicalPath);
As with Go, I am running the test for a range of iteration counts to get a good average. The code also calls the native code a number of times beforehand to let the JVM warm up, just in case there are any setup costs within the JVM.
BridJ is an alternative Java library for interacting with C/C++ code. It is not built on JNA; it uses the dyncall library to bind native methods, and focuses on ease of use and performance. However, it isn’t as mature or stable as JNA.
First you need to create a class declaring the functions available in your native code. This class needs a specific set of annotations, and all of its native calls must be declared native static.
package main;

import org.bridj.BridJ;
import org.bridj.CRuntime;
import org.bridj.Pointer;
import org.bridj.ann.Library;
import org.bridj.ann.Ptr;
import org.bridj.ann.Runtime;

@Library("cexample")
@Runtime(CRuntime.class)
public class LibCexample {
    static {
        BridJ.register();
    }

    public native static float sum(float a, float b);
    public native static float min(float a, float b);
    public native static float mult(float a, float b);
    public native static float div(float a, float b);
}
This class is what gives access to the native code, allowing its native static methods to interface to the C/C++ code. The Scala code can then interface with this class in order to perform the operations.
import main.LibCexample
import java.io._

object Main {
  def main(args: Array[String]): Unit = {
    val a:Float = 3.0f
    val b:Float = 4.0f

    //Warm up by making calls pre-timer
    for(i <- 1 to 1000) {
      LibCexample.sum(a, b)
      LibCexample.min(a, b)
      LibCexample.mult(a, b)
      LibCexample.div(a, b)
    }

    val iterations = List(1, 10, 100, 1000, 10000, 100000, 1000000)
    for (iter <- iterations) {
      val t0 = System.nanoTime()
      for(i <- 1 to iter) {
        val c = LibCexample.sum(a, b)
        val d = LibCexample.min(c, 2.0f)
        val e = LibCexample.mult(d, 2.0f)
        val f = LibCexample.div(e, 4.0f)
      }
      val t1 = System.nanoTime()
      val totalTime = t1 - t0
      val avgTime = totalTime / iter
      val write = new PrintWriter(new FileOutputStream(new File("../results.csv"),true))
      write.write(s"BRIDJ, $iter, $totalTime, $avgTime" + scala.util.Properties.lineSeparator)
      write.close()
    }
  }
}
As with previous tests I am running the calls a number of times to get the average and as with the previous JVM based code I am also pre-warming the JVM by making a number of calls before timing them.
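The results below show the time taken to run the given number of iterations. Each test was run 30 times, and the figures are the averages across those runs. All times are in nanoseconds; the Average columns are per iteration, and each iteration makes four native calls.
Iterations | Total - Go (ns) | Average - Go (ns) | Total - JNA (ns) | Average - JNA (ns) | Total - BridJ (ns) | Average - BridJ (ns) |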
---|---|---|---|---|---|---|
1 | 867 | 867 | 358425 | 358425 | 284028 | 284028 |
10 | 7820 | 781 | 108069 | 10806 | 23810 | 2380 |
100 | 79030 | 789 | 763540 | 7634 | 59183 | 591 |
1000 | 797636 | 797 | 11786822 | 11786 | 383280 | 382 |
10000 | 7342885 | 733 | 75253867 | 7524 | 1814188 | 180 |
100000 | 75006528 | 749 | 497305685 | 4972 | 13345597 | 133 |
1000000 | 775811573 | 775 | 4737624876 | 4737 | 87071036 | 86 |
For the most part the results are as I expected. Go is compiled to native code and linked against the library, so it seems normal that the average time per set of calls stays roughly constant regardless of the number of calls made.
Oddly enough, both JVM-based methods have a high one-time cost. I don’t really know what causes it, as the functions are called beforehand to try to negate any setup costs, but the high first-iteration time is consistent across all test runs.
Also, interestingly enough, the average time per call for the JVM versions drops as more calls are made. I don’t quite understand what is going on here, but my best guess is that the JVM performs some form of caching when making native calls.
Finally, the surprise here is the performance of BridJ: whilst it is slow for a small number of calls, in a high-frequency call environment it is much faster than both JNA and Go. Ultimately I would recommend running a similar set of tests yourself if you are going to be doing something similar. I am still not convinced that the JVM isn’t doing something to skew the results in the high-call scenario, but it is good to see that Go behaves as expected and stays consistent.