Why?

Among the unfortunate few who somehow enjoy scientific computing, there is an even more select group that can actually achieve both good performance and good maintainability. It is one thing to make highly specialized mathematical code easy to write, but it is a completely different story to make it easy to read later.

In the pursuit of maintainable, domain-specific code, making it performant is another league of hurt entirely. Julia is the only language I have seen tackle this problem in a rather (previously) uncharted way. Its design choices do lead to some limitations and tradeoffs that are a product of what the language attempts to achieve. However, I believe Julia's unique position puts it in a league of its own in some situations.

Is it too good to be true? A language like that has to be simple to write, but also deep enough to meet the demands of highly parallelized, domain-specific applications. I will say, after using it extensively for 2-3 years in many different contexts: it does have some rough edges, but it truly has what you may need in spades. It is certainly what I needed, and it may be what you or other people you know need as well.

Who Made This (An Important Question)

A lot about a language can be inferred from who made it, and why. Spoken languages are usually propagated by some measure of colonialism; programming languages, not so much (unless you are Oracle). The intentions behind a language give insight into what it was built for. Generally in software, if a use case was not considered from the start, it will be a miracle if it works well when bolted on later.

The main developers, Jeff Bezanson, Stefan Karpinski, Viral B. Shah, and Alan Edelman [1], are all computer scientists, and many of them have done work specifically in high performance computing. That background is one of the first things they bring up in their oddly emotionally gripping account of how they came to the same conclusion I came to a decade later:

“We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language.

We love all of these languages; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.” - Why We Created Julia

Why I Think it Needed to be Made

The underlying fact here is that this is not the result of a lack of trying. Notice that most of these languages are basically shackled to C libraries and C/C++/Fortran/Rust vectorization for any modicum of performance, or are the culprit itself, right here in our midst: C. And C was made to be little more than a much nicer version of Assembly.

R, Python, and MATLAB are good for scientific applications if your use case is well supported, and that itself is a true gamble depending on the flavor of dynamic, interpreted language you desire. If it is not, the code you will often see is straight out of a good Stephen King novel. Truly horrifying. The core issue is: what if it doesn't exist? Sometimes, that's your job. If you are doing research in a novel space, that is your sole job.

Oh, oh no.

The problem is, the more bleeding-edge your research, the more you will find support for your niche diminishing. Whether the packages are deprecated, not updated to a sane language version, or have bad versioning and package management practices, this extends far beyond hobbyists and, unfortunately, often reaches the labs and teams that are supposed to be the experts in their field.

You will have to fulfill your niche with very little in the way of the crutches that keep the language afloat. You may have broadcast operations, BLAS, and some other math on your side, but for problems that are not adequately described within that relegated walled garden, it will be very difficult to fit everything in.

Program in C?

As for C itself, its native support for any number of things, whether memory safety or math concepts beyond the most fundamental ones, simply is not there. Are there good libraries for computation in applied areas written in C? Sure. But if you want to build something more novel, which requires a lot of experimentation, in a language that gives you virtually nothing, how does that work? Not so well.

Well, in the usual order of business, you prototype in one language and then write the final implementation in the other. There are reasons for this, of course, but the real question comes down to: why? Why do I have to scotch-tape sub-optimal solutions together all the time just to make something barely functional? Why do I have to use package managers that are terrible at versioning just to track down some obscure library that does not work, and start again at square one, ad nauseam?

In either case, CMake is only really beating out Make and npm. To make a joke about diseases would be in poor taste, but the taste CMake leaves in my mouth is disgusting, even with a package/dependency manager (e.g. vcpkg, Chocolatey, Conan) on top.

Program in Rust?

I get you, I really do. I actually like Rust a lot. Rust has a lot of good things going for it, mainly the strength of its type system. Unlike C/C++/Fortran, its package management and build system (Cargo) is, for the most part, extremely straightforward. In my experience, though, there does not appear to be much support from the scientific community for the language in many areas of research. If you are approaching things more on the greenfield side of development, however, this should not matter very much at all.

In some cases, using some sort of binding solution (e.g. PyBind11, Nanobind) or an FFI can be the right call, especially for a small-to-medium sized tool. I do think, however, that these binding schemes, or an FFI that needs to be wrapped, introduce their own development complexities that should be weighed. If you are effectively creating software knowing you will have to wrap it into Python, for example, because native usage would be too difficult or inaccessible for your audience, then that has to be factored into the calculation of which language you use.

Or?

Or I have to dance around the language. Reaching for NumPy because your loops are too slow, dancing around the language you are actually writing in like Antoñita Singla, is stupid. MATLAB is less stupid about vectorization, but it is still far harder to read than a simple for loop in anything other than trivial cases. The fact that you often have to resort to contorted, unreliable pleas for vectorization just to avoid abominable performance is ridiculous.

Imagine some grad student, the average chemist who barely knows how to use a computer, trying to program. Do you think they know what vectorization is or does? No; they want to know why their code performs like complete trash just because they used a single loop. If you think an “easy” language requires compensating for the slow built-in logic of the language with a patchwork of libraries written in other languages, please re-evaluate that recommendation at the end of this post.

To the average chemist trying to process their data, you would sound like you need a chloroform exorcism. And honestly, they have a point. Why do you need to contort like an Eldritch beast just to do anything at scale? Why do I have to break out CPU-instruction knowledge to understand what is happening with something as simple as basic addition between two arrays?

I believe it is honestly unreasonable to act as if this is the best we can do for our non-CS colleagues. Why does an “easy language” have to come at the cost of being able to write performant code in a straightforward, ergonomic way? Julia is genuinely built to push people in the right direction for good performance and abstraction practices, even when they are not aware of it, and I think it really does fit this bill.
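
To make that concrete, here is a minimal sketch (the function name AddArrays is hypothetical): a plain, explicit loop over two arrays is already the idiomatic, fast path in Julia, with no vectorization gymnastics required.

function AddArrays(a::Vector{Float64}, b::Vector{Float64})
    out = similar(a)                     # output array with the same size and element type
    @inbounds for i in eachindex(a, b)   # a straight loop; the compiler turns this into tight native code
        out[i] = a[i] + b[i]
    end
    return out
end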

If you read the Wikipedia article on Julia when starting out (like I did), you will see it described as a dynamic, high-level, general-purpose, multi-paradigm language with multiple dispatch.

What does that truly mean, however? How does it achieve this? I think for many of us, it is best to actually prove how and why.

Typing

Structs

Primitive types (like Int64, Float64, etc.) in Julia are stored exactly the way you would expect in a statically typed, compiled language like C++.
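
A quick way to convince yourself of this from the REPL (a small sketch; the point is that these are unboxed, fixed-size bits types):

isbitstype(Float64)   # true: plain bits, no per-value box or type tag
sizeof(Int64)         # 8 bytes, the same layout as an int64_t in C++
sizeof(Float32)       # 4 bytes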

Structs come in two kinds: immutable and mutable.

Immutable structs are the default in Julia and are value types, stored either in registers or on the stack. You cannot change the fields of an immutable struct, but you can of course replace the whole structure with a new one, and in many cases the compiler is able to optimize these replacements away entirely.

using StaticArrays  # SVector comes from the StaticArrays.jl package

struct TRS{T <: AbstractFloat}
    position::SVector{3, T}
    rotation::SVector{4, T}
    scale::SVector{3, T}
end
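
For example (a minimal sketch): trying to mutate a field errors, while rebinding the variable to a freshly constructed value is perfectly fine.

t = TRS(SVector(0.0, 0.0, 0.0), SVector(0.0, 0.0, 0.0, 1.0), SVector(1.0, 1.0, 1.0))
# t.position = SVector(1.0, 0.0, 0.0)   # ERROR: immutable struct of type TRS cannot be changed
t = TRS(SVector(1.0, 0.0, 0.0), t.rotation, t.scale)   # replace the whole value instead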

This restriction is, fortunately, negotiable in situations where an immutable value is stored on the heap (e.g. inside a Vector):

# Overwrite the `position` field of the `index`-th element in place by writing through
# raw pointers, side-stepping the immutability of TRS. The elements are stored inline
# in the Vector because TRS{T} is an isbits type.
@inbounds @inline function SetPosition!(transforms::Vector{TRS{T}}, value::SVector{3, T}, index::Int64)::Nothing where T <: AbstractFloat
    basePtr::Ptr{TRS{T}} = Base.unsafe_convert(Ptr{TRS{T}}, transforms) + ((index - 1) * sizeof(TRS{T}))  # pointer to element `index`
    transPtr::Ptr{T}     = convert(Ptr{T}, basePtr)                                                       # reinterpret as a pointer to T
    posPtr::Ptr{T}       = convert(Ptr{T}, Base.unsafe_convert(Ptr{SVector{3, T}}, Ref(value)))           # pointer to the new position data
    unsafe_copyto!(transPtr, posPtr, 3)                                                                   # copy the 3 position components
    return nothing
end
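
Used on a Vector of TRS values, this overwrites one element's position in place. A rough usage sketch:

transforms = [TRS(SVector(0.0, 0.0, 0.0), SVector(0.0, 0.0, 0.0, 1.0), SVector(1.0, 1.0, 1.0)) for _ in 1:8]
SetPosition!(transforms, SVector(1.0, 2.0, 3.0), 4)
transforms[4].position   # now SVector(1.0, 2.0, 3.0)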

Mutable structs require the mutable keyword and are usually stored on the heap:

mutable struct Settings
    versionNum::Int64
    fileName::String
end

As of Julia 1.8, one can also mark individual fields of a mutable struct as const; such fields can only be assigned during construction:

mutable struct Settings
    versionNum::Int64
    const fileName::String

    Settings(versionNum::Int64, fileName::String) = new(versionNum, fileName)
end
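
In practice (a small sketch; the file name is just a placeholder): the non-const field can still be reassigned freely, while assigning to the const field after construction raises an error.

s = Settings(3, "config.toml")
s.versionNum = 4              # fine: versionNum is an ordinary mutable field
# s.fileName = "other.toml"   # ERROR: const field .fileName of type Settings cannot be changed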

Dynamic Typing

Dynamic typing is a term that has a defined meaning, but not one that dictates any implementation details. Effectively, dynamic typing means that, by default, a variable can be rebound to a value of a different type without an error (unlike in a statically typed language), because types are checked at run time rather than at compile time.

The reason I mention the lack of an implementation spec is that the typical, minimal structure of a boxed value (where everything is assumed to be dynamic) would look something like:

struct Boxed {
    uint32_t typeHash; // Maps to the type.
    uint32_t typeSize; // How big the type actually is (in bytes).
    void*    data;     // Where the data is actually stored.
};

This of course introduces substantial slowdowns, especially for arithmetic, where the common operations otherwise map directly to built-in hardware instructions.

Let alone when you have arrays of them: even if the array is uniform, with every element the same type, you pay for the extra space and a memory indirection on every access. Quite inefficient.
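
You can see this cost directly in Julia (a rough sketch; exact byte counts vary by platform and version):

xs = rand(Float64, 1_000)   # concretely typed: the 8-byte floats are stored inline
ys = Vector{Any}(xs)        # abstractly typed: one pointer per element, each pointing to a boxed Float64

Base.summarysize(xs)        # roughly 8_000 bytes plus a small array header
Base.summarysize(ys)        # several times larger, and every access chases a pointer

With that cost in mind, consider what Julia actually does with a reassignment like the following: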

value = 0.0
value = "$value"
println(value)
# 0.0

This is legal Julia code and will print the string “0.0”. However, using:

value = 0.0
value = "$value"
println(typeof(value))
# String

This outputs the type String!? Even if you do end up with a dynamic type reassignment in your code, there are times when Julia is able to optimize around the resulting type issues, performance-wise.

As a stripped-down example, with minimal extra instructions, of Julia's ability to actually “fix” these kinds of mistakes:

function Run(value)
    output = unsafe_trunc(Int64, value)
    return output
end

The unsafe_trunc function does not have any fallbacks or exception support, and otherwise maps to the appropriate float-to-integer truncation instruction for your hardware. In Julia, the Assembly can be obtained via the built-in macro @code_native:

@code_native debuginfo=:none syntax=:att binary=false Run(5.0)

This emitted the following Assembly:

pushq                 %rbp
.cfi_def_cfa_offset   16
.cfi_offset %rbp,     -16
movq                  %rsp, %rbp
.cfi_def_cfa_register %rbp
vcvttsd2si            %xmm0, %rax
popq                  %rbp
retq

On x86, floating-point values live in the SIMD (xmm) registers, so there is no conversion instruction that works purely on general-purpose registers. Here the compiler emits vcvttsd2si, the AVX-encoded truncating conversion from a scalar double to a signed integer.

Now let's try the example where output is first bound to the float value and then overwritten:

function Run(value)
    output = value
    output = unsafe_trunc(Int64, output)
    return output
end

This emits the following Assembly:
pushq                 %rbp
.cfi_def_cfa_offset   16
.cfi_offset %rbp,     -16
movq                  %rsp, %rbp
.cfi_def_cfa_register %rbp
vcvttsd2si            %xmm0, %rax
popq                  %rbp
retq

Your eyes do not deceive you… This is the same exact Assembly!

Is the language actually just… overwriting the type itself?

Julia offers the ability to not just show the Assembly output, but also the lowered AST (abstract syntax tree) as well.

@code_lowered Run(5.0)
CodeInfo(
1  %1 = value
        output = %1
   %3 = Main.unsafe_trunc
   %4 = output
        output = (%3)(Main.Int64, %4)
   %6 = output
└──      return %6
)

The “%” prefix is LLVM-style notation for local values; here it is part of Julia's own SSA-form IR.

Um… what is this? Well, first, it binds the input value to a local, good so far. But at this point, the dynamic reassignment still occurs?

This is because we are looking at the code before type inference occurs. Julia performs type inference as a converging, data-flow-sensitive analysis (i.e. sensitive to the order of execution), because variables can be rebound to new types; it cannot rely on the same kind of flow-insensitive algorithm that static languages use to guarantee correctness.

The @code_typed macro, however, allows us to see the result of type inference both before and after further optimization passes.

@code_typed optimize=false Run(5.0)
CodeInfo(
1  %1 = value::Float64
        (output = %1)::Float64
   %3 = output::Float64
        (output = Main.unsafe_trunc(Main.Int64, %3))::Int64
   %5 = output::Int64
└──      return %5
) => Int64

and

@code_typed optimize=true Run(5.0)
CodeInfo(
1  %1 = Base.fptosi(Int64, value)::Int64
└──      return %1
) => Int64

The unoptimized output looks very similar to the lowered AST (which makes a lot of sense), but one thing is different: now that it has resolved the types, Julia has factored the explicit type change into the lowered code. We can actually look at how it does this using the function:

Base.code_ircode(Run, [Float64], optimize_until=x)

With x = 1,

1-element Vector{Any}: 
2 1  %1 = _2::#UNDEF                                                       
          (_3 = %1)::#UNDEF                                                
3    %3 = _3::#UNDEF                                                       
          (_3 = Main.unsafe_trunc(Main.Int64, %3))::#UNDEF                 
4    %5 = _3::#UNDEF                                                       
  └──      return %5::#UNDEF                                                
   => Int64

The IR is compacted at the first stage. Types have not been determined yet, but the variables are being converted to SSA (static single assignment) form, where each variable is assigned exactly once. Combined with stripping the variable names, this means the reassignment no longer even matters:

At x = 2,

1-element Vector{Any}: 
2 1  %1 = _2::Float64                                                      
     %2 = %1::Float64                                                      
3    %3 = %2::Float64                                                      
     %4 = Main.unsafe_trunc(Main.Int64, %3)::Int64                         
4    %5 = %4::Int64                                                        
  └──      return %5                                                        
   => Int64

This now allows the type to be deduced per SSA statement, without having to acknowledge the type instability at all.

This does not mean that type instability cannot hinder performance, or that concrete typing is not imperative for maintainable, reliably performant code.

Which is the reason for the @code_warntype macro:

@code_warntype Run(5.0)
MethodInstance for Run(::Float64)
  from Run(value) @ Main e:\Julia\Article Test\Main.jl:1
Arguments
  #self#::Core.Const(Main.Run)
  value::Float64
Locals
  output::Union{Float64, Int64}
Body::Int64
1  %1 = value::Float64
        (output = %1)
   %3 = output::Float64
        (output = Main.unsafe_trunc(Main.Int64, %3))
   %5 = output::Int64
└──      return %5
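
For contrast, here is a minimal sketch (the function Unstable is hypothetical) of a case the compiler cannot paper over, because the type genuinely depends on a runtime value; @code_warntype highlights the Union-typed body:

function Unstable(flag::Bool)
    output = flag ? 1 : 2.5   # Int64 on one branch, Float64 on the other
    return output
end

@code_warntype Unstable(true)   # Body::Union{Float64, Int64} is flagged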

Julia was, and had to be, built as a dynamic language; its behavior is to get as close as it possibly can to static performance without sacrificing the ergonomic flexibility that dynamic typing offers.

Abstraction

Julia, in spite of its roots of using FemtoLisp (a flavor of Scheme) for parsing (until fairly recently, as of Julia 1.10), operates with the most similarity to two discontinued languages: Dylan and Fortress.

There is hardly any OOP to speak of, except for constructors, which also support polymorphism.
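
As a small sketch of what that looks like (the Meters type is hypothetical): constructors are ordinary functions, so you can add methods to them and they dispatch like anything else.

struct Meters
    value::Float64
end

# Outer constructors are simply additional methods on the type's constructor function:
Meters(v::Integer) = Meters(Float64(v))
Meters(s::AbstractString) = Meters(parse(Float64, s))

Meters(3)       # Meters(3.0)
Meters("2.5")   # Meters(2.5)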