The Scala vs Erlang Debate, Part 2: The Geek Off

Written on 3:47:00 PM by S. Potter

In the first part of The Scala vs Erlang Debate article series, I discussed various ways to go about adopting either Scala or Erlang and in what scenarios it made sense to go with one over the other from the manager's perspective. In summary the major advantage of Scala is that it runs on the JVM so it can access pre-existing Java libraries your enterprise has already developed. This is also Scala's major disadvantage because it makes a rather large assumption (which in my opinion would be incorrect the majority of the time) that the Java libraries being used are thread-safe. Erlang on the other hand doesn't allow you to integrate in the same OS process with Java libraries, because Erlang was built from the ground up with scalability and fault-tolerence in mind and ideally you shouldn't put Java into the run-time stack because it compromises Erlang's fault-tolerance features significantly. Instead the preferred approach when writing the biggest bottlenecks of your system in Erlang and integrating with pre-existing Java libraries/system is to use SOA to create separate, but integrated systems that are easy to swap out and upgrade at a later point. In my view, you should be refactoring your enterprise architecture by defining services anyway, since this will decouple systems and remove a great deal of headaches in IT maintenance and deployment. Today I wanted to geek out and differentiate the two environments on a more technical level for the middle/technical managers.

Scala: Technical Overview

Scala is an objected oriented, statically typed language with many functional programming capabilities built-in. However, it must be noted (because few Scala documents do this) that Scala is hybrid actor programming language and therefore a number of assumptions cannot be made. So by being a hybrid functional language it losses some of the benefits. As well as it's Java run-time support on the JVM it is also supported in Microsoft's Common Language Runtime (CLR) so .NET integration, in theory, is a breeze. Again this also assumes that the .NET libraries you use from Scala are thread-safe. On a related topic, .NET also has F# available, which is Microsoft's attempt at a hybrid functional programming language also. I have no comment on F# myself, since I have zero experience with it. Please chime in on the comments section if you have experience with F# and are able to contrast it with Scala.

Erlang: Technical Overview

Erlang on the other hand is a dynamic, purer functional programming language that supports light-weight processes in its own run-time environment (separate from OS processes), which send messages to each other asynchronously. One major side effect of being purer in its actor treatment is that there is virtually not state. There is no way to pass "state" around (a couple of minor exceptions), but there is a lambda calculus optimization that can be taken advantage of that will ease away our troubles called tail-recursion. Tail-recursion only partially works in Scala. For many developers that only have experience in procedural and Object-Oriented (OO) programming languages like Java, Python, Ruby, C++, C, Perl, C# this sounds completely insane. Two years ago I may have agreed with you, but today life is much simpler without state in my Erlang travels than when I had to write thread-safe Java code and I wish I had discovered this concept earlier. I will not pretend this notion didn't take some getting used to but today I see things a lot clearer and I am thankful for that clarity.

Scala: Concurrency

Concurrency in Scala is modeled using a concept of actors, except there are two types of actors: thread-based and event-based actors. Since Scala was implemented on top of the JVM, which is stack-based, Scala had to compromise on Erlang's one actor approach. This is because native threads, which the JVM uses for concurrency support, are much heavier-weight than Erlang's lightweight processes thus reducing the number of thread-based actors that could be launched inside one JVM. So Scala created this concept of a very peared down event-based actor, which does not have all the functionality of thread-based actors but does not incur the penalty of needing to be backed by a native thread. To be as complete as I can be in this section I should also note that Scala treats everything as an object, even though it has these two types of actors. Actors in Scala are just objects. In turn this may create duplication between objects and actors.

Erlang: Concurrency

The Erlang VM was built from the bottom up with lightweight processes (a.k.a. actors), therefore it is stackless and much better at lightweight concurrency. Erlang also has a built-in mechanism for load balancing these lightweight processes across all available CPU cores. This means Erlang has a massive advantage over JVM based functional programming languages at getting the most out of multi-core hardware, which is at the crux of the problem domain that software needs to address going forward.

Scala vs. Erlang: Performance

In terms of linear performance (sequential) Scala has been shown to beat Erlang in certain cases. However, concurrent performance consistently shows Erlang beating Scala, sometimes by wide margins on multi-core hardware. Not surprising considering what we learned in the concurrency sections above. When considering the problem we are trying to overcome - optimizing software on multi-core hardware - it seems clear which of these benchmarks are more important! :)

Conclusion

I think it is clear, which side of this debate I firmly stand, but I have tried to demonstrate both good and bad traits of each programming language from a technical perspective that might translate into a business case or reason for choosing one over the other in your organization. If I have time I might do a part 3, which will talk a little about syntax and DSL support. This might round out the argument since it could be argued that Scala wins in the third part by a hair, but all is not lost for Erlang if the underlying problem you need to solve is related to optimizing software for modern hardware and building robust systems at the same time. My vote still goes to Erlang despite the lack of built-in features that make designing DSLs easier, but it is not difficult to build your own DSL engine in Erlang either.

Limitations of this series:

There are quite a few topics I left out of this post that might be of interest for some applications or services. Namely:
  • Hot swapping: Erlang has great hot swapping capabilities for production environments that minimizes or possibly eliminates downtime of systems. This feature in Scala is not close in capability or reliability at this point.
  • Garbage collection: without getting into too many details Erlang probably wins this section by a hair (not significant to write home about). Long sweeps will not be a problem in Erlang and only a problem in JVM languages on rare occasions.
  • Distributed nodes: The clear winner here would be Erlang because built-in to the environment are all the multi-node features that make distributed systems a breeze. Scala on the other hand has different frameworks and libraries to facilitate distributing nodes of logic on different hosts.
  • [Added Nov-2009] Frameworks: As far as I know, Scala doesn't have an OTP killer, not even a contender to compete with Erlang's Open Telephony Platform (OTP). Sure it will take some time, even Haskell, which has been around much longer than Scala is slightly jealous of OTP (I saw the way Don Stewart looked at me when I asked him if Haskell had framework similar to OTP;) ). Supervision trees along make it stand out as the best thing since sliced bread, at least for this particular software engineer.

Resources:

Corrections (added February 2010):

  • Erlang is not a pure functional language. However, it does not have a polluted (impure) treatment of actors like Scala:) Some think this is a good-enough approach to be able to leverage the JVM, when running on the JVM is really important, perhaps 4 times out of 5 it might be "good enough". I just wonder if it really matters for most projects that you run on the JVM!?
  • "Scala's lack of tail recursion is entirely due to a trade-off. The JVM doesn't do tail recursion itself. It is possible to add tail recursion on top of the JVM using various tricks, but they are expensive. SISC does its own stack based heap and Kawa uses trampolining." -- James Iry

The Scala vs Erlang Debate, Part 1: The Managers Overview

Written on 9:37:00 AM by S. Potter

Note: The discussion is primarily between Scala and Erlang, but other languages could potentially be substituted in the arguments for and against. Scala stands for any language that attempts to solve the multi-core issue software developers face with threads and Erlang is a language that has light weight processes which solves the biggest issue of going with a multi-process solution in software. The debate started with the realization that Moore's Law died a while ago. Tomorrow (as we see it now) is destined for multi-cores. A problem of massive proportions now faces software engineers, since hardware engineers changed the rules on us. Now I am not giving hardware engineers a hard time, I am simply pointing out that there has been a massive shift in the way hardware is built, which directly affects the performance and reliability of the software which runs on it. It is not a transparent change that software engineers can ignore. With that background laid out, as software engineers, we must appreciate the problem we now have to tackle, both server-side and client-side developers alike will need to tackle these issues promptly. However, this article only considers server-side software as there is where my specialized expertise lies. This part of the article is the high-level. I'll get into the specifics in the second part. So the question now is how do we leverage the advantage given to us by new hardware progress of multi-cores? There are two fundamentally different problems to solve: software products/projects that already have large complex systems already built in languages that support threads half-decently such as in Java and JVM languages including Scala; but there are also those startups that don't have baggage and can start with a better long-term solution by using a true concurrent language like Erlang.

The Legacy Enterprise

The problem with utilizing JVM threading (whether directly in Java or via other languages built on the JVM such as Scala) is that third party code you use or existing Java/JVM code written isn't necessarily thread-safe. You might have all your business entities and models defined in Java, but can it really be threaded? The pain of migrating or porting your existing system from a JVM language to another truly concurrent language like Erlang might be much greater than dealing with threading issues in just a few key bottleneck areas that will increase throughput and CPU utilization enough. In this case I doubt many pragmatic engineers or architects will advocate starting from scratch in a language you know little (if anything) about. However, identifying those key pain points might take some time, but will be a necessary cost if you are serious about solving this problem in the near-term using a solution like Scala. However, what these engineers and architects should stress to management is that this is likely to become a monkey patch. There will be a tipping point where moving to a pure concurrent language like Erlang would be more beneficial that sticking to the monkey patch and living on the JVM forever, no matter what Sun tries to tell you. There are hybrid solutions for the medium-term, that depending on the project/enterprise specifics may be a good solution until you are able to fully port the code to a langauge like Erlang (or whatever the trend may be at the time you get to it). SOA principles can be used to design and define a meaningful architecture that will declutter the system and allow you to tackle just the bottlenecks of the system in an Erlang port/rewrite and still give you (the manager) a low risk PoC project so you can evaluate Erlang in your enterprise. Having worked on a number of large enterprise projects where upgrading the "legacy" system was the goal, I know full well that management are likely to prefer the monkey patches (i.e. Scala where the pain of rewriting components in a PoC way doesn't exceed the pain of threading your existing code at the bottlenecks), especially when the middle/technical managers do not highlight the drawbacks of the "solution" adequately. Three to nine months later the management (especially middle management) are hurting big time, but it is not usually the most appropriate time to do your "I told you so dance". Trust me!:) The the monkey patch isn't an ideal solution when you are simply trading one complexity of equal of greater value for the other and in 2-5 years the likelihood that there will be many more skilled engineers and architects that truly understand and have solid production experience in a truly concurrent language like Erlang is high in my estimation, so making a move to such a language/environment would not be as risky at that time. However, for 2-5 years you have been getting little out of your hardware and software together unless you overbuy hardware. In fact, in the case where legacy code already exists, there is never really an ideal solution. However, starting out with meaningful PoCs and structuring low risk exit points after conducting objective evaluations of these projects is key to the success of every "legacy solution". If in doubt and your legacy system isn't showing signs of hurting yet, you might want to wait and beef up on hardware. This is a transition period and if you are in a conservative IT enterprise this might be a better near-term approach.

Startups

Now for the startups or those starting afresh, there is mixed news: very good news and not-so-bad news. Green field projects, in theory, are always the best. You can get everything right from day one....well at least you can hope to do so (let us pretend for this article). My advise for your team is to take the plunge to Erlang. It is actually relatively low risk to do so for the following reasons:
  • It is a mature language and environment already tested by large and complex real world systems at telcos where reliability and scalability were and are top priorities.
  • There are a few good books out there on that will help you convert to Erlang and Functional Programming (see resources section below), although documentation and education on Erlang, generally, is somewhat lacking.
  • A framework for services has been designed, developed and put into production called OTP, which stands for the Open Telephony Platform and actually nothing specific to Telephony applications in it, it was simply developed by a Telco.
Erlang's approach to concurrency is interesting and takes some getting used to as it is a pure functional programming language unlike Scala which is a hybrid. Now the bad news is that there are very few people that know Erlang (or other pure concurrent languages), you need to hire at least one specialist with Erlang experience to guide your team to avoid the potholes and perhaps even stop you before you get to the edge of a cliff. They might not be cheap, but starting out it will mitigate project risk significantly. Another negative on the Erlang side of the argument is that you will not be able to use your Java classes straight from within Erlang (in a meaningful way - I'll cover this in the next part). The positive about this negative though is that you can encapsulate your Java code in an SOA way so that you can decouple and declutter subsystems within your enterprise, which is an important task to do regardless of which legacy solution you choose. Finally, good luck and let me know how it goes!

Resources

Programming Erlang: Software for a Concurrent World by Joe Armstrong

This book, for the moment, is the definitive guide for Erlang. If you have already decided to go the Erlang route then you will not mind (or perhaps not even notice) the "selling" of Erlang throughout the book. Overall this is an essential book to anyone learning Erlang at this point.

Real World Haskell

While Haskell isn't either Scala or Erlang, if you do not come from a functional programming frame of mind you will need a good book to introduce you gently into the realm of functional programming. This book does a pretty good job. While the syntax for Haskell is different to both Erlang and Scala, the hardest part of moving to functional programming isn't the syntax. Trust me on this one:)

Also consider me, Susan Potter, a resource if you need an Erlang software engineer with high-level enterprise architecture background in Java/J2EE/JEE and Ruby/Rails production environments. I am developing a risk management product in Erlang right now and I am available as a consultant through my consulting firm Finsignia, which mostly works with financial services firms such as investment banks and hedge funds, which is where my business domain expertise lies, but also works on designing, building and deploying reliable and robust server-side solutions for any industry.

CUBarCamp: Champaign-Urbana's Tech Meetup Update

Written on 8:13:00 PM by S. Potter

This is just an update blog post to let geeks in and around the Champaign-Urbana area know that the CU BarCamp is now scheduled for April 25th, 2009. The meetup will start at 2pm CDT and will be located at the Collective Turf Coworking space:

110 W. Main Street Urbana, IL 61801
It is right between the Heartland Gallery and Priceless Books and opposite the Urbana Business Association office and Crane Alley.

Want to participate?

If you want to volunteer, present or demo any topics at this BarCamp please update the CUBarCamp wiki page and join the Google Group for the CU BarCamp event. Hope to see you there!