Carlos Fenollosa — Blog

Thoughts on science and tips for researchers who use computers

Run QBasic in your browser

August 17, 2018 — Carlos Fenollosa

Steve Hanov produced an impressive implementation of QBasic in Javascript, with detailed explanations, that runs right in the browser. The post is eight years old!

If you're nostalgic for DOS Basic, you can't miss this link.

qb.js: An implementation of QBASIC in Javascript (via)

Tags: programming, retro


The Elixir of concurrency

May 23, 2016 — Carlos Fenollosa

Elixir is a fairly young language that was born when José Valim and a few Rails developers set out to create a modern language optimized for concurrent, distributed, lightweight processes.

They wanted a modern Ruby-like syntax on top of a well-tested process manager, the Erlang VM. The result is Elixir, defined as a dynamic, functional language designed for building scalable and maintainable applications, a correct but vague statement which doesn't do justice to its power and elegance.

I recently described moving from Python to Elixir as a leap similar to moving from Java to Python. It feels like something new, modern, and powerful, with killer features that you won't want to give up.

In Python I found a REPL, list comprehensions, a super clean syntax, and decorators. Elixir brings lightweight supervised processes, pattern matching, a fully functional language, pipes, and a terrific build tool: mix.

If you've never written functional code, the jump is significant. I took a Scala course a couple of years ago, and even then I needed almost two full weeks before I could write production code in Elixir. The language is young, Stack Overflow is of no help (no kidding, that is a big deal), and there are few libraries on GitHub.

A small community also comes with some upsides: people are more motivated and willing to help, centralized tools like forums and IRC channels are still manageable, and you may even suggest changes to the language for upcoming versions.

What is Elixir for?

I had a middle school teacher who said that you can't define something by stating what it's not. However, in programming, mentioning the use cases a language is not suitable for is a good way to start.

Elixir is probably not the first choice for single-core software: heavy math, CPU-intensive apps, or desktop applications. Since it's very high level, systems programming is also out of the picture.

Elixir is great for web applications, standalone or using the Phoenix framework (Elixir's Rails). It really shines for building highly scalable, fault-tolerant network applications: chats, telecommunications, or generic web services.

Why is that? Thanks to the Erlang VM, processes are really tiny, each one is garbage collected individually with low latency, they communicate by sending location-transparent messages over the network (conceptually, you can run result = Machine2.Module.function(params) from Machine1), and spawning and managing these processes is effortless thanks to the VM's abstractions.
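To make this concrete, here is a minimal sketch of distributed calls, assuming two nodes started on the same machine with iex --sname a and iex --sname b; the node names are hypothetical.

    # Join the cluster from node a
    Node.connect(:"b@localhost")

    # Run a function on the remote node via Erlang's built-in :rpc module
    result = :rpc.call(:"b@localhost", String, :upcase, ["hello"])
    # => "HELLO", computed on node b

    # Or spawn a process there outright; messages flow back transparently
    me = self()
    Node.spawn_link(:"b@localhost", fn -> send(me, {:hello_from, node()}) end)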

Finally, Elixir's basic modules also shine: Plug and its Router for handling HTTP requests, Ecto for relational databases, and ETS and Mnesia for in-memory and distributed databases.

Many recommend Elixir if only for Phoenix, but I found that for most backend applications it is enough to use Plug and its Router. Phoenix is impressive, but I believe it's a mistake to jump right into it without trying the base modules first, so my recommendation for beginners is to hold off on Phoenix until you really need it.

Elixir's pipe operator is a fantastic approach to working with state in a functional manner. Instead of the inside-out readlines(fopen(user_input(), "r")).uppercase().split(), try the more readable user_input |> fopen("r") |> readlines |> uppercase |> split.
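The functions in that example are pseudocode; here is a runnable equivalent using only the standard library, where each |> feeds the previous result in as the first argument.

    "  hello elixir world  "
    |> String.trim()
    |> String.upcase()
    |> String.split()
    # => ["HELLO", "ELIXIR", "WORLD"]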

It is a language which was clearly designed to stand on the shoulders of giants, while providing modern capabilities for developers.

Elixir's abstractions

To store centralized <key, value>-like data, instead of a Singleton, Elixir provides the Agent. It keeps state in memory, and many processes can access and modify it without concurrency issues.
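Here is a minimal sketch of such a store; the :store name is made up for the example.

    # Start an Agent holding an empty map as shared state
    {:ok, _pid} = Agent.start_link(fn -> %{} end, name: :store)

    # Any process can now update and read it without races
    Agent.update(:store, fn state -> Map.put(state, :answer, 42) end)
    Agent.get(:store, fn state -> Map.get(state, :answer) end)
    # => 42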

The language can spawn processes much like threads, using spawn_link, but you probably don't want to do that. You'd rather use a Task, which is basically async/await, or a Gen(eric)Server, a very cool abstraction that receives requests from other processes, spawns helpers, and processes the results in parallel, practically for free.
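For flavor, a two-line Task and a minimal GenServer sketch; the Counter module is hypothetical.

    # Task: async/await in two lines
    task = Task.async(fn -> :math.pow(2, 20) end)
    Task.await(task)   # => 1048576.0

    # GenServer: a counter that serializes requests from many client processes
    defmodule Counter do
      use GenServer

      # Client API
      def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
      def increment, do: GenServer.cast(__MODULE__, :increment)
      def value, do: GenServer.call(__MODULE__, :value)

      # Server callbacks
      def init(initial), do: {:ok, initial}
      def handle_cast(:increment, count), do: {:noreply, count + 1}
      def handle_call(:value, _from, count), do: {:reply, count, count}
    end

    {:ok, _} = Counter.start_link(0)
    Counter.increment()
    Counter.value()   # => 1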

All these processes can be controlled by a Supervisor, which holds other abstractions as its "children" and automatically restarts them when they crash.
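A minimal supervision sketch, reusing the Agent-backed :store from above. Note this uses the child-spec map syntax from later Elixir versions (1.5+), not the Supervisor.Spec helpers that were current when this post was written.

    # One restartable Agent child under a :one_for_one supervisor
    children = [
      %{id: :store, start: {Agent, :start_link, [fn -> %{} end, [name: :store]]}}
    ]
    {:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

    # Kill the child; the supervisor starts a fresh one in its place
    Process.whereis(:store) |> Process.exit(:kill)
    Process.whereis(:store)   # => a new pid once the supervisor has restarted it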

Finally, your code is contained inside a single project which can manage different apps, with modules that hold functions. No packages, no classes, no objects. Modules, functions, structs and basic data types.

Dependency management is straightforward thanks to mix, which handles builds and testing too. Unlike other multi-tools such as Gradle, mix is really fast.
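As a taste of the workflow; the dependency below is a real package, but the version is illustrative.

    # Create a project, fetch dependencies, run the test suite
    $ mix new myapp && cd myapp
    $ mix deps.get
    $ mix test

    # Dependencies are plain Elixir data, declared in mix.exs
    defp deps do
      [{:httpoison, "~> 1.0"}]
    end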

Is that too much to process? I felt that at first, too. Give it some time and your brain will eventually think in terms of Supervisors which manage GenServers which spawn Agents and Tasks when needed.

Let it crash

Elixir's mantra is to let processes crash. I found it shocking and counter-intuitive, but with some explanation it makes a lot of sense.

Developers don't want their code to crash, and Elixir doesn't promote writing bad code either. However, let's agree that there are many reasons besides bad programming that can make software crash. If we have a server handling, say, 100 connections every second, one might eventually crash because of a bug in any component, hardware issues, a cosmic ray, or Murphy's law.

The question is: in the event of an unfortunate, unavoidable crash, how will your system react?

  1. Bring everything down?
  2. Try to capture the error and recover?
  3. Kill the crashed process and launch another one in its place?

For example, C uses approach 1. Most modern languages with exceptions, like Java and Python, use approach 2. Elixir uses approach 3. This is not suitable for all environments, but it is perfect for the use cases which fit Elixir: concurrent network processes.

With Elixir, a single failure never brings the whole system down. What's more, the supervisor automatically restarts the crashed process, so the client can instantly retry and, unless there is a reproducible bug in your code, the fresh process will finish without issue.

The bottom line is: a single client may be unlucky and crash at some point, but the rest of the system will never notice.

How to start?

Let's get our hands dirty. After reading many sites, watching hours of video and following a dozen tutorials, here are the resources I found the most valuable. I'd suggest following this order.

Getting started

  1. Madrid Elixir Meetup 2016-03. If you understand Spanish, this is the best intro to Elixir. Otherwise, watch All aboard the Elixir Express! which is a bit outdated but very comprehensive.
  2. Official "Getting Started" guide. It's the best and the most current. Follow it from start to finish, including the advanced chapters.
  3. Elixir School. A nice complement to the official guide. Most things are very similar, but the different approach on OTP will help you understand it better.
  4. "Understanding Elixir's GenServer" and "Elixir's supervisors, a conceptual understanding" are two short reads with yet another explanation of OTP features.
  5. Elixir Cheat Sheet. The best one out there.

First projects

  1. vim-elixir-ide. Elixir support for vim, not the best plugin but suitable for beginners.
  2. Elixir examples. The Elixir guide covers all these, but it's handy to have common idioms on a single page: "string to list", "concatenate list", "optional function parameters", etc.
  3. Portal Game by José Valim. A complement to the sample project on the official guide.
  4. Elixir Koans and Exercism are mini exercises that you can use to improve your Elixir agility. Along the same lines, Elixir Golf proposes weekly puzzles to solve.
  5. Learning Elixir. Joseph Kain has a ton of content with mini projects and examples you can follow. Top quality.
  6. Excasts and Elixir sips have short screencasts that you can check out for reference.
  7. ElixirConf videos contain very interesting talks which may be overwhelming for beginners, but are worth a look later on.
  8. Install Elixir and Phoenix on OSX. If you want to use Phoenix on OSX, you may need this guide.
  9. Phoenix Official Guide. Phoenix isn't necessary for simple web services, where you can use Plug, but for large projects you'll need a framework, and nothing beats the official guide.

Getting help

  1. Awesome Elixir. A list of Elixir resources, where I found many of these.
  2. Elixir Tip and Elixir Status regularly link to Elixir-related articles and videos, and Plataformatec Elixir posts is where the language authors share news and tips.
  3. If you have questions about code, try the Elixir forum first, then the IRC channel or Slack. The developers would like to move all help requests off the mailing list, which is reserved for language-related discussions.
  4. /r/elixir if you're into Reddit

Closing thoughts

I think that's all for the moment. I hope this post can help some beginners to get their hands on the language and start writing production code as soon as possible.

For anyone who wants to know what all the Elixir fuss is about: it's difficult to explain, especially for somebody like me who has been programming in imperative languages all his life.

When I recommended Elixir to a friend, he replied, "A highly concurrent, functional language using the Erlang VM? Don't you have something more exotic?". That's right. Elixir is exotic and use-case specific.

Unlike Python, which is my favorite imperative language and ecosystem, I can't recommend Elixir to everyone. Not everybody can spare a couple of weeks to get started, and many libraries for common use cases are missing: there is nothing equivalent to Numpy or Matplotlib, and modern applications are built on top of dozens of libs; not everyone has the time or the will to write library code. Fortunately, at Paradoxa I am my own boss and I make the tech decisions :)

For hackers and tinkerers it's definitely worth a look. It "won't change your perspective" like Lisp, but it will make you see that writing concurrent code doesn't need to be difficult, and that better tooling is definitely possible.

I bet Elixir will be the foundation of most devops stacks in a few years, when developers realize that the future's bottleneck won't be the CPU, but rather the number of concurrent processes and connections your backend can manage. With Elixir you only need to boot another machine in your network and let the exotic Erlang VM handle the rest.

Tags: programming, learning


The best programming font

June 16, 2015 — Carlos Fenollosa

We programmers like to customize our programming environment to the maximum. If arguing about text editors and customizing your .bashrc weren't enough, we also modify a 20-year-old Apple Extended II keyboard to change its keyswitch tone, remap our keyboard layout to redefine the CapsLock key, and of course decide on which programming language to use for our projects.

For those who really like customization, however, there are more aspects to consider. One of them is, of course, the choice of programming font. Leaving aside the fact that unless you're using a monospaced font you're a monster, some people like the classics, like Courier (New), others use the defaults, and some of us really like retro visuals and opt for one of the nostalgic typefaces.

My favorite is DOS/EGA, by Mateusz Viste. Just make sure that your text editor supports rendering typefaces without anti-aliasing and that you don't need many non-ASCII characters. Most are implemented, but some editors screw up either the line height or the kerning and make the text look very ugly.

There are other versions of the same font, but Mateusz's is the best one and has the fewest annoyances.

I don't use it everywhere because of rendering problems with some IDEs, but OSX's Terminal seems to handle it well, and it plays perfectly with a black background. Truly retro but, I think, an excellent programming font.

DOS/EGA in action

Tags: programming, retro


Craftsmanship

January 10, 2015 — Carlos Fenollosa

As an engineer, I enjoy fixing things, disassembling gear and learning how it works. I'm sure many of you opened watches, pens or other small electronics as a child, only to find that there is "that extra piece" after reassembling them.

My grandfather was an artisan, and I still have some of his pieces. They are sensational and reflect not only the image he was trying to create, but also his general mood, recognizable in the strokes, edges, and paths that the gouges followed. That's why I prefer looking at a single artist's collection rather than a compilation: through their works, it's striking to see how they, and their environment, evolve over time.

For me, however, fixing a Game Boy does not resemble creating a beautiful sculpture from a piece of wood. Basically, I've never had the manual ability nor the tools to build real-world things from scratch, and that's a skill I envy. Fixing or improving existing things à la MacGyver is one thing, but producing something from scratch is an entirely different expertise. That's why I love looking at other people's projects on the DIY subreddit.

But, in some way, programmers are DIY masters, especially for small weekend projects meant to scratch an itch. In the world of zeros and ones, crafting can mean writing a script that automatically parses Gmail for YouTube links you send to yourself from your cell phone, downloads them in the background, converts them to MP4, and saves them into iTunes, so that they automatically get pushed to your iPad and are ready to watch when you're at home.

Not only may the result be notable, it also carries a little bit of yourself in the code. Which language did you use? Did you feel lazy and write it in Bash, or want it to be robust and write some Python? Is it all in a single function or split into modules? How did you name the variables? Are there comments and documentation, or is it a quick hack?

Comparing code to other art forms is a cliché, granted. But when I look at my /usr/local/bin/ folder, I don't see just a bunch of scripts; I also see my own evolution in programming languages, project ideas, organization, skill and, why not, attitude.

Ah! I see that around 2011 I stopped using backticks in Bash scripts and replaced them with the $( ) form of command substitution. Then in 2012 Python scripts start replacing Bash. Around that time I learned Git, because I can see a lot of helper scripts there. I guess I didn't like it back then, since I see a lot of "wtf" comments.

There are some unfinished projects. I know because the files aren't executable. Upon opening them, yes, the code is only half written. Will I pick them up someday? I really don't know, but I love remembering that three years ago I was working on an audio synchronization project. I had completely forgotten.

A programmer won't likely get his code displayed at an art gallery; it probably wouldn't even make sense to anyone but a few other programmers. But observing the evolution of your work is a good self-reflection exercise. Reading other people's code is probably a must, but reading your own, even if it makes you cringe a bit (God, was I really using Hungarian notation?), will let you see how far you've come. It's like looking at that famous picture of the Microsoft staff in 1978. A bit shameful, but necessary to remind us all that even Bill Gates worked his way up from the bottom.

Take some time to look at your crafts from some years ago, and take pride in them. They may not be a beautiful table or lamp, but they hold a little bit of your past self.

Tags: art, programming


You only do it when nobody else will do it

October 03, 2014 — Carlos Fenollosa

Maybe the difference between a junior and a senior programmer is that the first will sometimes say "I don't know how to do this", while the second will always say "Give me a week".

When you finish college with a computer engineering degree, everything seems possible. You just learned how to design a computer from the zeros and ones up to the applications. From logic gates up to a CPU, from TCP to HTTP, from assembler to Java.

Then time passes, you get a regular job, and regardless of its awesomeness you start forgetting stuff. Furthermore, you discover super smart people who are light years ahead of you, and for some reason your mind thinks of them as superheroes, almost magical creatures who can write an ultrafast x86 emulator or make a disk drive play the Imperial March.

Don't get me wrong, these are amazing feats. But psychologically you start to feel dumber and dumber, up to the point where you believe that the only thing you can aspire to is writing webpages and other normal stuff. Even if you have great skills and do a great job at a great company, it's difficult not to feel like a tiny cog in the machine.

I didn't know how to do anything else, and I thought I'd fail if I tried. In college, I suffered a lot with some courses, and to date I still don't know how I passed. But the truth is that college is very dense, and without all the stress from exams and projects, and thanks to age and experience, things actually get easier to learn.

That's why side projects are important.

Three years ago I launched my first successful project to the Internet, bashblog. It's no big deal, but it's a commitment. People use it, contribute patches, discuss ideas, and I have the responsibility to make it work.

Then I started learning things that have always tickled my curiosity. It started with functional programming, one of the academic topics which has been discussed since the 60s but never took off. Then I did more courses on astrophysics and statistics.

This year I left my job to take a sabbatical and start new projects. In some countries it's normal to take a sabbatical before starting college to travel and learn, but almost nobody thinks about a sabbatical at 30. You can choose a wrong career path at 18 and fix it, but the 30s are critical, and one needs to be really sure that they will spend the rest of their life doing what they love.

I have recently found, I don't know how to put it, a change of mind, new strength, inspiration. I want to learn how to write an OS. I started writing mobile apps. I want to launch a product. I'm contributing to an industrial patent to do really cool stuff with cellphones.

As usual, each of these projects hides many challenges. I've had to read RFCs, learn how to extract voice patterns from an audio file, write device drivers, deal with lawyers, and read formal documents [1].

We live in the information age. There are plenty of resources, some of those University-grade, to learn new skills. Discipline and planning can go a long way. There is no excuse.

Github and other websites have also made it effortless to collaborate with total strangers. It really makes me happy and proud to see other people commenting on things I've done. Years ago you had to go to a computer hobbyist meeting to show your work, now you can do it online... and others will improve it.

Stack Overflow will provide code samples and guidance. I've now started hearing undergrads utter "Did you really code programs without Stack Overflow?" in the same way that I used to say "Did you code programs without the internet?" to people who had to read manuals and go to a library.

Hacker News and Reddit can guide you on what's cool nowadays. Live in the future, then build what's missing.

I guess that it's comfortable to dismiss some ideas just because "we don't know how to do it". And that's a waste of our university degrees. We have some responsibility to do cool stuff. If we don't build it, who will? If we can build it, why wait? If you don't have the skills, learn them. Just Google it. Work on it for a week, and you will succeed.

Do you miss the adrenaline rush that you used to get when you first discovered something? The "oooh" and the "aaah"? Learn something new, something radical, something cool and futuristic. Start a project, and release it. It doesn't need to be complete.

The greatest force that pushes us to build things is the knowledge that nobody else will build them for us.

~~~~~

[1] One of the many things that managers usually do and engineers don't appreciate enough.

Tags: programming, life


Chat wars

May 15, 2014 — Carlos Fenollosa

Coming in each morning to see whether the client still worked with AOL was thrilling [...] One day, I came in to see this embedded in a message from the AOL server: "HI. -MARK." It was a little communication from engineer to engineer, underneath the corporate, media, and PR worlds that were arguing over us. I felt some solidarity with him even though we were on opposing sides.

A great story from David Auerbach, who was on the original MSN Messenger team, explaining how he reverse-engineered AIM's protocol so that the two clients could interoperate.

As someone who used Messenger for many years, I found the piece brought me back in time. Well written, interesting, and entertaining.

Tags: retro, programming, web


SQLite: a standalone database for your application

November 10, 2011 — Carlos Fenollosa

We researchers are used to storing data in plain text formats, because they're very easy to parse and work with. While this is appropriate for some data types (and, I'd add, very useful for later feeding into R), in some cases it is slow or just inefficient.

This topic is actually very important for some projects, as records stored in a plain text file are very slow to query afterwards. How the data will be accessed is the key question to ask ourselves before considering a database: databases are great for complex, unordered queries, but not so great for sequential access to raw data. Let's see an example.

Take a data file which stores atom coordinates, for example from a molecular dynamics simulation. This data is very likely to be read once, sequentially, then processed in memory. The information represents a matrix which will be fed to mathematical functions. This is the classic case where data files (either binary or plain text) are used correctly.

But now let's think of a list of proteins and some of their properties, for example molecular weight and number of available 3D structures. All these objects are independent; each has its own entity. While you could store a text file with one line per <protein, weight, structures> record, it makes more sense to store them in a database.

Databases allow complex queries to be resolved very quickly: for example, give me all proteins with molecular weight > 50,000, list all proteins which have no crystal structures, or print all the proteins which have duplicate structures. Were we working with a text file, we would need to process it completely every time we performed a query. That's very, very slow. Databases internally store the information in such a way that queries don't need to go through all the elements to get the answer; namely, they store data in trees, organized by indices.

How do indices work? It's a complex topic, but let's consider a very basic example. Say you have three protein structures (1BCD, 2KI5, 1AGI) which you want to index by name and molecular weight. The system will automatically build a binary tree of proteins where 1BCD is the parent, the left child is 1AGI, and 2KI5 is the right child. Then it will create another tree where the left child is the lightest protein, the parent is the middle one, and the right child is the heaviest one.

If the index tree is kept sorted so that the left child always comes alphabetically before the parent and the right child always after it, then we can access any element or group of elements not only without checking every item, but in logarithmic time. Databases do this once for every index you configure, so complex queries can be solved super fast: for each of them, the system only needs to process a few items out of the millions you might have stored in the DB. Every time you descend to a child element, the system discards half of the remaining database, then half of that half (1/4), then 1/8, and so on.
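A toy sketch of what an index buys you, using Python's bisect module as a stand-in for the database's tree; the names and weights are illustrative.

    import bisect

    # "Index" on molecular weight: a sorted structure, like the tree above
    by_weight = [(14300, "1AGI"), (18500, "1BCD"), (25200, "2KI5")]

    # Find every protein heavier than 18,000 without scanning all rows:
    # bisect locates the cut point in O(log n) comparisons
    pos = bisect.bisect_right(by_weight, (18000,))
    print(by_weight[pos:])   # => [(18500, '1BCD'), (25200, '2KI5')]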

To summarize: if each record in your data has its own entity (i.e. it can be thought of as an "object") and you expect to run queries which retrieve arbitrary subsets of the elements, then you need a database. Databases have even more advantages, like relationships between objects (e.g. each crystal structure has its own entity and can be related to a protein), but database design is a complex topic and this article covers only basic data storage.

However, databases are usually configured by the system administrator and handled by a daemon: oracle, mysql, postgresql. Here I will talk about yet another way of creating databases, one that needs no daemons or user privileges and, more importantly, is easily portable: SQLite.

SQLite is a library that implements a SQL engine inside your own application. This means that while the database persists in a file, all the querying infrastructure is deployed along with your code and stops when the code finishes running. Databases can be created very easily, which makes it simple to keep multiple DBs for testing, without the need to bother the system administrator.

SQLite has bindings for almost all popular languages, plus a command-line interface which is handy for testing and debugging. The data is stored in a single file which can be deployed with your application without installing any standalone servers. Obviously it is not a replacement for Oracle's solutions, but it can greatly speed up applications which need to query data and don't have access to a full database server.
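A minimal sketch of the protein example using Python's built-in sqlite3 bindings; the table and file names are made up.

    import sqlite3

    conn = sqlite3.connect("proteins.db")   # the whole database is this one file
    cur = conn.cursor()

    cur.execute("CREATE TABLE IF NOT EXISTS proteins (name TEXT, weight REAL, structures INTEGER)")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_weight ON proteins (weight)")
    cur.executemany("INSERT INTO proteins VALUES (?, ?, ?)",
                    [("1AGI", 14300, 1), ("1BCD", 18500, 0), ("2KI5", 25200, 3)])
    conn.commit()

    # The index answers this without reading every row
    for (name,) in cur.execute("SELECT name FROM proteins WHERE weight > 18000"):
        print(name)

    conn.close()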

Most popular software uses some kind of database to store data, as it is a super fast way to access preferences and other items. For scientific programs, it is always necessary to think twice before using one: database design is an art of its own and, as said before, it does not suit all needs.

When used properly, a small ad-hoc database like SQLite can speed up your software, make data access very easy, and let you manipulate large, object-like, interrelated data collections with simple queries instead of long, slow algorithms which process all the data when you only need one item.

Tags: software, programming


Which is the best programming language?

September 23, 2011 — Carlos Fenollosa

This classic question from beginners who start coding their own tools for research has only one correct answer: it depends.

If there is no language which is clearly better than the others, why do I have this very simple table in my Unix section?

  1. Bash and awk are your first choice
  2. PHP if you require more power
  3. C only if you know why
  4. Use Java & Eclipse

Don't get me wrong, this table has been compiled from quite a few years of experience, and there are huge assumptions behind it. The first one is that you still don't know which language you should use. If there is no doubt, either because the project has specific requirements or because there is a language designed specifically for the task (e.g. CLIPS for declarative programming), go with that; otherwise, you need to ask yourself some questions.

I'll start by enumerating the most popular language choices and some of their features.

Scripting languages

Scripting languages are good for small or medium projects, because they fit very well with the programmer's line of thought. This means you can program while you think, which isn't the best for cleanliness, but it gets the work done quickly.

  • bash is always your first choice. You already know how to run stuff in the command line, right? So this is basically the same. bash can handle functions and arrays, but that's pretty much all it can do. However, that's usually good enough for small routines, and you can always call other Unix tools. It also avoids the overhead of running another binary (perl, php) as it is already in memory.
  • perl is great at parsing text, but slow for anything else. If you need to parse text and do math, use php, which has a faster math engine. perl also lacks proper objects. However, there are very good scientific libraries for it, so you might be forced to use this language anyway.
  • php has nice libraries to connect to databases and in general do web stuff. It is also object oriented, so it is a suitable candidate for small-medium projects which can benefit from object orientation but don't need all the infrastructure from java or C++. In general, unless you are tied to perl, php is a better choice.
  • python is, well, another scripting language. It's way better than perl, and functionally similar to php, so you might want to use it if you like its clean syntax or need to call other python libraries.
    Edit: after having used python for a long time, it is the first language I'd recommend for most use cases
  • ruby is so painfully slow that you should avoid it at all costs. I am including it here only to warn other people against using it.

Compiled languages

Once you start compiling code, things get complicated. However, the results are usually great, fast, and very maintainable. Let's discuss the alternatives.

  • java. Why java first? Because it's the most appropriate. It has great development tools (Eclipse), it checks a lot of stuff at compile time, it doesn't require the programmer to use pointers (it uses pointers internally, but transparently), and in general it is a modern, object-oriented language which doesn't require legacy stuff like headers. Yes, it is a bit slower than pure C, but the latest versions of the java virtual machine compile to machine code at runtime and achieve great performance. Most computer scientists have mastered it and in general it is widely used. It is versatile and can be used for anything from simple routines to web pages with JSP to CRMs. Yes, I like java.
  • C is the mother of all programming languages, but this does not mean that it's the best one. It's old, doesn't have objects, and for every byte-level optimization which earns one second of execution time, the programmer needs to waste ten minutes. Optimizations should be done at the compiler level, not the code level. However, C has great compilers, from the good-enough gcc to the awesome icc.
    My recommendation is that you use it only if you know what you're doing. It's awful to parse strings in C, it lacks many scientific libraries compared to perl or java (except math functions, but that's what R is for), and segmentation faults can make you waste several days digging through the code because you declared a variable incorrectly and ended up with a paging issue.
    Some might argue that C is as good as the programmer is but, honestly, it makes good programmers waste a lot of time on small issues.
  • C++ is the alternative if you need C in an object-oriented environment. The compiler is also able to run more checks at compile time, so you'll waste less time, but I'd go for java anyway. There is no reason to choose C++ a priori other than execution speed.
  • objective-C. Apple users are sometimes tempted to write obj-C code because of the excellent development tools on a Mac, but keep in mind that there is probably nobody else who can look at that code afterwards and understand it, because almost nobody uses obj-C. So I'd suggest not using it unless you're in a hardcore Mac environment or are planning to develop a Mac GUI afterwards.
  • fortran. There are only two kinds of people who use fortran: physicists and the poor fellows who have to maintain their code afterwards. It was designed for the computers of the 50s, which means that using it nowadays is like putting a steering wheel from Henry Ford's era on today's cars. It is easier to understand f2c-generated code than the original. There is not a single reason to use fortran. If you need raw speed, use C. If you want to write unmaintainable code, well, use obfuscated perl.

Choosing a language

Now for the difficult task of choosing a language. If you look again at the four items at the top of this post, having read the language descriptions above, you might start to see what's going on. Choosing the right language for your specific task comes down to a few questions.

What are my time constraints? Beginner programmers often forget that, for homemade software, the total time is the time you spend programming plus the time you spend running it. If the routine is expected to take 10 minutes, don't waste two hours writing a C program with pointers; write a simple script.

Can I solve it with a simple script? If the answer is yes, use bash. It's a great scripting language, and you can build on top of other Unix tools, like awk, sed, etc.

However, keep in mind that every time you call an external program, the system needs to fork(), and for large loops this can be a huge overhead. Be rational, and think again about the execution time. Instead of launching 10,000 sed processes to parse lines, it might be better to write a php script: it's more powerful than bash, it won't need to fork(), and the code will probably be simpler.
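To see the overhead, compare these two equivalent bash approaches; the file name is illustrative.

    # Slow: one fork+exec of sed per line, thousands of processes
    while read -r line; do
        echo "$line" | sed 's/foo/bar/'
    done < data.txt

    # Fast: a single sed process makes one pass over the whole file
    sed 's/foo/bar/' data.txt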

Will I need to maintain it or reuse the code? Will the code grow? If you think this code could be reused as a library or integrated into other modules, consider making it an actual C library or java class. Running scripts within scripts within a big project is generally a bad idea. And please keep in mind that, in a research environment, at some point another person will need to look at your code, so besides writing clean and understandable code, try not to use obscure languages or tools which only you know of.

Do I need to achieve maximum speed and/or optimizations? Keeping in mind that the latest versions of the java virtual machine are pretty fast, yes, the winner here is C. But we're talking about software which can take you two weeks to code and which would take months to run if written in perl, yet only three hours when coded in C. When that happens, choose C.

Will it need to run on different platforms? java and the scripting languages are the only ones which guarantee perfect execution in every environment: Windows, Mac, Linux, Solaris, BSD, and others. C can be compiled on different architectures, but it's sometimes hard to replace mmap calls on Windows or to compile against different versions of libc on different flavors of Linux.

Summary

Let's review the four initial points again.

  1. bash is a great initial choice for small projects which will take about 20 minutes to run and which you don't want to waste three hours programming
  2. php is appropriate for medium projects which use objects, parse text, and do math. perl is another good choice at this point.
  3. C is better left for experts or people who need hardcore optimizations. The rest of us should leave optimizations to the compiler/interpreter and just try to write good code which runs in O(n) if possible.
  4. java is the king of tools and libraries, multi-platform, scales great for big projects, is surprisingly fast, and is very gentle with novices. Its only drawback is the need for a java virtual machine but hey, if you use perl you will need its libraries installed, too.

In the end, everyone has their preferred languages, which is fine. It is far more important to write good code than to choose the language which fits a task best. However, failing to foresee the importance of a math routine and writing it in perl can lead to the whole research group wasting time until somebody else rewrites it in C and makes it 1000x faster. Yep, true story. So choose wisely.

Tags: programming
