Thursday, July 27, 2017

The case against automatic memory management (a.k.a. Garbage Collection)

The state of the memory...

Microservices are good for scalability, because they are small, and if one of your servers becomes hot, you get the option to move a service around. They solve the smallest meaningful problem and communicate nicely with each other. CPU, memory and other hardware-related issues are things of the past, since we have our machine-learning-based automated service deployment utility. Life is good... for most of us. Some are cursed with problems less easily "micro-serviced".
Consider arguably the world's easiest-to-parallelize problem: ray tracing. In ray tracing you can potentially calculate every pixel with a separate service and combine the results into the whole picture at the end of the processing. The problem is that, ideally, you load all the different components of the scene (objects, textures, etc.) into each service's memory. If you are hard pressed, you can start with the wireframe models and load textures on demand. Optimizations exist (e.g. bounding boxes instead of complete objects), but at the end of the day you need to know what the current ray hits with some precision. A complicated scene might cause out-of-memory problems on all of your rendering microservices at the same time.
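To make the "every pixel is independent, but everybody needs the whole scene" point concrete, here is a minimal sketch in Rust. The `Scene` type, the single-sphere scene and the `render` function are made up for illustration; threads stand in for the separate services. Inside one process the workers can share the scene by reference, while separate services would each need their own copy, which is exactly where the memory pressure comes from.

```rust
use std::thread;

/// Hypothetical, minimal scene: one sphere centered on the viewing axis.
struct Scene {
    sphere_radius: f64,
}

/// Returns 1 if the ray through pixel (x, y) hits the sphere, 0 otherwise.
/// Rays are parallel to the viewing axis (orthographic projection),
/// so a ray hits when its distance from the axis is at most the radius.
fn trace_pixel(scene: &Scene, x: usize, y: usize, width: usize, height: usize) -> u8 {
    // Map the pixel center to the range [-1, 1] on both axes.
    let px = (x as f64 + 0.5) / width as f64 * 2.0 - 1.0;
    let py = (y as f64 + 0.5) / height as f64 * 2.0 - 1.0;
    if px * px + py * py <= scene.sphere_radius * scene.sphere_radius {
        1
    } else {
        0
    }
}

/// Renders the image with up to `workers` threads, each owning a band of rows.
fn render(scene: &Scene, width: usize, height: usize, workers: usize) -> Vec<u8> {
    let workers = workers.max(1);
    let mut image = vec![0u8; width * height];
    // Split the image into bands of whole rows, one band per worker.
    let rows_per_worker = (height + workers - 1) / workers;
    let band_len = (rows_per_worker * width).max(1);
    thread::scope(|s| {
        for (band_index, band) in image.chunks_mut(band_len).enumerate() {
            // Every worker reads the same scene; nobody copies it.
            s.spawn(move || {
                let first_row = band_index * rows_per_worker;
                for (i, pixel) in band.iter_mut().enumerate() {
                    let x = i % width;
                    let y = first_row + i / width;
                    *pixel = trace_pixel(scene, x, y, width, height);
                }
            });
        }
    });
    image
}

fn main() {
    let scene = Scene { sphere_radius: 0.5 };
    let (width, height) = (16, 8);
    let image = render(&scene, width, height, 4);
    // Print the picture row by row: '#' is a hit, '.' is a miss.
    for row in image.chunks(width) {
        let line: String = row.iter().map(|&p| if p == 1 { '#' } else { '.' }).collect();
        println!("{}", line);
    }
}
```

The result does not depend on the number of workers, which is what makes the problem so easy to split up; the scene itself is the part that does not split.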

... and the way we manage it

Looking at the top 10 programming languages of the TIOBE Index, we can see that 75% of the most popular languages use some kind of memory management that works independently of what we specify in the code. The notable exceptions are C and C++, where garbage collection is available but an optional part of life: the C++11 specification allows for an optional GC mechanism.
This means that in 75% of the cases we let somebody else's code take care of the memory, while we are becoming increasingly particular about what the CPU is supposed to be doing.
Considering how easily a memory problem turns into a CPU problem due to increased GC activity, it looks like we tend to play favorites with the CPU and try to ignore memory as much as we can.

Any better way to do this?

I've been playing with Rust for quite some time now - nothing valuable, just some hello world and dice rolling. Rust is not an easy language to begin with; lots of things are done in a completely different way than in, e.g., Java. Yes, memory management is one of them.
At its heart, every memory manager (or garbage collector) does the job of a reference counter: it works out whether anything still refers to a piece of data, even if most modern collectors trace reachable objects instead of literally counting references. We say: "I don't care about who owns this stuff, figure out when it is not being used and get rid of it" and the GC tries to do it as unobtrusively as possible.
In Rust, you do have to figure out who the owner of your data is (variables, references). When you refer to the same data through a new variable, you have to decide whether to transfer ownership to that variable or only lend it a reference (read-only or read/write). If you transfer ownership, the old variable is not going to be available!
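A small sketch of what that looks like in practice, in the spirit of my dice rolling experiments. The function names (`consume`, `total`, `add_roll`) are made up for this example; the rules they demonstrate are the standard Rust ownership and borrowing rules.

```rust
/// Takes ownership of the vector; the caller cannot use it afterwards.
fn consume(rolls: Vec<u32>) -> usize {
    rolls.len()
}

/// Borrows the rolls read-only; the caller keeps ownership.
fn total(rolls: &[u32]) -> u32 {
    rolls.iter().sum()
}

/// Borrows the rolls for writing; the caller still keeps ownership.
fn add_roll(rolls: &mut Vec<u32>, value: u32) {
    rolls.push(value);
}

fn main() {
    let mut rolls = vec![3, 5, 1];

    add_roll(&mut rolls, 6); // read/write borrow
    println!("total = {}", total(&rolls)); // read-only borrow

    let moved = rolls; // ownership moves to `moved`
    // println!("{:?}", rolls); // would not compile: `rolls` was moved

    println!("count = {}", consume(moved)); // `moved` is given away here
}
```

Uncommenting the marked line turns the mistake into a compile error instead of a runtime surprise.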
This has significant advantages, and not only for memory management (when the owning variable goes out of scope, the data is deleted). It also helps a lot with sharing data between threads. Because the cleanup is deterministic (there is no garbage collector running in the background), in Rust you know what's going on with memory. It is less comfortable than outsourcing it to some other thread, but definitely more dependable.
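The sketch below illustrates both points under some assumptions: the `Texture` type and the shared log are invented for the example, and `Drop` is implemented only to record when cleanup happens. `Arc` is the standard library's atomic reference counter, used here to share one value between threads; the value is freed when the last owner lets go of it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical resource that records its own cleanup in a shared log.
struct Texture {
    name: &'static str,
    log: Arc<Mutex<Vec<String>>>,
}

impl Drop for Texture {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(format!("dropped {}", self.name));
    }
}

fn run() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));

    {
        let _wood = Texture { name: "wood", log: Arc::clone(&log) };
        log.lock().unwrap().push("inner scope ends".to_string());
    } // `_wood` goes out of scope here and is dropped immediately

    // Share one texture between two threads through a reference counter.
    let shared = Arc::new(Texture { name: "marble", log: Arc::clone(&log) });
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let texture = Arc::clone(&shared);
            thread::spawn(move || texture.name.len())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    log.lock().unwrap().push("threads joined".to_string());

    drop(shared); // the last owner is gone, so the texture is freed right here

    let result = log.lock().unwrap().clone();
    result
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

The order of the log entries is fixed by the structure of the code, not by a collector deciding when to run.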




Sunday, July 9, 2017

How to contribute to GitHub projects?

Just another "note to self" type entry about open source contribution.
  1. Find a good, welcoming project, e.g. here: http://up-for-grabs.net/#/
  2. Fork the GitHub repo (let's call it the upstream repo from now on) to your own GitHub account: on GitHub, press the Fork button. Now you have your own copy of the source code
  3. Get the code down to your developer machine
    1. To obtain clone url, go to your repo's page on GitHub and click "clone or download"
    2. git clone https://github.com/vizmi/sequelize.git
  4. Now you have your nice, isolated copy of the original repo. Time to connect it to the upstream repo
    1. Navigate to the original repo (there's a link to it under the project name)
    2. Get the clone url (click "clone or download")
    3. Set-Location .\sequelize\
    4. git remote -v shows the currently configured remotes. Ideally you have 2 entries (fetch and push), both pointing to your own repo. We are about to add 2 more with a single command
    5. git remote add upstream https://github.com/sequelize/sequelize.git
  5. Time to sync the local repo to the upstream repo (to avoid merge conflicts later)
    1. git fetch upstream
    2. git checkout master
    3. git merge upstream/master