It’s common knowledge that the DBMS is a very important part of a fast, scalable distributed system as far as performance is concerned. But I often tend to forget it and blindly rely on well-known relational DBMSs. Recently I stumbled upon the following alternative DBMSs and discussions, and the last one really opened my eyes:

  • Erlang/OTP comes with Mnesia, which allows transactions on a distributed DB. However, it has been argued (note: this needs a reference…) that distributed Mnesia cannot scale, because a transaction needs some agreement between all nodes. For a huge distributed system (or nodes on a slow network), the application has to maintain data consistency without relying on transactions. Berkeley DB seems to be the preferred way to go for this, but as I understand it, it is not a DBMS that manages distribution itself but a tool that helps you do so.
  • CouchDB is one DBMS that tries to handle the distribution of the DB.
  • Also, I just discovered the term column-oriented DBMS through this (too?) simple Wikipedia article. Being used to the R statistics system helped me to (try to) understand what it really is. At least I now better understand why Google developed BigTable, and I have to read or listen more about it …
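
As a rough illustration of the row versus column idea from the last point, here is a minimal Python sketch. It is not how any real column-oriented DBMS is implemented; the sample records and the `to_columns` helper are made up for the example. It only shows the change of layout, which is close to how an R data frame holds its data as one vector per column.

```python
# Hypothetical sample data: three records with the same three fields.
rows = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 41},
]


def to_columns(records):
    """Turn a list of records (row layout) into one list per field (column layout)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: the average walks over every record, although only "age" is needed.
avg_from_rows = sum(record["age"] for record in rows) / len(rows)

# Column layout: the same average reads a single contiguous list of ages.
avg_from_columns = sum(columns["age"]) / len(columns["age"])

print(avg_from_rows, avg_from_columns)
```

Both computations give the same result. The difference is only in which data has to be read, and that is presumably why a column layout suits aggregate queries over a few fields of a very large table.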