MySQL, NoSQL, and Key/Value datastores

Monolithic RDBMSs are losing ground to key/value datastores, particularly persistent, distributed ones. MySQL's mounting problems were perhaps the key reason (pun intended) people looked elsewhere. Google's brilliant engineers realized that a key/value data model can satisfy the needs of almost every class of application that needs a datastore backend.

Key/value datastores are simple to build, easy to understand, easy to optimize and easy to scale. The now-famous CAP theorem states that a distributed system cannot guarantee consistency, availability and partition tolerance all at the same time; one of those traits has to be sacrificed. Again, most applications do not really require all three to function. The CAP theorem is reminiscent of the Project Triangle model, where you likewise get to pick two out of three.

Most web-based applications are built on simple data models. Most web applications eventually suffer from service capacity and availability issues (i.e. scalability woes). It is trivial to scale out (horizontally) application logic processing (application servers) and HTTP request processing (web servers, load balancers).
It is not easy to scale out an RDBMS. Some expensive systems (Oracle, etc.) provide ways to address those issues (e.g. Oracle RAC), but they are expensive to deploy, and most of them rely on a shared-everything setup which just doesn't work in the long run. (Shared-nothing is really the way to go.)
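
To make the shared-nothing point concrete, here is a minimal sketch of a key/value store split across independent shards. Everything in it (the `ShardedStore` name, the hash choice, the shard count) is an assumption made for illustration; it is not how any particular system mentioned here works.

```javascript
// Hypothetical sketch: a shared-nothing key/value store split across N shards.
// Each shard owns its own Map; no state is shared between shards.
class ShardedStore {
  constructor(shardCount) {
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // FNV-1a-style 32-bit string hash, used only to route a key to a shard.
  static hash(key) {
    let h = 0x811c9dc5;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }

  // Pick the shard that owns this key.
  shardFor(key) {
    return this.shards[ShardedStore.hash(key) % this.shards.length];
  }

  put(key, value) { this.shardFor(key).set(key, value); }
  get(key) { return this.shardFor(key).get(key); }
  delete(key) { return this.shardFor(key).delete(key); }
}

const store = new ShardedStore(4);
store.put('user:42', { name: 'Guybrush' });
console.log(store.get('user:42').name);
```

Plain modulo routing like this reshuffles most keys whenever the shard count changes, which is why real distributed stores tend to use something like consistent hashing, plus replication, instead.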

Google released a bunch of papers (actually, a bazillion papers), many of them defining and shaping the development of future related technologies. Namely, the papers describing GFS, BigTable and MapReduce (and of course, the paper that changed everything, "The Anatomy of a Large-Scale Hypertextual Web Search Engine") steered everyone in the right direction.

In the datastores domain, Hadoop/HBase, Radix, Cassandra and others, based on the BigTable and Amazon's Dynamo papers and all relying on the simple key/value datastore model, are gaining market share, and rightly so. Coupled with Memcached and similar services (in-memory key/value stores), they are solving the problems of service capacity and availability. This is a paradigm shift. It's downhill from here for heavy-footprint, complex and inflexible datastore systems. They won't go away, but they will not be such a valuable (pun intended) component in tomorrow's technology landscape.

We are going to gradually migrate from RDBMSs (though we are not relying that much on them nowadays) to a key/value datastore (we are currently building one, also based on BigTable and Dynamo). If nothing else, those systems are both simple and beautiful (for the most part).

Ideas for iPhone applications

I thought of two ideas for iPhone applications yesterday and I thought I would share them with anyone interested in pursuing them. I may do it myself if time and motivation allow, but do feel free to try your luck with them. So here goes nothing.

Life-tracker

The main idea is that you could use your iPhone to keep track of your life through time: specifically, where you have been, what you have been doing and what your thoughts were on any given date. The application would be really simple to use. You launch it, and two buttons let you record your current location (geolocation coordinates) and/or record your thoughts or, say, what it is that you are doing at the moment. You can do that as many times as you wish, whenever you wish. Sometime later, you can sync all that with a web-based service. You could then access your life activities through that service (what was I thinking a year ago this very day? where was I last week when I was on vacation in London? you want that on a Google map? there you go) and so on and so forth. A more or less trivial application to build.
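
A rough sketch of the data model such an application might keep on the device before syncing. The `LifeLog` class, its field names and the `upload` callback are all hypothetical; nothing here is tied to an actual iPhone or web API.

```javascript
// Hypothetical data model for the Life-tracker idea: entries are recorded
// locally and pushed to a web service later. All names are illustrative.
class LifeLog {
  constructor() {
    this.entries = [];
  }

  // Record a location fix, a note, or both; `when` defaults to now.
  record({ latitude, longitude, note }, when = new Date()) {
    const entry = { when, latitude, longitude, note, synced: false };
    this.entries.push(entry);
    return entry;
  }

  // Entries that still need to be uploaded.
  pending() {
    return this.entries.filter((e) => !e.synced);
  }

  // `upload` is a caller-supplied function (e.g. a wrapper around an HTTP POST).
  sync(upload) {
    for (const entry of this.pending()) {
      upload(entry);
      entry.synced = true;
    }
  }

  // "What was I thinking on this day?" lookup by local calendar date.
  onDate(year, month, day) {
    return this.entries.filter((e) =>
      e.when.getFullYear() === year &&
      e.when.getMonth() + 1 === month &&
      e.when.getDate() === day);
  }
}
```

Keeping a `synced` flag per entry means the sync step can be retried at any time without uploading the same entry twice, as long as the upload itself succeeded.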

Javascript driven native iPhone applications

This is a no-brainer; in fact I wonder why someone (Apple even) hasn't thought of it yet. One can expose the iPhone functionality (framework facilities) through Javascript (Javascript objects) and have a simple runtime application that does nothing but act as the VM/runtime for Javascript code. 'Everyone' knows Javascript, everyone(?) likes Javascript, so why not make it possible to build real, native applications (i.e. not web-apps hosted in Safari) using the language? A developer would still submit an iPhone application (the Javascript VM/runtime, with the Javascript files and resources in the bundle) to Apple; neither Apple nor the users would be able to tell the difference. Hey presto, a gazillion apps flood the App Store; most will be crappy (the nature of things), some will turn out to be gems. If I could make it possible for my brother and my fellow Javascript gurus at work to build any app they want as easily as they build our web-apps, that would be kinda cool. Here is what it could look like:

// UButton, ULocation and thisWindow are imagined runtime objects, not a real API.
var myButton = new UButton();
myButton.text = "Hello World";
myButton.addListener('click', function(event)
{
    alert('Your geolocation is: ' + (new ULocation()).toString());
});
thisWindow.containerView.addView(myButton);
or something.

Random thoughts produced on the balcony

Terry Pratchett on the right to die : Pratchett is one of my favorite authors. His books ooze hilarity. He seems an all-around awesome person, all things considered, too. Recently, he was diagnosed with Alzheimer's and he seems brave(?) enough to wish to die before this treacherous disease disrupts his mental abilities.

Every single time I need to use Windows, it feels wrong in a profound way. Every single time.

Paypal now available for US Xbox Live Accounts : A great idea; I hope it won't be long before this option becomes available for everyone else, too. Dealing with Microsoft Points is an often laborious, not to mention futile, process. You would think MSFT would make it easy for us to give them money.

Check out the water crisis article on Wikipedia and HowStuffWorks's "why can't we manufacture water?" topic.

I got to spend some more time on the SGL revision mentioned in a previous post. The garbage collector (initially a mark-and-sweep facility with a single heap for object allocation, though I will experiment with tri-colour marking and generational GC later on) is almost in place, and the main runtime component seems to be operating as expected (values are stored in 'registers', with no stack manipulation; it will use threaded dispatch via GCC's goto *pointer extension). I went through WebKit's JavaScriptCore source code for ideas and came up with lots of notes and concepts that I want to toy with, if time and motivation permit. I am looking forward to diving into Google's V8 source code next week. I am afraid I will have to freeze the project in 2 weeks, though, and switch gears to work on our new file system (PFS, for lack of a better name), based on design decisions applied in Google's GFS and Amazon's Dynamo.
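
For readers who have not met mark-and-sweep before, here is a toy version of the idea over a single heap. It is only a sketch of the general technique, written in Javascript for readability; the `Heap` class and its fields are made up for illustration and say nothing about how the SGL collector is actually implemented.

```javascript
// Minimal mark-and-sweep sketch over a single heap (illustrative only).
class Heap {
  constructor() {
    this.objects = new Set(); // every allocated object lives here
    this.roots = new Set();   // objects reachable by definition (e.g. VM registers)
  }

  alloc(name) {
    const obj = { name, refs: [], marked: false };
    this.objects.add(obj);
    return obj;
  }

  // Mark phase: flag everything reachable from the roots.
  mark() {
    const worklist = [...this.roots];
    while (worklist.length > 0) {
      const obj = worklist.pop();
      if (obj.marked) continue; // already visited; this also handles cycles
      obj.marked = true;
      worklist.push(...obj.refs);
    }
  }

  // Sweep phase: free unmarked objects and reset the mark on survivors.
  sweep() {
    let freed = 0;
    for (const obj of this.objects) {
      if (obj.marked) {
        obj.marked = false;
      } else {
        this.objects.delete(obj);
        freed++;
      }
    }
    return freed;
  }

  // One full collection cycle; returns the number of objects freed.
  collect() {
    this.mark();
    return this.sweep();
  }
}
```

Because reachability is traced from the roots, two objects that only point at each other are still collected, which is the usual argument for tracing collectors over naive reference counting.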

I am somewhat let down by myself; over 5 projects are left in a semi-complete state. I need to focus on one at a time, wrap it up and move on to the next one. It's just that August is mostly about trying out new stuff and research. It's the least demanding month of the year for us; most of the folks at work are away on their much-deserved vacations. Little to no sound is produced (I can't deal with sound, laughter, yelling, phone calls, you name it) and that does wonders for my (currently degraded, due to summer) productivity.

On Javascript and simplicity

If you still have doubts about Javascript being the most popular programming language, it's probably because you have not been exposed enough to the effects of the paradigm shift toward web-based applications.

Not only does there seem to be more Javascript code (in terms of sheer volume) out there; it's also about the number of users using applications that are driven by it, most of them not really knowing, or wanting to know, what it is. But that's an entirely different story for someone else to tell.

We are relying on 4 primary programming languages. C/C++ for backend 'stuff', PHP, Javascript and SGL for frontend/light-weight 'stuff'. Well, we do use bash scripting for _so_ much systems and operations 'stuff', some python and perl here and there, as well as some java and Flash/AS3 for more frontend 'stuff'.

SGL is our home-grown programming language. It stands for Switch Glue Language, Switch being the main framework/library that everything (all services, tools, other libraries, etc.) is based on. The idea is that we can use this language anywhere we want to script operations and 'glue' things (services, resources, operations, etc.) together. Currently, it's used for two major services.

Our frontend developers eventually have to learn, or at least get familiar with, all three main frontend languages: PHP, Javascript and SGL. Interestingly enough, Javascript code output surpassed PHP output in terms of volume, mostly because our apps got more functional, fancy, whatever cool bang you get from client-side logic in the browser. I wouldn't know really; I don't know much about frontend development, but our main frontend team does, and that's all that matters (partial unordered list: phaistonian, hatdi, sug, stelabouras).

Given that SGL has been long overdue for a rewrite (the current language syntax and semantics), I thought I would put aside some time to rewrite it, this time around using Javascript language syntax and semantics so that, when it's ready, we could replace PHP with SGL, thus effectively switching from 3 frontend languages to just one. Our developers, current and future ones, would only need to learn a single language, which may be the greatest benefit of this shift, but it sure is not the only one.

This will be my third attempt at writing a programming language (SGL being the second, PASTE was the first.. those were the days) and, thanks to Javascript being a standard, it's a 'simple' enough matter of writing an efficient enough VM that will run the emitted bytecode (I am toying with the idea of eventually being able to target PHP and other languages, generating, say, PHP code from SGL code and so on).

So far the lexer, most of the parser and some parts of the VM are in place. Hopefully, there will be enough time and sustained motivation to keep this going (it's a side project, so it can't really preempt current major Phaistos projects) until it's ready, perhaps by the end of the month.

Simplicity is the ultimate sophistication - Leonardo da Vinci

Go for the eyes, Boo!

My (by far) most awaited movie release is Tarantino's new film 'Inglourious Basterds', coming out in August. Check the latest trailers. Pitt's talent in bringing to life the badass character Tarantino penned, coupled with the overall theme, makes for an explosive, action-packed, funny movie, the way I see it. I can't wait to watch it.

Google Says Mobile App Stores Have No Future : Unless web-based applications offer the same kind of functionality native applications can and do offer, I don't see how everything will be moved off to the Web. It may make sense for a class of applications but, realistically speaking, there are a few hoops that are just too difficult to jump through for this to come to pass. Palm is betting big on this concept now, with the Pre and its Mojo SDK, which is really not working out for them. Maybe in a few years, when the underlying hardware and operating system services somehow become available/exposed to remotely executed apps, it will make more sense to expect this to be the case.

Serious Doubts : Marco Arment argues it is hard or next to impossible to run a serious business of writing iPhone applications. I still haven't developed a 'real' iPhone app, though I wouldn't really do it for the money, not that I would mind getting any in the long run. As far as I am concerned, those digital distribution systems solve, well, the physical distribution problems(which are many) and facilitate access to content that has traditionally been hard to access. Everything else is a byproduct of competition and demand.

Now that Monkey Island Special Edition and the first episode of Tales of Monkey Island are out, my new most awaited game is Bioware's Dragon Age : Origins. I have high hopes for the spiritual successor to Baldur's Gate and am not in the least concerned about the somewhat negative previews so far (people seem to dislike the 'adult' content in the game). It will be possible (much like it was with Baldur's Gate and other Infinity Engine games) to issue orders to party members in real time, or to pause and queue up actions that will be carried out upon unpausing the game. As far as I am concerned, this is the best control scheme for CRPGs ever devised. Who knows, maybe Minsc will make a surprise cameo appearance. "Go for the eyes, Boo! Go for the eyes!"

I want to be a pirate!

I downloaded and played The Secret of Monkey Island : Special Edition on Xbox Live Arcade. It took less than a few minutes to download the 500+ MB demo and no more than 10 seconds after I launched it for me to get excited. The classic intro screen was on display with the classic, much beloved, Monkey Island music theme (youtube video) and then it faded out to the new and improved screen, spotlighting the most exciting enhancement in the remake.

The folks at Lucasarts pulled it off; it is the same wonderful game we 'all' know and love, outfitted with gorgeous hand drawn graphics, dialogue voice-overs and a user interface that actually works on the consoles.

I wish I could purchase the full game. Alas, the whole Microsoft Points deal is messy due to the fact that Xbox Live is not 'officially' supported in Greece and the implications related to credit card processing. Links: Review on Gamespy, Guybrush Threepwood

I turned on my desktop PC tonight in order to look for Trine on Steam. I came across a review on Giantbomb and was really impressed by the premise and the visuals of the game. Some friends have played it and had nothing but words of praise for it. Hopefully, I will play for a while this weekend. Speaking of my PC, in retrospect, it was one of the least meaningful purchases I ever made. My intentions were to install Linux so that I could work on the Linux Kernel and prepare for Larrabee's arrival so that I would eventually get to develop for it (I am very excited about Larrabee, if previous posts haven't made it clear by now). Turns out, I only use my beloved MBP anyway, for I have so little time to spare lately. Oh, well.

Being Amiga users back in the day was a lot like choosing a platform with a soul over PCs. I feel the same way about using Apple products (especially OS X). It feels great, it feels right, it feels much like it felt when I was using our Amiga. Whenever I have to use Windows (thankfully, not often nowadays) it feels wrong in so many ways. Come to think of it, the only thing I like about Windows is the Win32 API.

Just like every summer, I am going through a huge productivity slump. I can't wait for autumn. This time around I intend to try to deal with it, though.

On important technologies: LLVM, CocoaTouch, Caching, Multi-core designs

As far as I am concerned, LLVM, CocoaTouch and memory-based cache servers make up the set of software technologies that will affect all things computing from next year onwards.

LLVM is going to push code compilation and optimization to the next level. Building a new language is borderline trivial using LLVM technology. You produce the IR and the LLVM backend takes care of everything for you. I will be surprised if the 'fastest' Javascript implementation of 2009 isn't based on LLVM.

Apple's CocoaTouch is so well done, so well thought out (we are still on iPhone SDK 1.0 and that speaks volumes) that it is hard not to imagine Apple advancing and reusing the technology on, say, tablets, and even making the CocoaTouch extensions available to the feature set of the existing OS X frameworks.

The ever-increasing complexity of web-based services, along with the rising number of users of those services and the need to sustain a user experience that depends on responding to users' actions as fast as possible, calls for tools and services that utilize intelligent RAM-based caching. By caching just about everything, and thanks to the high ratio of reads to writes (gets to puts), resource use (CPU, disks, etc.) drops by orders of magnitude while still delivering responses that are perceived as fast. Developers and researchers will most likely come up with even better systems, ones that deal transparently with cache coherency, mirroring, synchronization, etc. This new realization may even render expensive, large and over-complicated systems (e.g. Oracle RDBMS) irrelevant. The only potential problem is the saturation of the network links, which is another class of problems researchers should look into in the near future.
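
The read-heavy argument can be illustrated with a tiny read-through cache. This is a sketch under stated assumptions: `ReadThroughCache` and `loadFromBackend` are hypothetical names, the cache lives in a single process, and nothing here models eviction or the coherency problems mentioned above.

```javascript
// Read-through cache sketch: `loadFromBackend` stands in for a slow datastore
// query and is a hypothetical, caller-supplied function.
class ReadThroughCache {
  constructor(loadFromBackend) {
    this.load = loadFromBackend;
    this.entries = new Map();
    this.hits = 0;
    this.misses = 0;
  }

  // Serve from RAM when possible; fall back to the backend only on a miss.
  get(key) {
    if (this.entries.has(key)) {
      this.hits++;
      return this.entries.get(key);
    }
    this.misses++;
    const value = this.load(key);
    this.entries.set(key, value);
    return value;
  }

  // Writes must invalidate, or readers keep seeing the stale value.
  invalidate(key) {
    this.entries.delete(key);
  }
}
```

With a workload of, say, a hundred reads of the same key per write, the backend sees one query per hundred requests, which is where the resource savings come from; the hard part in practice is deciding when to invalidate.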

On the CPU side, everyone seems to have finally agreed that we can no longer scale vertically. We have to scale horizontally by exploiting parallelization and multi-core designs, potentially coupled with technologies such as NUMA. The Cell processor, Sun's UltraSPARC T, and Intel's and AMD's multi-core designs are built on those principles. Using all those cores ('threads') efficiently, in terms of throughput, scheduling and access to system resources, is not (going to be) an easy task, but the benefits and the need to go forward justify this new approach, if not make it necessary. I personally couldn't be more excited about the possibilities opened up by those architectures.

Software Rendering, Filesystems

Core i7 beats Intel IGP in DirectX 10 software rasterizer : I am very excited about the upcoming Larrabee GPU (or rather, hybrid GPU) Intel is working on. I never found it particularly interesting to be restricted to a set of APIs for defining and drawing scenes, as opposed to the good old days when it was all about relying on optimization techniques and clever programming to get the most out of a pure software-based rasterizer. Nowadays, at least on PCs and most game consoles, you must use either DirectX or OpenGL, which provide a set of benefits (everything is taken care of for you by the GPU, the driver and the API implementation layers, etc.) but also take away the fun. There is a multitude of reasons why the existing model works, but one could argue that innovation and advancement of the technology are hindered by being bound to a constrained environment and set of interfaces. I can't wait to see what near-future Carmacks, Sweeneys and Abrashes will do with the return to software rendering made possible by Larrabee and new, similar products by Nvidia and AMD.
Related references: Twilight of the GPU: an epic interview with Tim Sweeney, RAD Game Tools's advanced software rasterizer for x86, Michael Abrash, legendary x86 assembly and code optimization programmer, Software Rendering on Wikipedia, Nvidia's David Kirk on CUDA, CPUs and GPUs

Migrating to ext4 : We are looking into switching to the ext4 filesystem for a few nodes on our 'testbed' environment now that ext4dev is considered stable enough to be renamed to ext4. ext3 has been sufficiently stable and performs well for our data set. Hopefully ext4 will be better in both respects. We put XFS and ReiserFS to the test a few years ago and that didn't work out very well, though XFS, at least on paper, is impressive. Sooner or later we will need to work on our own file system, which would bring a great number of benefits to our environment and would be fun to build.
Related references: Google File System, Lustre Filesystem

QuakeCon 07

My 'hero', John Carmack, and co. came up with the kind of announcements that far outweighed the ones by Sony, Nintendo and Microsoft, combined, at the recent E3 expo, as gamesindustry.biz rightfully points out. A new game technology, a new game IP, Valve distribution deals, a free, web-based Quake III Arena, Xbox 360 Quake, this, that and the kitchen sink. Epic Games' Unreal Engine 3 may be amazing, Crysis may look breathtaking and the recent demos of Metal Gear Solid and Killzone for the PS3 may look out of this world, but the pizza-and-coke-powered wizard (that's JC, and, yes, it's too dramatic a metaphor) is not to be taken lightly.

I have been scouring the Net today for videos of JC's keynote. Alas, they are nowhere to be found yet. I must have watched last year's QuakeCon keynote over 64 times. It's fundamentally cool listening to JC discussing technology and gaming, and seeing the genius radiating out of him as he sometimes branches out to 'hard core' development issues (memory access time? procedurally generated data? ..).

Algorithms for Programmers ( PDF reference )

Algorithms for Programmers : Fetch. Read. Practice.

Fast InvSqrt() - part II

Fast InvSqrt() part II : thanks to the slashdot coverage, the hacker who came up with this wonderful piece of code ( see previous post ) came to the surface, so to speak.

Donald Knuth's answers to various questions

Must watch (if you are a geek and all ).

Hacks

Sometimes you come across some fundamentally cool hacks (superb ideas, if you wish) that really paint a smile on your face, if you are the kind of person who thinks programming and computers are the essence of life, so to speak :-) Slashdot is discussing the Origin of Quake3's Fast InvSqrt(), which is a brilliant and somewhat magical way of computing the inverse of a square root (i.e. 1/sqrt(x)). This post provides a step-by-step explanation of what's going on in the code, yet the mystery remains. While I am at it, I find it somewhat revitalizing and refreshing to go back and watch John Carmack's QuakeCon 2006 keynote once in a while. His thoughts apply to a wide spectrum of computer science topics, from gaming down to web development. It's nice to be able to stream it to the TV via Connect 360 and the Xbox 360 and watch it while you are relaxed in bed.
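
For the curious, the trick can be transliterated into Javascript. The original is C; this port, which uses typed arrays to reinterpret the float's bits, is my own sketch and not the Quake III source. The magic constant 0x5f3759df and the single Newton-Raphson step follow the widely circulated version of the routine.

```javascript
// Javascript port of the well-known fast inverse square root trick.
// Two typed-array views over one 4-byte buffer let us read a float's raw bits.
const buf = new ArrayBuffer(4);
const f32 = new Float32Array(buf);
const u32 = new Uint32Array(buf);

function fastInvSqrt(x) {
  const half = 0.5 * x;
  f32[0] = x;                              // store x as a 32-bit float
  u32[0] = 0x5f3759df - (u32[0] >>> 1);    // the "magic" initial guess, done on the bits
  let y = f32[0];                          // reinterpret the bits as a float again
  y = y * (1.5 - half * y * y);            // one Newton-Raphson refinement step
  return y;
}

console.log(fastInvSqrt(4), 1 / Math.sqrt(4));
```

The result is an approximation for positive inputs, typically within a fraction of a percent of 1/Math.sqrt(x). On modern hardware this is a curiosity rather than an optimization.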

Gaming ( again, and again )

Games for Windows Magazine hits newsstands : The only gaming magazine (offline) I read ( and love) is 'The Edge', but this seems interesting. Hopefully, the news agency I get my goodies from will begin importing it.

The Nintendo Wii launch event came and went smoothly in the US. I can't wait to get my hands on it, and the beloved Zelda, when it becomes available in the US, on Dec 8th.

Microsoft does one thing right, really right

Clips: XNA Racer Preview : This was built by a German fellow in no more than 4 weeks, using an alpha version of the XNA toolset, which allows developers to build games for Windows + Xbox 360 using the all too familiar (and wonderful) Visual Studio and .NET components that expose a wide range of facilities, making it possible to build any kind of game with C# and some free time. I love the whole XNA initiative; I even downloaded the whole thing and played with it for a while. Great stuff. Microsoft is unbeatable on the game front, with the 360 being just too 'perfect', its Live components being dozens of years ahead of the competition and all that. If only they were half as good at the rest of their projects (Windows, online services..) the world would definitely be a better place.

On top of the desk

I am contemplating buying a desktop PC to satisfy two luxuries. The last time I used a desktop computer (one co-owned with my brother) was in the early 90s: our beloved Amiga. Once the Amiga was rendered obsolete I was workstation-less at home, until I got my first laptop (HP). Ever since then, I've been using laptops both at home and at work.

However, my i386-based laptop (which I use for gaming and Visual Studio) can no longer handle some new games I am really looking forward to playing. In addition to that, I need a system which will allow me to attach storage devices to it, stuff that storage with files and share them with our various computers at home over our LAN. Since it is a desktop computer, I will be able to upgrade its graphics card when I see fit, and thus catch up with whatever technology is required to play whatever game with relative ease.

I already have an Audigy NX 2 (usb based) sound card by Creative, a 7.1 speakers setup, my trusty 24" Dell screen. I figure all I need is a compact box, 1G+ RAM, a video card, a motherboard and a hard disk to store our media files.

I could use my MacBook Pro for gaming (thanks to Boot Camp) but I just don't have enough free hard disk space to waste on a Windows partition, and even if I did, I would wait for Leopard to arrive with the stable Boot Camp release (as opposed to the current beta, which is nonetheless fully functional).

Here is a 'funny' quotation I spotted on MobyGames.

I have a theory that all the money in the world wouldn't make me happy. I'm trying to get billions of dollars so I can test this theory. --Brian Hirt

Linux Kernel Development : Book : mini review

Thanks to the wonderful benefits of weekends (way more free time than Monday to Friday avails us) available to most of us, I mustered enough time to finish reading the book I mentioned in a previous post. All in all, it's a great book. It provides just about everything one needs to start hacking the Linux kernel, some unrelated but always useful and/or funny remarks by the author and, above all, plenty of ideas you can use in your user-space applications, as well as motivation to refactor your code and algorithms to achieve the kind of simplicity, beauty and elegance the Linux kernel developers have achieved. Highly recommended.

I have three more books related to the subject which seem like perfect candidates for investing the remaining time of the day (until I have to succumb to sleep, that is). As always, do yourself a favor. Turn off the TV, go read a book. No matter what kind of book it is, no matter what kind of show you are watching on TV, it is always worth it.

Standing on the shoulders of giants

List of famous programmers on Wikipedia.

Linux Kernel Development ( book )

I am currently reading Linux Kernel Development, by Robert Love. This book serves as a great introduction (it actually aims to become the canonical reference for intrepid kernel hackers) to some fundamental operating system concepts (or, rather, kernel concepts) and goes to great lengths describing how the kernel (version 2.6) is structured and how the various subsystems work.

For instance, the 2.6 process scheduler is a marvel of engineering. Extremely simple and extremely efficient. It reminds me of a similar solution I had to come up with for one of our services, but mine wasn't nearly as elegant, really.

I have only read a couple of pages thus far, but everything makes sense. The Linux kernel hackers are aiming for simple (=clean), elegant and efficient solutions, the kind of solutions that yield the best possible performance because they are kept simple and are extremely optimized. It's just sad that the majority of the surrounding applications running in user-space (desktop managers, open source applications..) do not follow this route. If there were a desktop manager (say, Gnome) as well thought out and well built as the underlying OS kernel, Linux would be a champion that would strike fear into Microsoft and friends.

The Microsoft engineers should do the 'right' (which in essence is wrong, of course) thing: study the Linux kernel source code and rebuild the Vista kernel by following the algorithms and design decisions found in the Linux kernel. While they are at it, they should make sure to stay away from the concept of microkernels. Even if the microkernel servers exist in kernel space (which invalidates the whole idea behind microkernels anyway), as is the case with Windows, it's a BAD thing to do. Performance matters. Perhaps, then, Windows would be at least an order of magnitude faster and wouldn't require so much iron (pumped-up hardware requirements) to run.

iTerm

iTerm, the most promising terminal application for Mac OS X, reached 0.9 today. A few months ago it was barely usable, crashing constantly, being slow as molasses and kinda ugly, too. Things have changed, big time, though. The latest iTerm is very stable (I have been using the 0.9 pre for a while and it hasn't crashed once), very fast (as fast as Terminal.app or at least very close to that), looks great, has tabs that work perfectly, and offers a wide range of options for configuring it to your liking, plus profiles and other goodies. Grab it while it's hot!

Mark Papadakis

Moires, Heraklio, Crete, Greece
Bytes conjurer. Seeking knowledge 24x7
About MarkP

Favorite Quotations

  • Focused, hard work is the real key to success. Keep your eyes on the goal, and just keep taking the next step towards completing it. If you aren't sure which way to do something, do it both ways and see which works best.
  • Focus is a matter of deciding what things you are not going to do.
  • Simple is Beautiful
  • In the information age, the barriers [to entry into programming] just aren't there. The barriers are self imposed. If you want to set off and go develop some grand new thing, you don't need millions of dollars of capitalization. You need enough pizza and Diet Coke to stick in your refrigerator, a cheap PC to work on, and the dedication to go through with it. We slept on floors. We waded across rivers.
  • Fear is the path to the Dark Side. Fear leads to anger, anger leads to hate, hate leads to suffering.
  • Easy is what I know, difficult is what I don't.




