Part 1: Introducing riak_core

Posted: July 30th, 2010 | Author: kevin | Filed under: Erlang, riak_core | View Comments
  • What is riak_core?

    riak_core is a single OTP application which provides all the services necessary to write a modern, well-behaved distributed application. riak_core began as part of Riak. Since the code was generally useful in building all kinds of distributed applications we decided to refactor and separate the core bits into their own codebase to make it easier to use.

    Distributed systems are complex and some of that complexity shows in the amount of features available in riak_core. Rather than dive deeply into code, I’m going to separate the features into broad categories and give an overview of each.

    Note: If you’re the impatient type and want to skip ahead and start reading code, you can check out the source to riak_core via hg or git.

    Node Liveness & Membership

    riak_core_node_watcher is the process responsible for tracking the status of nodes within a riak_core cluster. It uses net_kernel to efficiently monitor many nodes. riak_core_node_watcher also has the capability to take a node out of the cluster programmatically. This is useful in situations where a brief node outage is necessary but you don’t want to stop the server software completely.

    riak_core_node_watcher also provides an API for advertising and locating services around the cluster. This is useful in clusters where nodes provide a specialized service, like a CUDA compute node, which is used by other nodes in the cluster.

    riak_core_node_watch_events cooperates with riak_core_node_watcher to generate events based on node activity, i.e. joining or leaving the cluster, etc. Interested parties can register callback functions which will be called as events occur.

    Partitioning & Distributing Work

    riak_core uses a master/worker configuration on each node to manage the execution of work units. Consistent hashing is used to determine which target node(s) to send the request and the master process on each node farms out the request to the actual workers. riak_core calls worker processes vnodes. The coordinating process is the vnode_master.

    The partitioning and distribution logic inside riak_core also handles hinted handoff when required. Hinted handoff occurs as a result of a node failure or outage. In order to assure availability, most clustered systems will use operational nodes in place of down nodes. When the down node comes back the cluster needs to migrate the data from its temporary home on the substitute nodes to the data’s permanent home on the restored node. This process is called hinted handoff and is managed by components inside riak_core. riak_core also handles migrating partitions to new nodes when they join the cluster such that all work continues to be evenly partitioned to all cluster members.

    riak_core_vnode_master starts all the worker vnodes on a given node and routes requests to
    the vnodes as the cluster runs.

    riak_core_vnode is an OTP behavior wrapping all the boilerplate logic required to implement a vnode. Application-specific vnodes need to implement a handful of callback functions in order to participate in handoff sessions and receive work units from the master.

    Cluster State

    A riak_core cluster stores global state in a ring structure. The state information is transferred between nodes in the cluster in a controlled manner to keep all cluster members in sync. This process is referred to as “gossiping”.

    riak_core_ring is the module used to create and manipulate the ring state data shared by all nodes in the cluster. Ring state data includes items like partition ownership and cluster-specific ring metadata. Riak KV stores bucket metadata in the ring metadata, for example.

    riak_core_ring_manager manages the cluster ring for a node. It is the main entry point for application code accessing the ring, via riak_core_ring_manager:get_my_ring/1, and also keeps a persistent snapshot of the ring in sync with the current ring state.

    riak_core_gossip manages the ring gossip process and insures the ring is generally consistent across the cluster.

  • What’s the plan?

    Over the next several months I’m going to cover the process of building a real application in a series of posts to this blog where each post covers some aspect of system building with riak_core. All of the source to the application will be published under the Apache2 licensed and shared via a public repo on github.

    And what type of application will we build? Since the goal of this series is to illustrate how to build distributed systems using riak_core and also satisfy my own technical curiosity I’ve decided to build a distributed graph database. A graph database should provide enough use cases to really exercise riak_core while at the same time not obscuring the core learning experience in tons of complexity.

    Thanks to Sean Cribbs and Andy Gross for providing helpful review and feedback.


Restarting This Thing

Posted: July 29th, 2010 | Author: kevin | Filed under: Business, Life | View Comments

So…..been away from this thing for a while. Lots has changed in the interim and I finally feel like blogging again.

I’ll tackle the obvious stuff first. For anyone who doesn’t know I’ve shuttered my consulting company, Hypothetical Labs, in favor of working as a senior developer at Basho Technologies. I joined Basho in January and have really enjoyed working on several releases of Riak.

I’ve cleared my speaking and travel schedule so I have more time for coding and writing. A happy result of this decision is I’m planning a series of articles describing how to build your own distributed application on top of riak_core. riak_core is the heart of Riak and implements cluster membership, data replication and distribution, and hinted handoff among other niceties.

The current plan is to start small and build incrementally with the goal of ending with a complete distributed non-key/value data store. I’ll be releasing the code under the Apache2 license on github as the series progresses. I’m aiming to have the first article up by the end of this week.


Erlang Pattern: The Router

Posted: October 19th, 2009 | Author: kevin | Filed under: Erlang | View Comments

There are three components to this pattern. The client, the router, and the target. The client begins the process by calling the router with a message destined for a target. The router uses the message, and possibly other metadata, to locate a desirable target. Then the router hands off the message to the selected target. The target replies directly to the client and completes the process.

I’ve written this code a number of times in several different languages. What makes the Erlang implementation so nice is it’s brevity and simplicity. Using the gen_server behavior I can implement the core logic in just a few lines:

  1. Client calls the router with a message destined for a target:
    gen_server:call(?ROUTER, {invoke, Target, Msg})

  2. The router looks up the target and forwards the message on:

    handle_call({invoke, Target, Msg}, From, State) ->
    %% Target selection code goes here
    gen_server:cast(TargetPidOrName, {From, Msg}),
    {noreply, State};

  3. The target replies directly to the waiting client:

    handle_cast({Originator, Msg}, State) ->
    %% Server logic goes here
    gen_server:reply(Originator, Reply),
    {noreply, State};

The pretty bit is that all this plumbing can be hidden away inside a function. So the original call to the router in the client winds up looking like this: router:invoke_target(Target, Msg). The rest happens behind the scenes and appears as just another gen_server call.

I’m pretty sure there are no bugs in using this approach. I’ve benchmarked a recent implementation of this pattern at over 6000 requests/sec without a hiccup.


CodeWriMo

Posted: October 19th, 2009 | Author: kevin | Filed under: Programming | View Comments

NaNoWriMo is an event run over the Internet to help aspiring authors write a 50,000 word novel in a month. I was *this* (imagine two fingers held closely together) close to being an English major in college so I’ve got a built in soft spot for authors. I’ve wanted to participate in NaNoWriMo for the past few years but the timing has never quite worked out.

It’s almost NaNoWriMo time this year and it got me thinking about having a similar event for developers. For as long as I can remember code has been the way I’ve expressed my creativity and I know that’s true for lots of others. But it’s also the harsh reality for most that the “day job” is rarely your creative outlet. So you’re stuck trying to cram in a few hours a week of creative hacking in amongst life’s other demands.

What if you took a month and added a sense of purpose to your after-hours hacking? What if you set a few goals and had a group of like-minded people to lend moral support when you needed it?

My idea is this: Start a new open source project from scratch and go to initial release in a month. Working code isn’t enough, though. The release would also include docs and examples. The key is to pick an idea that’s reasonably do-able in 3 weeks’ of hacking leaving an additional week for polish and docs.

Is this something worth doing? Ping me on twitter (@kevsmith) or leave a comment and let me know what you think.


Tomorrow Night

Posted: September 9th, 2009 | Author: kevin | Filed under: Erlang, Talks | View Comments

I’ll be speaking tomorrow night at a joint meeting of DC ALT.NET, Novalang, and Arlington Erlang Users’ Group. I’ve got two full hours to fill so I’m planning on covering a lot of ground. I’ll be leading off with a gentle but speedy introduction to Erlang followed by blatant evangelism on why webmachine is the finest framework for building REST resources EVAH.

If you’re in the area, please consider coming out. You can reserve a spot here.

Thanks to Matt, Luc, and Chris, for organizing the meeting and having me up to speak. I’m really looking forward to meeting everyone and spending an evening talking about one of my favorite subjects.