Why Do People "Watch" Code? Survey Results Edition

A few weeks ago I dashed off a post called "Why Do People "Watch" Code? A Short Survey For Open Source Developers." I had been spending some time attempting to pin down the significance of GitHub Watchers. Specifically, I was struggling with how much emphasis should be put on the increase (or decrease) in project watchers when attempting to calculate the growth of Riak's open source community.

Instead of guessing, I resolved to ask developers two simple questions:

  1. When do you typically start Watching/Following a repo?
  2. Why do you choose to start Watching/Following a given repo?

At the time of publishing this post, I have received exactly 400 responses[1]. This is not an insignificant number and I take this to be an adequate sample size to draw a few conclusions. So, what did I find?

1. When do you typically start Watching/Following a repo?

Available Responses | Total Responses | % of Total
When a piece of code seems interesting even though I may not have immediate use for it | 311 | 78%
When I clone it and start experimenting with it | 39 | 10%
Only when I start using it in production | 24 | 6%
Never voluntarily. I only start Watching something when forking a repo makes me a Watcher by default | 11 | 3%
Other | 15 | 4%

2. Why do you choose to start Watching/Following a given repo?

(Respondents were asked to choose all that applied.)

Available Responses | Total Responses | % of Total
I'm interested in its progress and development | 353 | 91%
I'm using it in production | 212 | 55%
I like the author's code | 183 | 47%
Not sure. I just watch because I can | 28 | 7%
I saw it on Hacker News | 47 | 12%
I'm being paid to work on the code so I should probably keep an eye on it | 44 | 11%
Other | 56 | 15%

The responses to "When do you start watching?" are more or less in line with what I suspected. The overwhelming majority of people (78%) watch a repo when they are "interested." A good portion of this 78% will probably come back to it and test drive the code when they have free time or actual use for it. But maybe they won't. In other words, Have Interest, Will Watch.

6% of people surveyed said they only watch when they are using something in production. That's a small number, but it's still useful. For example, in the case of node.js, we can say that, based on people using GitHub alone, at least 277 people/organizations are running node.js in production somewhere. That number undoubtedly gets significantly larger when you take into account the approximately 463 people (or 10% based on the survey results) who started watching when they cloned the repo to start experimenting. (There will also be more production users who come from the "Interest" and "Fork" constituencies.)
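For anyone who wants to run the same back-of-the-envelope math on another repo, here is a minimal sketch in Python. The watcher total of 4,630 is an assumption, inferred from the 463 and 10% figures above rather than taken from GitHub, and the function name `estimate_watchers` is made up for illustration.

```python
# Survey shares from the "When" question, as whole percentages.
SURVEY_SHARE = {"production": 6, "clone": 10}

def estimate_watchers(total_watchers: int, share_percent: int) -> int:
    """Scale a survey share to a repo's watcher count, rounding down."""
    return total_watchers * share_percent // 100

# Assumed watcher total for node.js (inferred, not an official number).
NODE_WATCHERS = 4630

production_floor = estimate_watchers(NODE_WATCHERS, SURVEY_SHARE["production"])
experimenters = estimate_watchers(NODE_WATCHERS, SURVEY_SHARE["clone"])
print(production_floor, experimenters)  # prints: 277 463
```

Treat the result as a floor, not a count: it only covers people who said they wait for production use before watching.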

The "Why" responses are a bit more interesting, if only because participants had the ability to select multiple reasons. Again, "interest" takes the top spot, but we also see that people often watch because they like the author's code (47% of total respondents). 56 people (or 15%) checked off "Other" as a reason they watch a repo. Nearly 20 of these involved using the watcher feature as a "bookmark" (with at least one person even equating it to Delicious).

What do these numbers mean for community growth?

It would appear that for community managers the "watchers" metric should be treated primarily as a sign of increased interest and popularity. Developers largely use it to bookmark projects in which they are interested. And perhaps that's what our social coding overlords at GitHub wanted it to be. (If that's the case, then it's a wonderful feature.) I have no doubt that an increase in watchers means an increase in users, but only 6% of people surveyed hold out until production usage to start watching. The path from interest to cloning to production usage is a complex one, and knowing how many watchers convert to users (and potentially contribute to a project's community) is something that's extremely hard to pin down. I'll continue to keep an eye on Watchers, but I won't treat it as a primary indicator of community growth.

Here's my advice to you, GitHub: make it easier for developers to declare that they are using code in production. Sure, this is for my own selfish community manager-centric needs, but other people will find it valuable, too. This might be as simple as just adding a "Production User" option to the right of the "Watcher" and "Fork" buttons. Or maybe it's more complex than that. However you implement it, I think it would be a fantastic addition...

Thanks to everyone who participated. I found this data immensely valuable. I hope you did, too.

Mark 

[1] Raw Survey Data

Why Do People "Watch" Code? A Short Survey For Open Source Developers

As community manager for various open source projects at Basho, I spend a lot of time thinking about community growth. Questions like "How fast is Riak growing compared to similar projects in the space?" and "What metrics should be weighted more when calculating growth rates?" keep me up at night (no joke).

Lately I've been thinking about how to best use the various stats that are easily gleaned from repositories on GitHub (which is the canonical home for all of Basho's code). Among the available repo-specific metrics are the number of "Watchers" and the number of "Forks." I keep an eye on these two stats quite compulsively and record their ups and downs on a weekly basis. (You can see a list of the most watched GitHub repositories here and a list of the most forked repositories here.)

Forks are much easier to use than Watchers when talking about growth; though what it means to "fork" a project has changed drastically over the past several years, when someone forks your code it implies, among other things, that they intend to spend time working with and on it (and perhaps even contribute!). But what about Watchers? What does it mean when one starts watching a repo? What level of commitment does it take to start watching a repo and what does that mean for the growth of said project? Without a doubt Watcher growth is analogous to popularity growth, but what does that mean for community growth (read: actual use and participation in development)?

I poked around a bit (and even talked to a friend at GitHub briefly) but couldn't seem to find any data or explanation around why or what causes people to start watching repos on GH. So, I threw together this simple survey (scroll down) for anyone who is using GitHub and/or BitBucket. (BitBucket also displays Forks and has the concept of "Followers" which I take to be analogous to Watchers.)

I understand there is some overlap between "When" and "Why", but I thought it worthwhile to ask them both anyway.

Thanks for your participation. It should take you all of 30 seconds...

Mark

(Update: Here's the Hacker News comment thread based on this post.)

What's Missing in the NoSQL Space?

The NoSQL space is awash in commentary and publications. Seriously, it's getting crowded. This is a very good thing. It means that there is enough interest and usage of these technologies to necessitate regular reporting. However, it's time to take this coverage a bit deeper.

We need to close the gap between very high-level (and sometimes "sponsored") blog posts, NoSQL Wikis, and immensely valuable but dense academic papers. Without a doubt they all have their place, but the gap between them leaves a lot of valuable knowledge unaddressed. Most developers can either rattle off the buzzwords and newest features or can recite the code and design principles verbatim (and those that fall into the latter category are few and far between). What's worse is that a lot of times how a database works is misunderstood until it's too late and something breaks in production, something that could have been prevented with more knowledge of how a given DB is supposed to function.

So What's Missing?

Highly-technical, side-by-side behavior comparisons. Start writing about what is happening under the hood of each of these DBs and how they stack up in relation to their peers. For example, it's easy to say, "CouchDB uses MVCC for versioning whereas Riak uses vector clocks" but what does that mean in practice and how could that affect your query pattern under load?

Here are some more examples:

1. Data Distribution and Replication

Riak, Cassandra and Voldemort are often categorized as "Dynamo Clones." Many people cite simple, built-in replication and masterless data distribution as the distinguishing features for this class of NoSQL DBs. But they each do it differently and build in various control mechanisms. Put the different approaches and design principles side-by-side for all these databases and walk your readers through what happens when data is written, read, or deleted in an X-node cluster. What happens when you bring up a new server or take one down? Redis, MongoDB and others all have their own approach to data distribution. What does their code say will happen? How should you plan for this?

2. MapReduce and Other Querying

Riak, MongoDB, CouchDB and others push MapReduce as a query method. And, not surprisingly, they all do it differently. Put their MapReduce implementations side-by-side and walk readers through what it is and how it can be used. And be sure to compare it to Hadoop's MapReduce. This differentiation is a biggie.

3. Data Durability

Databases should offer you some guarantee that, once written, your data will persist to something (preferably a disk) and be safe and free of corruption in the event of failures. Write an overview of how each database handles durability and how this might impact your choice. Is each database even built for durable writes out of the box? What are the backend storage options and what are their pros and cons?

4. Data Versioning and Consistency

What mechanisms are there in each database for version control? Riak uses vector clocks. CouchDB uses MVCC. MongoDB "is more of a traditional update-in-place store." Examine the code from each database that handles versioning and give some details on how this should be accounted for in applications.
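To make the vector clock half of that comparison a little more concrete, here is a minimal sketch in Python. It is not Riak's implementation; the dict-of-counters representation and the function names `increment`, `descends`, and `concurrent` are assumptions chosen purely for illustration.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> number of updates seen from that actor

def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of the clock with the actor's counter bumped by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated

def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every update that b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def concurrent(a: VClock, b: VClock) -> bool:
    """True if neither clock descends from the other (conflicting writes)."""
    return not descends(a, b) and not descends(b, a)

base = increment({}, "client_x")     # first write by client_x
left = increment(base, "client_x")   # client_x writes again
right = increment(base, "client_y")  # client_y writes from the same base

print(descends(left, base))     # True: left is a later version of base
print(concurrent(left, right))  # True: two versions with no clear winner
```

When two clocks are concurrent, a store built on this idea cannot pick a winner from the clocks alone, which is exactly the kind of behavior an application has to plan for.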

5. Interfaces, APIs and Data Formats

Cassandra employs Thrift. Riak uses both HTTP and Protocol Buffers for APIs. MongoDB uses a Wire Protocol and BSON. The list goes on and on. Write a post that compares API design. Talk about what different data formats each DB will store and how that will impact your application design. How mature is the client code for a given language? What is the state of the documentation for their respective clients?

6. Operations

How easy is it to back up and restore a node? Does each DB even offer this functionality? Is there support for backing up an entire cluster? Is there a full-fledged and robust suite of command line tools? How easy is it to add storage capacity? Or reduce cluster size to save money when running on Amazon or Rackspace? What are the "gotchas"? And, once again, how should you plan for the state of this functionality in your application?

And this list is just a start. (This took me about 10 minutes of thinking to compile.) Also, huge bonus points will be awarded and traffic spikes will occur if you take the time to include MySQL and/or PostgreSQL.

The bottom line is that these databases are maturing quickly and any developer worth their salt should now be in favor of the right tool for the job (and if you're still on the fence about putting something called "Redis" or "CouchDB" into production, you may want to reconsider your approach to selecting tools as you're missing out on some awesome technology that's solving real problems). These tool-hungry developers need a person or group of persons (who are not vendors or paid by vendors) to deliver deep technical commentary on a regular basis to help them make informed decisions about what DBs to test and deploy. You could be this person.

Don't get me wrong, there is absolutely no substitute for real testing. Test, retest, break something, and then test again. But technical writing that delves into behavioral comparisons for DB X, Y and Z according to the code and sanctioned documentation would save developers scores of hours when researching and serve to educate the masses.

If you take on this task, your blog/newsletter/website will be inundated with visitors, conference organizers everywhere will ask you to moderate panels, and developers, CTOs, VPs of Engineering (and maybe even analysts) everywhere will bookmark your posts. Most importantly, I will hug you. Actually, the only thing I can guarantee is that I will hug you, but this is undoubtedly a great opportunity to establish yourself as a technical authority on NoSQL DBs. 

Any takers? 

Using Open Source to Promote Sustainable Farming - An Interview with Chris Villalobos of Open AgroClimate

One of the inevitable and immensely positive side effects of being the Community Manager at Basho Technologies has been taking a keen interest in other open source projects championed by members of our community. One such member is Chris Villalobos.

I first had the pleasure of speaking with Chris some months back after he let it leak that he used Riak to build a distributed event registration system for his church (about which I quickly coerced him into writing a blog post). Chris has since changed jobs and is now an open source developer working at the University of Florida.

The University of Florida is one of eight universities that make up the Southeast Climate Consortium (SECC), whose mission "is to use advances in climate sciences...to provide scientifically sound information and decision support tools for agricultural ecosystems, forests and other terrestrial ecosystems, and coastal ecosystems of the Southeastern USA." Chris is now working on the Open AgroClimate Project which is an extension of the SECC. Open AgroClimate is helping farmers and other providers in the Southeast USA, South America, and soon the world, manage their farming resources more effectively given differing climate conditions using very specialized and soon-to-be open source software.

I interviewed Chris about Open AgroClimate and, more specifically, his role and how he is working to open source these valuable climate risk tools that have the potential to help farmers the world over.

My questions are in bold. Chris' responses are in standard text.

What is the Southeast Climate Consortium and how does it relate to Open AgroClimate?

The Southeast Climate Consortium has been around for about four years with the purpose of creating climate risk tools to help farmers local to the Southeast US. They started a website called AgroClimate which is a collection of software tools built to help local producers manage their resources, such as crops, and assess their risks. It's a project that started as an outreach from different agricultural providers in the Southeast wanting to know things like rainfall patterns and how crops were affected by the area's climate.

To go back a bit further, what they did was to take the work of professors and PhD students in the area of agriculture and wrap code around it to make it actually work. There had been much work in the field which aimed to promote better crop growth through the study of historical weather patterns. They are now building out more crop-specific tools that take into account, for example, how weather patterns will affect their upcoming crop. The result of this work is primarily the interactive tools accessible through the AgroClimate site.

What I'm working on is known as Open AgroClimate, which is an extension of the AgroClimate project with the emphasis on open sourcing these tools to expand their usage and development. The SECC made the decision to open source these tools last year. They want to get bigger than their current scope in the Southeast and they need more contributors to make it grow. They use R scripts extensively in the tools, had seen R succeed as an open source project, and reasoned, "Why can't we do it?" Not much has been done outside of just saying, "Let's open source it." And, so, I was hired because I am passionate about open source. I've been focusing on that aspect for a few months now.

What exactly are these tools and how will they benefit from being open source?

The tools are different software programs - currently various, continuously-run algorithms in the form of PHP scripts - that use historical data from different weather stations to calculate weather patterns and then determine how a given weather pattern will affect a particular crop in a given area. For example, the Drought Tools show you the risk of drought in your area based on various factors; the Climate Risk Tool shows you what your current climate situation is and how you should account for it moving forward; another example is the Strawberry Disease Tool. This will show farmers how likely their strawberries are to get a disease based on the area they are in and what pesticides they've already sprayed. It will also make recommendations on what to do next to hopefully ensure high strawberry yield.

At the moment, these tools (of which there are about ten) are focused strictly on the areas covered by the SECC. We want to expand to other areas. For example, a person in Texas wants to use it to help with cotton growth. By virtue of the tools being open source, he can take these algorithms and the different formulas we are using, plug in his weather data, and modify it for cotton.

We want to take it to other countries, too. Individuals in Brazil and Paraguay are currently interested in using the software. In order to do that, however, we needed a better development platform. So I put in a lot of time to switch us to the WordPress development platform, which makes it relatively easy for an entry level user to manage. And, by making the source open and accessible, the tools will be easier to adapt to their needs. This is what I'm spending the majority of my time on.

My long-term plan is to expand this to as many countries as possible. I see this having a huge impact in third world countries. Most of these countries have the data lying around but they have no way to apply it in a useful manner to help their farmers grow more efficiently. If, however, a local farmer in Ethiopia were able to access a municipal website and see when it was best to plant a given crop, this would make their practices much more sustainable. And obviously, the more open we make the tools, code, and project itself, the easier it is to spread the usage.

It's a large project and I don't know of any other agricultural climate risk type tools that are open source. This should be able to be implemented in any country, and we feel compelled to make it work as advertised, in an open nature, as there aren't many projects doing anything like this with as much potential and scope.

What language are the tools actually written in?

It's primarily in PHP. We are also using R scripts to do some of the data analysis and graphics on the backend, which, at the moment, are specific to what the SECC is doing. But we are going to use our R scripts to guide others and release this code, alongside the PHP code, entirely open source. I'm also hoping to standardize on jQuery and WordPress for platform development as it's easy for an end user to set up and configure.

What hurdles are you running into?

The bureaucratic nature of academia makes it hard to get decisions made in a timely manner on a project like this. The largest hurdle at the moment, for instance, is simply settling on which license to use. From what I understand, the question of which license to use has been in the doldrums for about a year now. The first thing I did when I arrived was to gather the appropriate approvals to expedite the process such that, at the moment, the license issue is being reviewed by the final committee. What this means is that we should have a decision within the coming weeks.

Intellectual Property is a problem, too. We are coming from the university/academic background, so IP issues are delicate. As I mentioned before, the tools are based on the thesis work of various PhD students. It can be difficult to get people to understand that we are taking what was once their thesis and simply wrapping the appropriate code around it to make it more functional.

Getting people used to open source tools for development has also been a hurdle. Something as simple as Git or Mercurial, which someone like me is quite accustomed to, is unfamiliar to a lot of people. I'm spending my time training a lot of people. Remember, the code is partly being written by scientists and students, not career developers. This also lends itself to code which could be a lot cleaner. As a result, I've been working with our developers to refactor our various code bases before it's released.

So, unfortunately, you're still working on actually open sourcing the code. What will the final license be?

I want to use the BSD License. The MIT License does offer the amount of openness we are looking for, but due to the academic nature of the tools and the code, I would really prefer the language used in the BSD's "Non-endorsement" clause. As for the GPL, for our purposes it's too limiting. I want people to be able to roll this into a commercial product if they desire to do so. Just as long as there is proper attribution, we are fine with it.

Where will the project be hosted?

As of right now, we are planning on using GitHub when the license issue has been ironed out. One of the goals is to raise the number of contributors and awareness and GitHub has proven itself quite capable in that respect. We will also maintain our own repositories on our servers.

So Open AgroClimate will be providing the code to help farmers manage their crop risk. Where are these farmers getting the weather data?

Right now, the SECC is getting the data from organizations like the NOAA/NWS, and the University has connections with local weather extensions such as FAWN and AEMN. Suffice it to say, there are many data sources.

Unfortunately, at the moment the code base is tailored to the needs of the SECC. One of the tools I am currently working on will enable what I am calling "Plug and Play" data usage. When complete, you'll be able to point your data source at it, whatever form it may be in (CSV, Excel, flat file, JSON, etc.). And it's modular, so that if there isn't code already available that will connect your type of data, you (or another contributor) can write a connector, enabling you to parse the data and load it into the primary database where we can process and analyze it with the tools.

Flexibility with data sources is a large component of the project, and it becomes more critical as we move forward because we really will need to be able to parse anything. A while back someone said, "Let's just standardize on XML." You're going to tell me that some weather station in Paraguay is going to be storing their weather data in XML? Doubtful. So the ability to be flexible with data types will be essential to our expansion.
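As an editorial aside, the "Plug and Play" connector idea Chris describes could be sketched roughly as follows. This is an illustration in Python, not Open AgroClimate's code (which, per the interview, is PHP and not yet released); the `CONNECTORS` registry, the `connector` decorator, the `load` function, and the sample field names are all hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, str]
Connector = Callable[[str], List[Record]]

# Hypothetical registry mapping a format name to its parser.
CONNECTORS: Dict[str, Connector] = {}

def connector(fmt: str) -> Callable[[Connector], Connector]:
    """Register a parser for one data format."""
    def register(func: Connector) -> Connector:
        CONNECTORS[fmt] = func
        return func
    return register

@connector("csv")
def parse_csv(text: str) -> List[Record]:
    """Turn CSV text with a header row into a list of records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

@connector("json")
def parse_json(text: str) -> List[Record]:
    """Turn a JSON array of objects into records with string values."""
    return [{k: str(v) for k, v in row.items()} for row in json.loads(text)]

def load(fmt: str, text: str) -> List[Record]:
    """Parse raw text with the connector registered for fmt."""
    if fmt not in CONNECTORS:
        raise ValueError(f"no connector for {fmt!r}; write one and register it")
    return CONNECTORS[fmt](text)

print(load("csv", "station,rain_mm\nA1,12.5\n"))
print(load("json", '[{"station": "B2", "rain_mm": 3}]'))
```

Supporting a new format in this sketch means writing one more decorated function, which is the modularity the answer above is after.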

Are you actively recruiting data providers?

It's not a primary initiative. That will come as we start rolling out the infrastructure to places like Paraguay. They will be our first test. They have data sources and types that we've not yet encountered.

Besides an internet connection, what does a rural farmer in Paraguay need to know and do to use your tools?

Thankfully, the way it's set up right now, all the data that a farmer would need is coming from weather stations. End users need only specify where they are and they're off and running. This is how it works with the risk planning tools. In the case of the strawberry planning tool, you input what you've already done to the crop, and it shows you what your risk of disease is based on various factors. It will also make recommendations on what pesticides to use or not to use. So, in essence, they only need to input information that they are likely to already know about their land and farming practices.

For some of the tools we actually have a mobile application that works on various smart phone platforms. It's less detailed but it's still quite useful. And we already know of farmers who take their cell phones into the fields with them to use while they are farming.

I should also note that one of the primary tenets is "Don't Develop in Isolation." We encourage developers to work with the farmers (if possible) when developing these tools such that if the farmers can't figure out how to use them, it needs to be changed. That is where our heart is. Not all scientists or developers think like end users.

How many people are contributing at the moment?

At the University of Florida there are about six. We also have some in Brazil, and several throughout the universities that compose the SECC.

Once all the tools are officially open sourced and more accessible, what types of contributors are you looking for?

Let's see...basic tool developers would be one type. The front end interfaces need writing and converting for different crops like cotton and corn. Support for using different data calculation methods would be great, too. We also need database and backend developers. We are using MySQL currently to store the data but would love to have support for other DBs if they were needed and more useful. We also need people to write the code that makes it possible for any data source to be plugged into one of the tools for analysis. That's the "Plug and Play" code I mentioned before.

Graphic designers would also be a huge win for us, especially as we move towards other demographics. At the moment we are working with one template which has served us well but having more options would be optimal.

Finally, translators are needed. We would like people to be able to see these tools in their locale, and because of the internationalization support of WordPress, it is very possible. The tools currently under development and going forward will use .po/.mo files for translation purposes.

Needless to say, there is a lot of room for contribution, so we want to talk to anyone and everyone who is interested.

Speaking of contributors, what is the best way for a developer to get involved right now?

Right now, the best way for people to get involved is to stay informed of the progress via the mailing lists and forums on the Open AgroClimate site. They are admittedly sparse at the moment, but as we come closer to starting the engines for a release, they will be the hub of communication for the project.

What I Learned from Organizing the First Riak Meetup; A Primer on Event Planning

A Community Manager for an open source project has a laundry list of responsibilities ("wears a lot of hats," if you will), and this list only gets longer when you work at a startup. Recently, I had the opportunity to try my hand at event planning. As some of you might have known, Basho just organized and held the first Riak San Francisco Meetup which was originally announced on the Basho Blog. It was very well-attended (drawing more than 100 people), the presentations and talks were informative and valuable, and the caliber of Riak (and non-Riak) discussion was exceptional. All told, I would argue it was a huge success.

I had never planned an event of this size before, so I had to figure out how to do it on the fly. What I learned was that, while it's nowhere near as complex and challenging as writing a distributed database in Erlang, it does take some considerable time, effort, and precision.

If you read my last post entitled Why I Write the Riak Recap Every Day, you may recall that I deal in what I consider to be mainly commonsense and simple but useful insights and advice. This post falls into that category. Hopefully those looking to plan an event can learn from this and perhaps offer some tips and tricks that I might have overlooked.

Goals and Vision:

First things first: What's the point of the event and what are you trying to accomplish? For me, it was initially about starting a monthly Riak developer meetup. The significance quickly grew when we decided to make it also about our arrival in San Francisco and the one-year anniversary of Riak being open-sourced. So, our vision and goals became two-fold:

  1. Establish a monthly Riak developer meetup to educate, promote, and exchange ideas
  2. Show developers, technologists, and executives that Riak is a rock-solid project with a strong community growing around it. People should leave with a firm understanding of what Riak is, who is using it, and why they should want to get involved.

You should have a well-articulated vision for the event. It doesn't have to be a novel. A few lines will do. Once you know the point of the meetup or event, you can then set about planning appropriately.

Organization

To actually organize the event and track attendance and RSVPs, I chose Meetup. The price for hosting the page for six months was cheap, and that was before the 50% discount I received. (Try this: go through the motions to set up your group's page and then leave the site before actually buying anything. I did this accidentally and then happened to receive an email a day later saying, "Hey, we noticed you almost bought something. Would 50% off make it worth your while?" Not sure if it's an automated email but it might be worth a try.)

Why did I choose it? No real reason except that it was cheap, and, to be honest, the Redis Meetup looked like it was quite successful, so I figured it wasn't a bad model from which to borrow. I spoke with a few people who weren't too psyched about using the site for organizing the meetup, but I think I'll stick with it. The "Review" and "Ideas" features have already proven useful, too, and I'm looking forward to seeing more out of these as we start to hold more meetings.

Location

In San Francisco, there is no shortage of rentable spaces to accommodate a developer event. After some exploration and deliberation (using Yelp to start things off), I ended up choosing the 111 Minna Gallery. In addition to being a highly-regarded, non-traditional venue, it made sense based on our attendance target and vision/goals.

Was it expensive? Yes and no. If you need a turnkey event space (complete with bar, bartenders, security, sound system, coat check, etc.) then yes. But there are cheaper options in the SOMA area if you're looking to do something on a smaller, less expensive scale. (For future Riak meetups I'll probably look into renting out Citizen Space.) That said, if you can afford it and need the space, definitely consider 111 Minna. It exceeds its reputation.

Food and Drink

Enlisting the services of a caterer was crucial for us. There is no way we were going to furnish food for more than 100 people without the help of some professionals. We also wanted to impress our guests, and my cooking was not going to help Basho's reputation.

I jumped on Yelp, made a list of 10 potential vendors based on reviews and location, and started soliciting estimates based on my requirements. Most vendors were quick to respond with prices and sample menus. As expected, there was a huge variance in prices proposed, so you'll have to look around a bit to find someone that suits your budget. In the end, I chose a company called Fork & Spoon. If you can forgive them for their heavy Flash usage and the auto-play music, I would highly recommend them. The food quality and quantity were amazing, and the level of customization was top notch. I worked with Clementine Berk, Director of Corporate Events, who was more than happy to accommodate multiple requests and was excellent overall.

Pro Tip: Don't forget the vegans. This should go without saying, but have at least one vegan item available. (We had two.)

As for providing drinks, if the venue permits, you should have a cash bar at the very least. Basho decided to go all out and provide an open bar, which (not surprisingly) was a huge hit. If you can afford it, buy the bar.

Small Details

The "Small Details" category will expand as you get further into planning and more needs present themselves. Here are a few that I encountered:

  • Projector: If you need one, make sure it's present. And show up and test it before the meeting happens. Make sure the actual projection fits on the screen and adjust in advance as necessary.
  • Dongles and connectors: Talk to all your presenters and find out what type of laptop they will be using and have an array of dongles available (even if they are planning to bring them). Another way to do this is to have a dedicated, pre-tested laptop and have all presenters send their slides over in advance.
  • Name Tags: I totally overlooked this detail and would have forgotten had I not been texted by a coworker 35 minutes before the event was to start. To avoid an eleventh hour sprint to Office Depot, grab a bunch of name tags and some Sharpies far in advance.
  • Music: The Riak meetup was a two-hour event, with about 60 minutes scheduled for talks and Q&A. For the other 60 minutes, I threw together a playlist that was played at low volume to keep things lively while people were chatting.
  • Wifi: Make sure it's available (if you need it) and do your best to ensure the network name and password are visible.
  • Photos: Definitely worthwhile having a photographer if you can afford it. And it's even more worthwhile if you happen to know a kick-ass photographer who is willing to attend and take photos for free. (Chances are that you know someone who can do it for you. We summoned the powers of the inimitable @kirindave. Make friends with him. He's quite talented!)

General Tips, Tricks and Regrets

  • Budget is key. Establish this first and make all decisions based on what you can spend. You can plan all you want, but if you book the Moscone Center only to realize you can't afford it, you've wasted time and effort.
  • Ask for discounts. We are a startup and are on a tight budget. If you're in the same situation (or are at all frugal), it's more than reasonable to ask for a discount. I'm not advocating asking for a 50% reduction in price. That's just being an ass. But, something in the 10% area in exchange for making them the preferred vendor for future events is fair and mutually advantageous.
  • The only thing I regret is not filming the talks. I have to admit that I almost totally overlooked this, and by the time I realized I should be videotaping the proceedings for posterity, it was too late. There was also considerable interest in having the event streamed live online. I'll be looking into both for the next installment of the meetup.

Conclusion

Based on Basho and Riak’s current trajectory, I don’t suspect it will be too long before we have the opportunity to plan an event of this size and importance again. This is exciting because, though stressful, it was an amazing learning experience and I have no doubt that we can do it better and more efficiently. The next Riak-related meetup I will have a hand in planning will be next month’s Riak Meetup in San Francisco (details to be announced soon), and you should join the group if you haven’t done so already.

Meetups are essential to the success of any project. Attendees (especially those from the VC and C-level ranks, of whom there were numerous at the Riak Meetup) will note not only the quality of your software, but also the way you organize and execute events. It pays to do it well.

Why I Write the Riak Recap Every Day

Firstly, if you're not familiar with Riak, it's an open source, scalable, kick-ass database that will make your day-to-day life as a developer/ops professional easier while making your users/customers happier. If you're at all interested in database technology and distributed systems, go to the Riak Wiki and check out the Fast Track. It's 45 minutes very well spent.

The Riak Recap is, by all accounts, a very small blip on the overall community radar screen. But it's something that has worked quite well for Riak and those who follow it. I'll be the first to admit that its usefulness and success within our community was unexpected. I'm not sure why, but developers seem to like it. That's why I thought it might be worth devoting a blog post to, in case there are other communities and projects that might benefit. (And, I'm finishing off this post from the Community Leadership Summit in Portland, so I'm feeling particularly pumped about sharing community tips, tricks, and ideas.)

What is the Riak Recap?

The Riak Recap is a daily email, sent to the Riak Mailing List, that details briefly all the worthwhile content and information generated about or pertaining to Riak that was not sent across the mailing list. The concept is very simple and there is definitely nothing groundbreaking about it. This is what the typical Recap looks like.

After becoming Community Manager at Basho, I made it a habit to wake up, pour a cup of coffee, and read the IRC logs. (If you'd like to join me, you can read them at irclogger.com/riak.) As with most projects, IRC, along with the mailing list, is where the bulk of the day-to-day technical discussion happens, so I thought it necessary to stay current on everything that happened in the channel.

Riak, comparatively speaking, is a young open source project (though it has been in production for years), and many developers are still learning about what it is and what it can do. This results in a slight imbalance between questions and answers. Sometimes a question would get asked and receive no response, stranded and destined to float forever in the IRC ether. It killed me when this happened. For a newcomer to a project, not getting a response to a question or idea could mean the difference between just downloading the software and actually using it in production. (And there is no shortage of solutions in the database space, so we cherish every user.)

So, the initial purpose of the Recap was to collect these orphaned questions, answer them, and send them off to the mailing list in the hopes that whoever asked would have their questions answered. Aside from questions, there are also incredibly valuable conversations that happen in an IRC room at all hours of the day. The problem is that at any given time only a fraction of your users will be lurking about in IRC. Why should the majority of your community be deprived of the knowledge and content generated when they aren't able to pay attention? This led me to literally start cutting and pasting interesting conversations into Gists and linking them from the Recap with a one- or two-line preamble telling people why they should read them. This is a great example.

In addition to orphaned questions and logged IRC conversation, I gradually added more content to the Recap and it now consists of anything and everything:

  • Links to new Riak-related repos on BitBucket and GitHub
  • Interesting Tweets
  • Pictures of Riak and Basho t-shirts and stickers in the wild
  • Announcements about upcoming talks and presentations
  • Slide decks from presentations
  • Links to blog posts
  • Pointers to new wiki additions and documentation

In other words, any information source is fair game. It has even gotten to the point where people will send me things via email asking to have their blog post or new Riak driver included in the next Recap.

Besides keeping everyone interested in Riak up-to-date, there are some other positive side effects of the Recap:

  • Indexing and web searching purposes: Everything that comes across the Riak mailing list gets indexed, Recaps included. We also link all the Recaps from the Riak Wiki. This increases the number of Riak resources and the amount of searchable content on the web.
  • Shows new (and existing) users that there actually is activity around the project: A lot of times developers will evaluate not just the quality of an open source software project but also the activity level of the community supporting it. The activity is out there. The Recap brings it to them every day.
  • Recognition: It's well acknowledged that peer recognition and community notoriety are among the top reasons people participate in open source projects. A link to a new Riak driver with a thank you and some praise goes a long way to encourage more participation without being disingenuous.
  • It shows us where the holes in our documentation are: If, for instance, users keep asking about a way to "list keys" in Riak, we know we need to address this better in our documentation.
  • Reporting all the questions shows that there are no stupid questions: Want to scare a newcomer away from a project? Easy: make them feel like their questions are stupid or trivial. Just recently I showed a newcomer to the #riak channel that their previous day's questions were to be included in the Recap. Their response: "I thought I was being 'repetitive' with the questions.... it's nice knowing they're relevant."
  • Helps spread the word about Riak to those who aren't on the mailing list: At some point people started tweeting links to the Recap, thus enabling those not on the mailing list to learn about the project and stay current. (Mailing lists aren't for everyone.)

Ultimately (and somewhat sadly), the Riak Recap is probably not...scalable. The amount of activity in and around Riak is growing at a crazy rate and it will soon be hard to condense it all into a daily email. But while it is still feasible, I believe writing it is worthwhile. I would encourage you to try something similar if you think it would help your project.

Big Table in the House of Ballmer; NoSQL Summer Kicks off tonight in Boston

The Boston Chapter of NoSQL Summer kicks off tonight at Microsoft's New England Research & Development Center. (A big thanks goes to Matthew Podwysocki and Leah Brunson for helping to lock down the meeting space for the summer.) Basho, with some help from Stefano at Cloudant, took the reins and helped get this meeting off the ground, and after missing week one, we have about 20 onlookers queued up to attend tonight.

NoSQL Summer, the brainchild of the génial Tim Anglade, is billed as a "seasonal, worldwide reading club for databases, distributed systems & NOSQL-related scientific papers." What started as a 10-city experiment intended to bring a few curious developers and technologists up to speed with NoSQL technologies has, in less than two months' time, blown up into a global weekly meeting of the minds, currently made up of participants in 31 cities on four continents. Awesome, to say the least!

The inaugural Boston meeting will be led by Daniel Einspanjer, a developer for Mozilla who has production experience with a whole host of databases. He is currently involved with production installations of (at the very least) HBase and Riak. Tonight's meeting will cover Google's Big Table paper, the publication after which HBase is modeled. Daniel has some serious NoSQL chops, so we are all looking forward to him dropping some theoretical and production knowledge on us. (If anyone in the Boston area wants to join us for next week, register for the local mailing list to get involved.)

Aside from educating the masses, the goal of the NoSQL Summer meetings is to annotate the papers being discussed. So, with any luck, some usable notes and insights will emerge from tonight's meeting. Stay tuned for those and other anecdotes from what will hopefully be the first of many insightful and worthwhile meetings here in Boston this summer. Who knows, perhaps Ballmer himself will make an appearance; he does have all this free time on his hands now that the Kin is no more...