NoSQL: The Love Child of Google, Amazon and ... Lotus Notes

CouchDB creator Damien Katz wasn't inspired by Google or Amazon or any other web giant. He was inspired by Lotus Notes, an online collaboration platform originally developed in the 1970s and '80s.
CouchDB creator Damien Katz. Image: Damien Katz

Most students of the web trace the NoSQL movement back to Google and Amazon.

As they grew their enormously successful online services, Google and Amazon needed new ways of storing massive amounts of data across an ever-growing number of servers, so each created a new software platform that could do so. Google built BigTable. Amazon built Dynamo. And after these internet giants published research papers describing these sweeping data stores, many other outfits sought to duplicate them.

The result was an army of "NoSQL" databases specifically designed to run across thousands of servers. These new-age software platforms -- including Cassandra, HBase, and Riak -- remade the database landscape, helping to run so many other web giants, including Facebook and Twitter, but also more traditional businesses.

"If you look at every NoSQL solution out there, every one goes back to the Amazon Dynamo paper or the Google BigTable paper," says Jason Hoffman, the chief technology officer at cloud computing outfit Joyent. “What would the world be like if no one at Google or Amazon ever wrote an academic paper?”

Well, the world would still have CouchDB, one of the oldest NoSQL databases. CouchDB creator Damien Katz wasn't inspired by Google or Amazon or any other web giant. He was inspired by Lotus Notes, an online collaboration platform originally developed in the 1970s and '80s.

Although Notes is best known as an email system, it was more than that. It was a foundation for building applications that depended on databases -- i.e., organized collections of information. Using Notes, businesses built everything from expense-reporting applications to IT help desk tools. Katz was among those who built such applications -- he got his start developing Notes apps for Lotus itself, in 1995 -- and he says that even then, the platform demonstrated many of the same characteristics that have made today's NoSQL databases so successful.

Like its NoSQL successors, Notes went outside the scope of relational databases -- traditional databases that store information in neat rows and columns. "It was a sophisticated system that made it easy to do things that are hard to do with relational databases," Katz says.

In many ways, Katz's story can help explain the NoSQL movement -- and why these databases are so different from what came before. Despite the movement's undoubted success, the notion of a NoSQL database is still so hard to pin down -- "NoSQL means so many different things, depending on who you're talking to," Google distinguished engineer Andrew Fikes recently told us -- and many across the tech industry have yet to grasp the importance of these new database creations.

"NoSQL" is a misnomer. NoSQL databases aren't designed to abandon SQL, the structured query language used to pull information from traditional databases such as Oracle and MySQL. A better name would be "non-relational database." NoSQL databases don't use the neat tables of data that underpin relational databases.

These databases have two basic characteristics: They can stretch across many servers -- letting you expand your operation as need be, even across different geographical locations -- and they give you the freedom to structure your data how you like. It's this second characteristic that so strongly echoes Lotus Notes.

Platonic Ideal

The Notes platform was inspired by PLATO Notes, an online community that ran on the PLATO mainframe at the University of Illinois. PLATO Notes creator David R. Woolley wrote in 1994 that the project began in 1973 as a simple bug reporting system. Originally, users reported bugs simply by editing a text document, but this led to a few problems.

"There was no security at all. It was impossible to know for sure who had written a note," Woolley wrote. "Most people signed or at least initialed their comments, but there was nothing to enforce this. And occasionally some joker would think it was fun to delete the entire file."

So Woolley -- then just 17 years old -- was assigned to create a more structured system for reporting bugs. The tool he developed let users type their bug report into an application which would save the report into a file, along with the user's name and the date of the submission. The support staff could then display the notes and add responses, which would be added to the same file. Woolley also added two more sections: "System Announcements" and "Public Notes." General Notes was a message board that enabled users to post and respond to messages on any topic.

Woolley's method of saving messages to a file, instead of a relational database, was a predecessor to the modern "document database."

You can think of a relational database as a big spreadsheet. Data is organized into tables, columns, and rows. If you want to add a field, you add a column, and that column appears in every row for that particular table. This keeps your data structured and uniform, but it's more difficult to manage lots of unstructured data or data that's structured in multiple ways.

A document database is more like a collection of documents. Each entry is a document, and each one can have its own structure. If you want to add a field to an entry, you can do so without affecting any other entry.
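
To make the contrast concrete, here is a minimal Python sketch. It is an illustration only, not how Notes, CouchDB, or any other product mentioned here stores data: the table name, the field names, and the sample bug reports are all invented, SQLite stands in for a relational database, and a plain list of dictionaries stands in for a document store.

```python
import sqlite3

# Relational side (SQLite as a stand-in): adding a field means adding a
# column, and that column then exists in every row of the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bug_reports (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)
conn.execute(
    "INSERT INTO bug_reports (author, body) VALUES ('alice', 'Editor crashes on save')"
)
conn.execute(
    "INSERT INTO bug_reports (author, body) VALUES ('bob', 'Login screen freezes')"
)
conn.execute("ALTER TABLE bug_reports ADD COLUMN severity TEXT")
relational_rows = conn.execute(
    "SELECT id, author, body, severity FROM bug_reports ORDER BY id"
).fetchall()

# Document side (a list of dicts as a stand-in): each entry carries its
# own structure, so a new field on one entry leaves the others untouched.
documents = [
    {"_id": 1, "author": "alice", "body": "Editor crashes on save"},
    {"_id": 2, "author": "bob", "body": "Login screen freezes"},
]
documents[0]["severity"] = "high"

# Every relational row now has a severity slot, even if it is empty.
print(relational_rows)
# Only the first document has a severity field.
print(documents)
```

In the relational half, both rows gain a `severity` column, which stays empty until someone fills it in. In the document half, only the entry that was changed knows about the new field.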

Soon, PLATO developers were adding more applications. By 1974, they had an e-mail application, a chat room, online games, and more.

In 1984, Ray Ozzie -- a Lotus developer who had worked on PLATO while attending the University of Illinois -- left Lotus to start a company called Iris Associates. Lotus then funded Iris with the agreement that it would have exclusive rights to the company's flagship product: a system for corporations that was based on PLATO.

Today, many see Lotus Notes as a legacy system ready to be relegated to the same dustbin as WordPerfect and Novell NetWare. But Notes paved the way for just about every type of corporate communication and collaboration application that came after it, from e-mail clients like Microsoft Outlook to social networking tools like Jive Software to, yes, CouchDB.

Katz and the Couch

Damien Katz joined Lotus as a summer intern in 1995, around the time it was acquired by IBM, and after working with a Lotus Notes consulting outfit for a while, he returned to the company, joining the Iris team, which Lotus had formally acquired.

The Database Timeline

1961 Development begins on Integrated Data Store, or IDS, at General Electric. IDS is generally considered the first "proper" database. It was doing NoSQL and Big Data decades before today's NoSQL databases.

1967 IBM develops Information Control System and Data Language/Interface (ICS/DL/I), a hierarchical database for the Apollo program. ICS later became Information Management System (IMS), which was included with IBM's System/360 mainframes.

1970 IBM researcher Edgar Codd publishes his paper A Relational Model of Data for Large Shared Data Banks, establishing the mathematics used by relational databases.

1973 David R. Woolley develops PLATO Notes, which would later influence the creation of Lotus Notes.

1974 Development begins at IBM on System R, an implementation of Codd's relational model and the first use of the structured query language (SQL). This later evolves into the commercial product IBM DB2. Inspired by Codd's research, University of California, Berkeley, researchers Michael Stonebraker and Eugene Wong begin development on INGRES, which became the basis for PostgreSQL, Sybase, and many other relational databases.

1979 The first publicly available version of Oracle is released.

1984 Ray Ozzie founds Iris Associates to create a PLATO-Notes-inspired groupware system.

1988 Lotus Agenda, powered by a document database, is released.

1989 Lotus Notes is released.

1990 Objectivity, Inc. releases its flagship object database.

1991 The key-value store Berkeley DB is developed.

2003 LiveJournal open sources the original version of Memcached.

2005 Damien Katz open sources CouchDB.

2006 Google publishes BigTable paper.

2007 Amazon publishes Dynamo paper. 10gen starts coding MongoDB. Powerset open sources its BigTable clone, HBase. Neo4j released.

2008 Facebook open sources Cassandra.

2009 ReadWriteWeb asks: "Is the relational database doomed?" Redis released. First NoSQL meetup in San Francisco.

2010 Some of the leaders of the Memcached project, along with Zynga, open source Membase.

At Iris, Katz worked his way into the guts of Lotus Notes. Among other things, he rewrote the engine that powers Formula, the scripting language used for developing Notes applications. Katz says he was massively underqualified for the job, but he also sees himself as someone who was born to code. "Each @function I completed was like a hit of a drug and I was a junkie looking for the next fix," he later wrote on his blog.

He left Lotus in 2005, joining a startup called Koobie, but shortly thereafter, he started an effort to bring the Lotus Notes ethos into the modern age, and this eventually morphed into CouchDB. In an early blog post about the project, he wrote: "Couch is Lotus Notes built from the ground up for the web."

The original version of CouchDB used a Formula-like programming language. But Katz soon moved the project in a new direction, turning the platform into a dedicated database. "MySQL was at the height of its popularity," Katz says. "And telling people you were working on something that was like Lotus Notes made them go 'ugh!'"

There were bumps along the way. In early 2007, with a new baby on the way, Katz went to work for the MySQL team at Sun Microsystems and quit working on CouchDB. But the open source project had attracted other developers, notably Jan Lehnardt and Noah Slater, who kept plugging away.

Slater introduced JSON, then a new format for structuring data in text files, and while on paternity leave from Sun, Katz ended up replacing the entire CouchDB storage engine, substituting JSON for XML. At that point, Katz realized that using JavaScript -- the standard language for web applications -- might be a better idea than using the Formula-style engine. "Once we introduced JavaScript," he says, "the project took off."

Couch Goes Commercial

In 2007, the revitalized CouchDB attracted IBM's attention, and soon, Katz was back on the company's payroll, developing CouchDB full-time. Crucially, IBM agreed to donate the project to the non-profit Apache Software Foundation, which meant that IBM also had to grant the use of the company's relevant patents to developers and users of CouchDB. As a result, IBM wouldn't be able to sue the CouchDB project for infringing on Lotus Notes-related patents.

Meanwhile, the NoSQL movement was in full flight. The Google and Amazon papers helped popularize this model -- already advocated by various open source developers -- and provided some insight into how to make it work in the real world.

In 2007, a company called 10gen started work on a NoSQL document database called MongoDB, using BigTable as a model. "It was completely independent, there are not a lot of parallels between MongoDB and Couch and Lotus Notes," says 10gen founder Dwight Merriman. That same year Neo4j, a graph database, was released. A year later, Facebook open sourced Cassandra, a NoSQL database that incorporated concepts from both Dynamo and BigTable. And by 2009, as CouchDB, Cassandra, MongoDB, and others gathered steam, the tech blog ReadWriteWeb asked whether the relational database was doomed.

Meanwhile, Johan Oskarsson, then a Last.fm employee, hosted the first NoSQL meetup, accidentally giving the loosely defined movement a name.

Amidst all the hype, Katz, Lehnardt and J. Chris Anderson founded Couch.io to commercialize CouchDB. By this time, a team of MIT physicists had already started a CouchDB company called Cloudant, and they were hard at work on their own version of the database, called BigCouch. Couch.io, later renamed CouchOne, struggled to find its place in the world, but it would soon find its footing by merging with another NoSQL outfit called Membase.

Membase needed a new CTO. CouchOne needed a CEO. Couch needed a better way to scale to large numbers of machines, which Membase could provide. Membase needed a better data structure, which CouchDB offered. And perhaps most importantly, Membase had what Katz saw as a sustainable business model. Both the new company and the new database were called Couchbase.

But the merger led to a messy divorce with Apache. "We made a real effort to keep the changes in sync," Katz says. "But eventually we reached a point in which we needed to move quicker than the Apache project could move." Ultimately, Katz decided to move on from the project he founded and focus his efforts on Couchbase. In January 2012, a year after the merger, he posted a strongly worded farewell letter on his blog, writing: "What's the future of CouchDB? It's Couchbase."

Slater, who had become part of the Apache project's management team, responded with a single tweet: "The future of CouchDB is CouchDB."

Katz acknowledges that he could have been more diplomatic, but ultimately, the story shows just how vibrant the NoSQL movement has become. Developers are still plugging away on CouchDB, even without Katz's involvement. Cloudant remains committed to CouchDB, having vowed to contribute the BigCouch code back into the project. And Couchbase is on the verge of launching version 2.0 of its database, after landing big-name customers such as NTT DoCoMo and AOL. The idea of a document database is now cemented in the minds of developers, thanks not only to CouchDB and its many offshoots, but also to the popularity of MongoDB.

Meanwhile, IBM is discontinuing the Lotus brand name. Notes will live on, at least for now. Perhaps its best years are behind it, but it set the stage for so much more.

This post has been updated to correct and clarify the functioning of PLATO Notes.