NoSQL Emerged From a Need

Data Storage: The world’s stored digital data is measured in exabytes. An exabyte is equal to one billion gigabytes (GB) of data. According to Internet.com, the amount of stored data added in 2006 was 161 exabytes. Just four years later, in 2010, the amount of data stored will be almost 1,000 exabytes, which is an increase of over 500%. In other words, there is a lot of data being stored in the world, and it’s just going to continue growing.

Interconnected Data: Data continues to become more connected. The creation of the web ushered in hyperlinks, blogs have pingbacks, and every major social network system has tags that tie things together. Major systems are built to be interconnected.

Complex Data Structure: NoSQL can handle hierarchical nested data structures easily. To accomplish the same thing in SQL, you would need multiple relational tables with all kinds of keys. In addition, there is a relationship between performance and data complexity. Performance can degrade in a traditional RDBMS as we store the massive amounts of data required in social networking applications and the semantic web.
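To make the contrast concrete, here is a minimal Python sketch. The blog post, its field names, and the table layouts are all made up for illustration; they are not taken from any particular database. It shows the same data once as a single nested document and once as key-linked tables that have to be joined back together.

```python
# A hypothetical blog post stored as one nested document, the way a
# document-oriented NoSQL store might hold it.
post_document = {
    "id": 1,
    "title": "NoSQL Emerged From a Need",
    "author": {"name": "Jon Foobar", "weblog": "http://example.org/blog"},
    "tags": ["nosql", "databases"],
    "comments": [
        {"user": "alice", "text": "Nice overview."},
        {"user": "bob", "text": "What about joins?"},
    ],
}

# The same data normalized into relational-style tables linked by keys.
posts = [{"post_id": 1, "title": "NoSQL Emerged From a Need", "author_id": 10}]
authors = [{"author_id": 10, "name": "Jon Foobar", "weblog": "http://example.org/blog"}]
tags = [{"post_id": 1, "tag": "nosql"}, {"post_id": 1, "tag": "databases"}]
comments = [
    {"comment_id": 100, "post_id": 1, "user": "alice", "text": "Nice overview."},
    {"comment_id": 101, "post_id": 1, "user": "bob", "text": "What about joins?"},
]


def assemble_post(post_id):
    """Rebuild the nested document by 'joining' the tables on their keys."""
    post = next(p for p in posts if p["post_id"] == post_id)
    author = next(a for a in authors if a["author_id"] == post["author_id"])
    return {
        "id": post["post_id"],
        "title": post["title"],
        "author": {"name": author["name"], "weblog": author["weblog"]},
        "tags": [t["tag"] for t in tags if t["post_id"] == post_id],
        "comments": [
            {"user": c["user"], "text": c["text"]}
            for c in comments
            if c["post_id"] == post_id
        ],
    }
```

Reading the document form is a single lookup, while the table form needs four tables and three key relationships to produce the same result.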

What is NoSQL?

One way to define NoSQL is to consider what it is not. It’s not SQL and it’s not relational. As the name suggests (it is often read as “Not Only SQL”), it’s not a replacement for an RDBMS but complements it. NoSQL is designed for distributed data stores with very large-scale data needs. Think about Facebook with its 500 million users, or Twitter, which accumulates terabytes of data every single day.

In a NoSQL database, there is no fixed schema and there are no joins. An RDBMS typically “scales up” by moving to faster and faster hardware and adding memory. NoSQL, on the other hand, can take advantage of “scaling out”. Scaling out refers to spreading the load over many commodity systems. This is the component of NoSQL that makes it an inexpensive solution for large datasets.
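As a rough illustration of scaling out, the Python sketch below spreads keys over several machines by hashing each key. The node names and the ShardedStore class are hypothetical, and the in-memory dictionaries stand in for real servers; this is one simple partitioning scheme, not how any specific NoSQL product does it.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical commodity servers


def node_for(key, nodes=NODES):
    """Pick a node by hashing the key, so load spreads across all nodes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class ShardedStore:
    """Toy key-value store that keeps one dictionary per node."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.data = {name: {} for name in self.nodes}

    def put(self, key, value):
        # Only the node that owns the key stores the value.
        self.data[node_for(key, self.nodes)][key] = value

    def get(self, key):
        # The same hash tells us which node to ask.
        return self.data[node_for(key, self.nodes)].get(key)


store = ShardedStore()
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
```

Each node ends up holding only a portion of the keys, so adding machines adds capacity. Plain modulo hashing like this moves most keys around when the node list changes, which is why production systems often use schemes such as consistent hashing instead.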

NoSQL Categories

The current NoSQL world fits into four basic categories.

Major NoSQL Players

The major players in NoSQL have emerged primarily because of the organizations that have adopted them. Some of the largest NoSQL technologies include:

  • Column Family Stores were created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. In the case of BigTable (Google’s Column Family NoSQL model), rows are identified by a row key with the data sorted and stored by this key. The columns are arranged by column family.
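The layout described for column family stores can be sketched in a few lines of Python. This is a toy in-memory model under my own assumptions (the ColumnFamilyTable class, its method names, and the sample keys are all hypothetical), not BigTable's actual API: each row key maps to column families, each family holds its own columns, and rows are kept sorted by key so that range scans are cheap.

```python
from bisect import bisect_left, insort


class ColumnFamilyTable:
    """Toy model: row key -> column family -> column -> value, rows kept sorted."""

    def __init__(self, families):
        self.families = set(families)
        self.row_keys = []  # sorted list of row keys
        self.rows = {}      # row key -> {family: {column: value}}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            insort(self.row_keys, row_key)  # keep row keys in sorted order
            self.rows[row_key] = {f: {} for f in self.families}
        self.rows[row_key][family][column] = value

    def get(self, row_key, family):
        """Return all columns of one family for one row."""
        return self.rows[row_key][family]

    def scan(self, start, stop):
        """Yield (row_key, row) for start <= row_key < stop, in key order."""
        lo = bisect_left(self.row_keys, start)
        hi = bisect_left(self.row_keys, stop)
        for key in self.row_keys[lo:hi]:
            yield key, self.rows[key]
```

Because the row keys are sorted, a call such as scan("user:", "user;") visits only the rows whose keys start with "user:", without touching the rest of the table.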

Querying NoSQL

The question of how to query a NoSQL database is what most developers are interested in. After all, data stored in a huge database doesn’t do anyone any good if you can’t retrieve it and show it to end users or web services. Most NoSQL databases do not provide a high-level declarative query language like SQL. Instead, querying these databases is data-model specific.
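What “data-model specific” means in practice can be sketched with two toy stores in Python. The store contents and the kv_get and find helpers are hypothetical, invented for this example: a key-value model can only answer lookups by key, while a document model can filter on fields inside each document.

```python
# Key-value model: the only "query" is a lookup by key.
kv_store = {"user:1": "Alice", "user:2": "Bob"}


def kv_get(key):
    return kv_store.get(key)


# Document model: queries can filter on fields inside each document.
documents = [
    {"_id": 1, "name": "Alice", "tags": ["admin", "dev"]},
    {"_id": 2, "name": "Bob", "tags": ["dev"]},
]


def find(collection, **criteria):
    """Return documents whose fields equal (or, for list fields, contain) each criterion."""

    def matches(doc):
        for field, wanted in criteria.items():
            value = doc.get(field)
            if isinstance(value, list):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [doc for doc in collection if matches(doc)]
```

Here kv_get("user:2") can only fetch by key, whereas find(documents, tags="dev", name="Bob") selects on the document's contents. Neither call would make sense against the other store, which is the point: the query style follows the data model.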

Many of the NoSQL platforms allow for RESTful interfaces to the data. Others offer query APIs. A couple of query tools have been developed that attempt to query multiple NoSQL databases. These tools typically work across a single NoSQL category. One example is SPARQL, a declarative query specification designed for graph databases. Here is an example of a SPARQL query that retrieves the URL of a particular blogger (courtesy of IBM):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url
    FROM <...>
    WHERE {
      ?contributor foaf:name "Jon Foobar" .
      ?contributor foaf:weblog ?url .
    }
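To show roughly what such a query does, here is a small pure-Python sketch that evaluates the same two triple patterns against an in-memory list of triples. The sample triples, the blog URLs, and the helper names (is_var, match_pattern, query) are assumptions made for illustration. A real SPARQL engine does far more, but the core idea of binding variables that are shared across patterns is the same.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # the standard FOAF vocabulary namespace

# Hypothetical data as (subject, predicate, object) triples.
TRIPLES = [
    ("_:jon", FOAF + "name", "Jon Foobar"),
    ("_:jon", FOAF + "weblog", "http://example.org/jon/blog"),
    ("_:ann", FOAF + "name", "Ann Example"),
    ("_:ann", FOAF + "weblog", "http://example.org/ann/blog"),
]


def is_var(term):
    """Pattern terms that start with '?' are variables."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Try to extend bindings so that pattern matches triple; None on failure."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns, triples, select):
    """Evaluate the patterns as a conjunction and project the selected variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(s[v] for v in select) for s in solutions]


urls = query(
    [
        ("?contributor", FOAF + "name", "Jon Foobar"),
        ("?contributor", FOAF + "weblog", "?url"),
    ],
    TRIPLES,
    ["?url"],
)
```

The first pattern binds ?contributor to the node named "Jon Foobar", and the second pattern reuses that binding to find the matching weblog, so urls holds only Jon's blog address.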

Future of NoSQL

Organizations that have massive data storage needs are looking seriously at NoSQL. The concept does not appear to be getting as much traction in smaller organizations. In a survey conducted by Information Week, 44% of business IT professionals had not heard of NoSQL. Further, only 1% of the respondents reported that NoSQL is a part of their strategic direction. Clearly, NoSQL has its place in our connected world, but it will need to continue to evolve to gain the mass appeal that many think it could have.
