Graph Database and Data Driven Applications

Overview

Modern data-driven applications depend on insights derived from the enormous volumes of data they handle every day. To gain those insights, applications need to send complex queries, and the database must be able to answer them. Traditional relational database management systems (RDBMS) that rely on SQL struggle with extremely complex queries, particularly those that follow long chains of relationships. Graph databases address this problem because they are built around objects and the relationships between them, which makes it possible to extract deep insights. The use of graph databases is still limited, although there are clear signs that they will play an important role as businesses rely more and more on insights.

What is a graph database?

To understand what a graph database is, consider the example below:

Bill and his family are planning a vacation to a place that offers great oriental dishes. He has started planning early, and one way to find information is, of course, Google. While the information from Google is credible and good, Bill wants information that is as specific as possible. So he starts asking his friends, acquaintances and colleagues. Let us assume that Bill asks Ryan, Sheena and John, who are his primary contacts (contact level 1). All three promise to get back to him as soon as possible. Ryan asks his friend Greig, who in turn asks his cousin Martin, who has been to Bangkok a few times. Martin recommends the names and details of eateries in Bangkok known for their oriental dishes. The information is relayed back to Bill.

You have just seen a real-life example of a complex query based on objects and relationships.

The graph database works on the same principle. It is about the network and the objects and their relationships in the network.

In short, a graph database can handle extremely complex graphs and provide insights that SQL-based RDBMS systems struggle to deliver. That is the unique selling point of graph databases.

How does a graph database work?

The description above should have given some idea of the principles a graph database applies when it searches for information or insights. In short, it traverses the network of objects and relationships according to the query and returns the results.

Taking Bill’s example, how would a graph database go about its job? The example contains many nodes and relationships. Measured in hops from Bill, the distances look like this:

Bill = 0 (the origin)

Ryan = 1

Sheena = 1

John = 1

Greig = 2

Martin = 3

The distance between the origin (zero) and the node that provides the information could be even farther in real life. That is how the network works.
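The distances listed above can be computed with a breadth-first search over the contact network. The following is a minimal sketch in plain Java, not the way any particular graph database stores or traverses data; the class and method names (`HopDistance`, `distancesFrom`, `billsNetwork`) are hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of any graph database API.
class HopDistance {

    // Breadth-first search: number of hops from the origin to every reachable person.
    static Map<String, Integer> distancesFrom(String origin, Map<String, List<String>> contacts) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(origin, 0);
        queue.add(origin);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : contacts.getOrDefault(current, List.of())) {
                // The first time a person is reached is also the shortest route to them.
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return distance;
    }

    // The chain from the example: Bill asks Ryan, Sheena and John; Ryan asks Greig; Greig asks Martin.
    static Map<String, List<String>> billsNetwork() {
        Map<String, List<String>> contacts = new LinkedHashMap<>();
        contacts.put("Bill", List.of("Ryan", "Sheena", "John"));
        contacts.put("Ryan", List.of("Greig"));
        contacts.put("Greig", List.of("Martin"));
        return contacts;
    }

    public static void main(String[] args) {
        // Prints {Bill=0, Ryan=1, Sheena=1, John=1, Greig=2, Martin=3}
        System.out.println(distancesFrom("Bill", billsNetwork()));
    }
}
```

The search expands one level of contacts at a time, so each person is assigned the smallest number of hops by which they can be reached from Bill.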

Imagine an application sending a query based on Bill’s requirement. It would be something like this:

Find all friends who are connected with five friends who like oriental food, have visited Thailand and live within 5 miles of Dallas-Fort Worth.

Many graph databases are available on the market, and Neo4j is one of the most popular among them. Neo4j owes its popularity to being both efficient and open source. A query sent to Neo4j to solve Bill’s problem could look something like this:

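// Select friends and friends of friends of the given user, ordered by depth of the relationship.
// Filtering on keywords such as oriental food or Bangkok is not part of this query and would need additional conditions.
// START and {userNode} are legacy Cypher syntax from early Neo4j releases; the unused n=node(*) lookup over all nodes has been dropped.

String findFriendsQuery = "START person=node({userNode}) MATCH p = (person)-[:FRIEND*1..2]-(friend) RETURN DISTINCT p ORDER BY length(p)";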

Based on the query, Neo4j searches its network of nodes and relationships and returns the closest matches first.
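To make the meaning of the pattern `[:FRIEND*1..2]` and the ordering by path length more concrete, here is a rough in-memory sketch in plain Java. It does not use Neo4j or its driver; `FriendPaths` and `pathsFrom` are hypothetical names, the adjacency map stands in for FRIEND relationships listed in both directions, and the rule of never revisiting a person within one path is a simplification of how Cypher treats paths.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration; Neo4j performs this traversal internally.
class FriendPaths {

    // Collects every path of 1..maxHops hops that starts at person, shortest paths first.
    static List<List<String>> pathsFrom(String person, Map<String, List<String>> friends, int maxHops) {
        List<List<String>> result = new ArrayList<>();
        List<String> start = new ArrayList<>();
        start.add(person);
        extend(start, friends, maxHops, result);
        // Equivalent in spirit to "order by length(p)": shorter paths come first.
        result.sort(Comparator.comparingInt(List::size));
        return result;
    }

    private static void extend(List<String> path, Map<String, List<String>> friends,
                               int hopsLeft, List<List<String>> result) {
        if (hopsLeft == 0) {
            return;
        }
        String last = path.get(path.size() - 1);
        for (String next : friends.getOrDefault(last, List.of())) {
            if (path.contains(next)) {
                continue; // simplification: a person appears at most once per path
            }
            List<String> longer = new ArrayList<>(path);
            longer.add(next);
            result.add(longer);
            extend(longer, friends, hopsLeft - 1, result);
        }
    }
}
```

Calling `FriendPaths.pathsFrom("Bill", friends, 2)` on a map where Bill is friends with Ryan and Ryan with Greig would return the one-hop path to Ryan followed by the two-hop path to Greig.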

Difference between graph database and relational database

The main point on which relational and graph databases are compared is query speed, that is, how fast each can process a complex query on a big dataset.

Some time ago, Emil Eifrem, the CEO of Neo Technology, the company behind Neo4j, measured the performance of both relational and graph databases on multiple parameters. The query was: among 1,000 users, each with 50 or more friends, find out whether one user is connected to another in 4 or fewer hops. The results are given below:

A popular open-source relational database took 200 ms to process the query, while the graph database took 2 ms.
When the same query was run on a user base of 1,000,000 users, the graph database took 2 ms, while the relational database run had to be aborted after several days without finishing.

The main reason the relational database took so long was that it searched the data for every term provided in the query, so the work grew with the size of the dataset; on a bigger database it would take even longer. The graph database, on the other hand, looks only at records that are directly connected to the records it has already reached, starting from the user named in the query. If it is allowed a specific number of hops, it stops exactly there. That is why the graph database was able to process a complex query on a huge dataset with relative ease and return results faster.
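The hop-limited check from the benchmark query (is one user connected to another in 4 or fewer hops?) can be sketched as a level-by-level search that stops at the hop limit. This is an illustrative plain-Java version under the assumption that friendships are held in an in-memory adjacency map; it is not the benchmark code, and `BoundedSearch` and `connectedWithin` are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a hop-limited connectivity check.
class BoundedSearch {

    // Returns true if target can be reached from source in at most maxHops hops.
    // Only friends of users that were already reached are examined, one level per hop.
    static boolean connectedWithin(String source, String target,
                                   Map<String, List<String>> friends, int maxHops) {
        if (source.equals(target)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(source);
        Set<String> frontier = Set.of(source);
        for (int hop = 1; hop <= maxHops && !frontier.isEmpty(); hop++) {
            Set<String> nextFrontier = new HashSet<>();
            for (String user : frontier) {
                for (String friend : friends.getOrDefault(user, List.of())) {
                    if (friend.equals(target)) {
                        return true;
                    }
                    // Set.add returns false for users seen earlier, so nobody is expanded twice.
                    if (visited.add(friend)) {
                        nextFrontier.add(friend);
                    }
                }
            }
            frontier = nextFrontier;
        }
        return false;
    }
}
```

The search never looks beyond the neighbourhood within `maxHops` of the starting user, which is why the amount of work depends on the size of that neighbourhood and not on the total number of users.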

Case studies on graph database

There have been many successful applications of the graph database in different industries. Big companies have led the way in building world-class products on graph database principles. Initially it was thought that, since graph databases are about nodes and relationships, mainly industries such as social media would benefit. However, other sectors such as online dating, manufacturing and online job portals have also benefited. Given below are a few examples:

Facebook has successfully used graph database principles in building its world-class product. Today, you can search for information by traversing your network of friends, their friends, and so on.
LinkedIn has been working on its much-publicized Economic Graph, which aims to provide suitable opportunities to all its users by connecting them with companies and their profiles up to a certain level.
Recommendation systems, a very important tool for many online retailers, use graph database principles to provide effective, relevant recommendations to potential consumers. A recommendation engine searches the network of customers who have made similar purchases over a period of time and assumes that a customer browsing similar products will have the same tastes and preferences.
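The recommendation idea in the last example can be sketched as a walk from a shopper to customers with overlapping purchases and on to the products those customers bought. The code below is a simplified illustration in plain Java, not how any named retailer implements recommendations; `PurchaseRecommender`, `recommend`, and the scoring rule (count the similar customers who bought each product) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: customer -> products bought is treated as a small graph.
class PurchaseRecommender {

    // Ranks products the shopper has not bought by the number of customers
    // who share at least one purchase with the shopper and bought that product.
    static List<String> recommend(String shopper, Map<String, Set<String>> purchases) {
        Set<String> own = purchases.getOrDefault(shopper, Set.of());
        Map<String, Integer> score = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : purchases.entrySet()) {
            if (entry.getKey().equals(shopper)) {
                continue;
            }
            Set<String> theirs = entry.getValue();
            boolean similar = theirs.stream().anyMatch(own::contains);
            if (!similar) {
                continue; // no shared purchase, so this customer is not a neighbour
            }
            for (String product : theirs) {
                if (!own.contains(product)) {
                    score.merge(product, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        // Highest score first; ties are broken alphabetically to keep the order stable.
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }
}
```

A real engine would weigh recency, purchase counts and many other signals; the sketch only shows the two-hop traversal from shopper to similar customers to candidate products.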

Summary

For all the potential of graph databases, many companies are still catching up with the trend, so it will be a while before graph databases are widely adopted. While their potential for solving complex problems is no longer in doubt, the position of relational databases is not threatened in any way. The best thing going for graph databases is that they can be offered as open-source technology. It is up to the industries to leverage the benefits.
