Graph Databases

Graph databases are growing in popularity because they store data, and the relationships between data, much more effectively than traditional relational databases do. The primary driver is the growing number of enterprise applications that need to manage connected data.

Azure Cosmos DB is Microsoft's globally distributed, multi-model database service for mission-critical applications. It supports document, key-value, graph, and column-family data models. The Azure Cosmos DB Gremlin API is used to store and operate on graph data in a fully managed database service designed for any scale.

This article provides an overview of the Azure Cosmos DB Gremlin API and explains how you can use it to store massive graphs with billions of vertices and edges. You can query the graphs with millisecond latency and evolve the graph structure easily. Azure Cosmos DB's Gremlin API is based on the Apache TinkerPop graph database standard, and uses the Gremlin query language.

Azure Cosmos DB's Gremlin API combines the power of graph database algorithms with highly scalable, managed infrastructure to provide a flexible solution to common data problems that stem from the inflexibility of relational approaches.

Features of Azure Cosmos DB graph database

Azure Cosmos DB is a fully managed graph database that offers global distribution, elastic scaling of storage and throughput, automatic indexing and query, tunable consistency levels, and support for the TinkerPop standard.

The following are the differentiated features that Azure Cosmos DB Gremlin API offers:

Elastically scalable throughput and storage
Graphs in the real world need to scale beyond the capacity of a single server. Azure Cosmos DB supports horizontally scalable graph databases with virtually unlimited storage and provisioned throughput. As the graph grows, the data is automatically distributed using graph partitioning.
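The automatic distribution described above can be pictured with a small Python sketch. It is not Cosmos DB's actual partitioning algorithm: it assumes a hypothetical "pk" partition-key property on each vertex, a fixed partition count, an MD5-based hash, and the design choice of storing each edge in the same partition as its source vertex so that outgoing traversals stay local.

```python
import hashlib
from collections import defaultdict

# Hypothetical fixed partition count; a managed service sizes and splits
# physical partitions on its own, so this value is illustrative only.
NUM_PARTITIONS = 4


def partition_for(partition_key: str) -> int:
    """Map a partition-key value to a partition using a stable hash.

    MD5 is used instead of Python's built-in hash() because it returns the
    same result in every process, which any placement scheme requires.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def distribute(vertices, edges, key="pk"):
    """Place vertices by partition key; co-locate each edge with its source vertex."""
    partitions = defaultdict(lambda: {"vertices": [], "edges": []})
    home = {}  # vertex id -> partition number
    for v in vertices:
        p = partition_for(v[key])
        partitions[p]["vertices"].append(v["id"])
        home[v["id"]] = p
    for src, label, dst in edges:
        # Assumption of this sketch: outgoing edges live with their source vertex.
        partitions[home[src]]["edges"].append((src, label, dst))
    return dict(partitions)


people = [{"id": name, "pk": name} for name in ("thomas", "robin", "ben")]
links = [("thomas", "knows", "robin"), ("ben", "knows", "thomas")]
for number, contents in sorted(distribute(people, links).items()):
    print(number, contents)
```

Because placement depends only on the partition-key value, any node in a cluster can compute where a vertex lives without consulting a central directory.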
Multi-region replication
Azure Cosmos DB can automatically replicate your graph data to any Azure region worldwide. Global replication simplifies the development of applications that require global access to data. In addition to minimizing read and write latency anywhere in the world, Azure Cosmos DB provides an automatic regional failover mechanism that can ensure the continuity of your application in the rare case of a service interruption in a region.
Fast queries and traversals with the most widely adopted graph query standard
Store heterogeneous vertices and edges and query them through a familiar Gremlin syntax. Gremlin is an imperative, functional query language that provides a rich interface to implement common graph algorithms.
Azure Cosmos DB enables rich real-time queries and traversals without the need to specify schema hints, secondary indexes, or views. Learn more in Query graphs by using Gremlin.
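To illustrate why queries need no schema hints or hand-built secondary indexes, here is a toy Python store that indexes every property automatically on write. The AutoIndexedStore class and its methods are hypothetical, and Azure Cosmos DB's real indexing is considerably more sophisticated. The sketch only shows the general idea of automatic equality indexing.

```python
from collections import defaultdict


class AutoIndexedStore:
    """Toy vertex store that indexes every property as it is written (hypothetical).

    Updates and deletes are not handled; this only illustrates indexing on insert.
    """

    def __init__(self):
        self.vertices = {}              # vertex id -> {"label": ..., **properties}
        self.index = defaultdict(set)   # (property name, value) -> vertex ids

    def add_vertex(self, vid, label, **props):
        record = {"label": label, **props}
        self.vertices[vid] = record
        # No index definition is required: every property is indexed on write.
        for name, value in record.items():
            self.index[(name, value)].add(vid)

    def has(self, name, value):
        """Equality lookup served from the index, in the spirit of Gremlin's has()."""
        return sorted(self.index.get((name, value), set()))


store = AutoIndexedStore()
store.add_vertex("thomas.1", "person", firstName="Thomas", age=44)
store.add_vertex("robin.1", "person", firstName="Robin", age=44)
store.add_vertex("laptop.1", "device", name="Laptop")
print(store.has("age", 44))  # ['robin.1', 'thomas.1']
```

The ages and ids are made-up sample values.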
Fully managed graph database
Azure Cosmos DB eliminates the need to manage database and machine resources. Most existing graph database platforms are bound by the limitations of their infrastructure and often require a high degree of maintenance to keep them running.
As a fully managed service, Cosmos DB removes the need to manage virtual machines, update runtime software, manage sharding or replication, or deal with complex …
Consider a sample graph with the following vertex types (these are also called labels in Gremlin):
- People: The graph has three people: Robin, Thomas, and Ben
- Interests: Their interests; in this example, the game of Football
- Devices: The devices that people use
- Operating Systems: The operating systems that the devices run on
We represent the relationships between these entities via the following edge types/labels:
- Knows: For example, 'Thomas knows Robin'
- Interested: To represent the interests of the people in our graph, for example, 'Ben is interested in Football'
- RunsOS: To represent which operating system a device runs, for example, 'Laptop runs the Windows OS'
- Uses: To represent which device a person uses. For example, Robin uses a Motorola phone with serial number 77
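The sample graph described above can be written down as plain Python data to make the vertex and edge labels concrete. The Vertex and Edge classes, the vertex ids, and the lowercase edge labels are assumptions for illustration. Only the three people, the Football interest, the Motorola phone with serial number 77, the laptop, Windows, and the four example relationships come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str                                  # vertex type: person, interest, device, os
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str                                  # edge type: knows, interested, uses, runsos
    out_v: str                                  # source vertex id
    in_v: str                                   # target vertex id


def build_sample_graph():
    """Build the sample graph; all ids are hypothetical."""
    vertices = [
        Vertex("robin", "person", {"firstName": "Robin"}),
        Vertex("thomas", "person", {"firstName": "Thomas"}),
        Vertex("ben", "person", {"firstName": "Ben"}),
        Vertex("football", "interest", {"name": "Football"}),
        Vertex("phone77", "device", {"name": "Motorola phone", "serial": 77}),
        Vertex("laptop", "device", {"name": "Laptop"}),
        Vertex("windows", "os", {"name": "Windows"}),
    ]
    edges = [
        Edge("knows", "thomas", "robin"),
        Edge("interested", "ben", "football"),
        Edge("uses", "robin", "phone77"),
        Edge("runsos", "laptop", "windows"),
    ]
    return {v.id: v for v in vertices}, edges


def out(edges, vid, label):
    """Follow outgoing edges with the given label, similar to Gremlin's out(label)."""
    return [e.in_v for e in edges if e.out_v == vid and e.label == label]


vertices, edges = build_sample_graph()
print(out(edges, "thomas", "knows"))  # ['robin']
```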
Let's run some operations against this graph using the Gremlin Console. You can also perform these operations using Gremlin drivers on the platform of your choice (Java, Node.js, Python, or .NET). Before we look at what's supported in Azure Cosmos DB, let's look at a few examples to get familiar with the syntax.
First, let's look at CRUD operations, starting with a Gremlin statement that inserts a 'Thomas' vertex into the graph.
Next, a second Gremlin statement inserts a 'knows' edge between Thomas and Robin.
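The two insert steps can be sketched with a hypothetical in-memory PropertyGraph class in Python. The comments show roughly corresponding Gremlin steps (addV, property, addE, to) for orientation only. The vertex ids such as thomas.1 are made up, and a real Azure Cosmos DB container would typically also need a partition-key property on every vertex.

```python
class PropertyGraph:
    """Minimal in-memory property graph (hypothetical helper, not a Cosmos DB API)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> {"label": ..., **properties}
        self.edges = []      # (label, source id, target id)

    def add_v(self, label, vid, **props):
        # Roughly: g.addV('person').property('id', 'thomas.1').property('firstName', 'Thomas')
        if vid in self.vertices:
            raise ValueError(f"vertex {vid!r} already exists")
        self.vertices[vid] = {"label": label, **props}
        return vid

    def add_e(self, label, src, dst):
        # Roughly: g.V('thomas.1').addE('knows').to(g.V('robin.1'))
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((label, src, dst))


g = PropertyGraph()
g.add_v("person", "robin.1", firstName="Robin")    # assume Robin already exists
g.add_v("person", "thomas.1", firstName="Thomas")  # insert the 'Thomas' vertex
g.add_e("knows", "thomas.1", "robin.1")            # insert 'Thomas knows Robin'
print(g.edges)  # [('knows', 'thomas.1', 'robin.1')]
```

Rejecting edges whose endpoints do not exist keeps the toy graph consistent; how a real service handles dangling edges is outside the scope of this sketch.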
A query can then return the 'person' vertices in descending order of their first names.
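The ordering query amounts to a filter on the vertex label followed by a sort on one property. A Python sketch over hypothetical vertex dictionaries follows. In recent TinkerPop versions the descending order token is desc; older examples use decr.

```python
# Hypothetical vertices; ids and property names are illustrative.
VERTICES = [
    {"id": "thomas.1", "label": "person", "firstName": "Thomas"},
    {"id": "robin.1", "label": "person", "firstName": "Robin"},
    {"id": "ben.1", "label": "person", "firstName": "Ben"},
    {"id": "football.1", "label": "interest", "name": "Football"},
]


def people_by_first_name_desc(vertices):
    # Roughly: g.V().hasLabel('person').order().by('firstName', desc)
    people = [v for v in vertices if v["label"] == "person"]          # hasLabel('person')
    return sorted(people, key=lambda v: v["firstName"], reverse=True)  # order().by(..., desc)


print([v["firstName"] for v in people_by_first_name_desc(VERTICES)])
# ['Thomas', 'Robin', 'Ben']
```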
Where graphs shine is in answering questions like 'What operating systems do friends of Thomas use?', which a single Gremlin traversal can answer from the graph.
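That traversal is a chain of hops along labeled edges: people Thomas knows, the devices they use, and the operating systems those devices run, counted per operating system. The Python sketch below models each hop with a hypothetical out() helper. The edges "thomas knows ben", "ben uses laptop", and "phone77 runsos android" are invented for illustration; the remaining edges follow the sample graph.

```python
from collections import Counter

# Edges as (label, source, target). Some entries are hypothetical, see above.
EDGES = [
    ("knows", "thomas", "robin"),
    ("knows", "thomas", "ben"),
    ("uses", "robin", "phone77"),
    ("uses", "ben", "laptop"),
    ("runsos", "phone77", "android"),
    ("runsos", "laptop", "windows"),
]


def out(frontier, label, edges=EDGES):
    """One traversal step: follow outgoing `label` edges from every vertex in frontier."""
    return [dst for src in frontier for (lbl, s, dst) in edges if s == src and lbl == label]


def friends_operating_systems(person):
    # Roughly: g.V(person).out('knows').out('uses').out('runsos').groupCount()
    friends = out([person], "knows")
    devices = out(friends, "uses")
    systems = out(devices, "runsos")
    return Counter(systems)


print(dict(friends_operating_systems("thomas")))  # {'android': 1, 'windows': 1}
```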
Next steps
To learn more about graph support in Azure Cosmos DB, see:
- Get started with the Azure Cosmos DB graph tutorial.
- Learn about how to query graphs in Azure Cosmos DB by using Gremlin.

Graph databases are purpose-built to store and navigate relationships. Relationships are first-class citizens in graph databases, and most of the value of graph databases is derived from these relationships. Graph databases use nodes to store data entities, and edges to store relationships between entities. An edge always has a start node, end node, type, and direction, and an edge can describe parent-child relationships, actions, ownership, and the like. There is no limit to the number and kind of relationships a node can have.
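The node and edge model described above can be sketched as a small Python data structure. The Edge and Graph classes, the relationship type names, and the sample nodes are hypothetical. The sketch shows an edge carrying a start node, end node, type, and implicit direction, and a node holding any number of relationships.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Edge:
    start: str                                      # start node id
    end: str                                        # end node id
    rel_type: str                                   # relationship type, e.g. "PARENT_OF"
    properties: dict = field(default_factory=dict)
    # Direction is implicit: the edge always points from `start` to `end`.


class Graph:
    """Nodes hold entities; edges are stored objects, not values computed later."""

    def __init__(self):
        self.nodes = {}                       # node id -> property dict
        self.outgoing = defaultdict(list)     # node id -> edges starting there
        self.incoming = defaultdict(list)     # node id -> edges ending there

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, start, end, rel_type, **props):
        # A node may have any number and kind of relationships.
        edge = Edge(start, end, rel_type, props)
        self.outgoing[start].append(edge)
        self.incoming[end].append(edge)
        return edge


g = Graph()
for name in ("alice", "bob"):
    g.add_node(name, kind="person")
g.add_node("car-1", kind="vehicle")
g.relate("alice", "bob", "PARENT_OF")            # parent-child relationship
g.relate("alice", "car-1", "OWNS", since=2020)   # ownership, with an edge property
print([(e.rel_type, e.end) for e in g.outgoing["alice"]])
# [('PARENT_OF', 'bob'), ('OWNS', 'car-1')]
```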

A graph in a graph database can be traversed along specific edge types or across the entire graph. In graph databases, traversing joins or relationships is very fast because the relationships between nodes are not calculated at query time but are persisted in the database. Graph databases have advantages for use cases such as social networking, recommendation engines, and fraud detection, where you need to create relationships between data and query those relationships quickly.
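The difference between computing relationships at query time and persisting them can be shown in a simplified Python sketch: a relational-style scan over a table of relationship rows versus a lookup in adjacency lists built when the data is written. The rows and names other than Howard are hypothetical, and real engines differ considerably in storage layout.

```python
from collections import defaultdict

# Hypothetical relationship rows: (from node, relationship type, to node).
ROWS = [
    ("howard", "FRIEND", "ana"),
    ("howard", "FRIEND", "bo"),
    ("howard", "WORKS_WITH", "cy"),
    ("ana", "FRIEND", "cy"),
]


def neighbors_by_scan(rows, node, rel_type):
    """Relational style: the relationship is computed at query time by scanning rows."""
    return [dst for src, rel, dst in rows if src == node and rel == rel_type]


def build_adjacency(rows):
    """Graph style: persist each node's outgoing edges, grouped by type, at write time."""
    adjacency = defaultdict(lambda: defaultdict(list))
    for src, rel, dst in rows:
        adjacency[src][rel].append(dst)
    return adjacency


def neighbors_by_adjacency(adjacency, node, rel_type):
    """Follow one edge type with two dictionary lookups instead of a full scan."""
    return list(adjacency.get(node, {}).get(rel_type, []))


adjacency = build_adjacency(ROWS)
print(neighbors_by_adjacency(adjacency, "howard", "FRIEND"))  # ['ana', 'bo']
```

The scan touches every row for each hop, while the adjacency lookup costs roughly the same regardless of how many unrelated relationships the graph holds.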

The following figure shows an example of a social network graph. Given the people (nodes) and their relationships (edges), you can find out who the 'friends of friends' of a particular person are, for example, the friends of Howard's friends.
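A friends-of-friends query is a two-hop traversal that excludes the starting person and their direct friends. The Python sketch below uses a hypothetical friendship list; only Howard is named in the text.

```python
from collections import defaultdict

# Hypothetical friendships; only Howard is named in the text.
FRIENDSHIPS = [("howard", "ana"), ("howard", "bo"), ("ana", "cy"), ("bo", "cy"), ("bo", "di")]


def build_friend_graph(pairs):
    """Friendship is mutual, so store each pair in both directions."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def friends_of_friends(graph, person):
    """People two hops away who are neither the person nor a direct friend."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}


friends = build_friend_graph(FRIENDSHIPS)
print(sorted(friends_of_friends(friends, "howard")))  # ['cy', 'di']
```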

Fraud detection

Graph databases are capable of sophisticated fraud prevention. With graph databases, you can use relationships to process financial and purchase transactions in near-real time. With fast graph queries, you can detect, for example, that a potential purchaser is using the same email address and credit card as a known fraud case. Graph databases can also help you detect relationship patterns such as multiple people associated with a personal email address, or multiple people sharing the same IP address but residing at different physical addresses.
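Both fraud patterns reduce to finding people who share an identifier node. The Python sketch below uses made-up records and hypothetical helper names. It flags people who share an email address or card with a known fraud case, and IP addresses shared by people living at different physical addresses.

```python
from collections import defaultdict

# Hypothetical person records. In a graph, each value (email, card, IP, address)
# would be its own node, linked to every person who uses it.
ACCOUNTS = {
    "p1": {"email": "a@example.com", "card": "card-001", "ip": "10.0.0.5", "address": "1 Main St"},
    "p2": {"email": "b@example.com", "card": "card-002", "ip": "10.0.0.5", "address": "9 Elm St"},
    "p3": {"email": "a@example.com", "card": "card-001", "ip": "10.0.0.7", "address": "1 Main St"},
}
KNOWN_FRAUD = {"p1"}


def identifier_links(accounts):
    """Map each (kind, value) identifier node to the set of people linked to it."""
    links = defaultdict(set)
    for person, attrs in accounts.items():
        for kind, value in attrs.items():
            links[(kind, value)].add(person)
    return links


def linked_to_fraud(accounts, fraud_cases, kinds=("email", "card")):
    """People who share an email address or card with a known fraud case."""
    links = identifier_links(accounts)
    suspects = set()
    for case in fraud_cases:
        for kind in kinds:
            suspects |= links[(kind, accounts[case][kind])]
    return suspects - set(fraud_cases)


def shared_ip_different_addresses(accounts):
    """IP addresses used by several people who live at different physical addresses."""
    flagged = {}
    for (kind, value), people in identifier_links(accounts).items():
        if kind == "ip" and len(people) > 1:
            if len({accounts[p]["address"] for p in people}) > 1:
                flagged[value] = sorted(people)
    return flagged


print(linked_to_fraud(ACCOUNTS, KNOWN_FRAUD))     # {'p3'}
print(shared_ip_different_addresses(ACCOUNTS))    # {'10.0.0.5': ['p1', 'p2']}
```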

Recommendation engines

Graph databases are a good choice for recommendation applications. With a graph database, you can store relationships between information categories such as customer interests, friends, and purchase history. You can use a highly available graph database to recommend products to a user based on what others who follow the same sport and have a similar purchase history have bought. Or, you can identify people who have a friend in common but don't yet know each other, and then make a friendship recommendation.
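The product recommendation described above can be sketched as a walk from the user to customers who follow the same sport, filtered by overlapping purchases, then out to products the user does not yet own. The data, the recommend() function, and the min_overlap threshold are hypothetical stand-ins for "similar purchase history".

```python
from collections import Counter

# Hypothetical customers: the sport each follows and what each has bought.
FOLLOWS = {"ann": "tennis", "ben": "tennis", "cat": "tennis", "dan": "golf"}
PURCHASES = {
    "ann": {"racket", "balls"},
    "ben": {"racket", "balls", "grip tape"},
    "cat": {"racket", "shoes"},
    "dan": {"clubs", "balls"},
}


def recommend(user, min_overlap=1):
    """Score products bought by same-sport customers with overlapping purchase history."""
    mine = PURCHASES.get(user, set())
    scores = Counter()
    for other, sport in FOLLOWS.items():
        if other == user or sport != FOLLOWS.get(user):
            continue  # walk only to customers who follow the same sport
        theirs = PURCHASES.get(other, set())
        if len(mine & theirs) >= min_overlap:  # "similar purchase history"
            for product in theirs - mine:      # only products the user lacks
                scores[product] += 1
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(scores, key=lambda product: (-scores[product], product))


print(recommend("ann"))                  # ['grip tape', 'shoes']
print(recommend("ann", min_overlap=2))   # ['grip tape']
```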

Amazon Neptune

Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports the popular property graph and W3C Resource Description Framework (RDF) models, along with their respective query languages, Apache TinkerPop Gremlin and SPARQL, so that you can build queries that efficiently navigate highly connected datasets.
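In RDF, the facts that a property graph stores as vertices, edges, and properties become (subject, predicate, object) triples, and SPARQL queries match patterns against them. The Python sketch below uses a hypothetical ex: prefix and a hypothetical match() helper to match a single triple pattern, roughly what one line of a SPARQL WHERE clause such as { ex:thomas ex:knows ?friend } does. It is not how Neptune evaluates SPARQL.

```python
# RDF data as (subject, predicate, object) triples. IRIs are shortened to
# prefixed names; the ex: prefix is hypothetical.
TRIPLES = {
    ("ex:thomas", "ex:knows", "ex:robin"),
    ("ex:robin", "ex:uses", "ex:phone77"),
    ("ex:thomas", "ex:name", "Thomas"),   # a property becomes a triple with a literal
}


def match(pattern, triples=TRIPLES):
    """Match one triple pattern; terms starting with '?' are variables.

    Returns one dict of variable bindings per matching triple.
    """
    results = []
    for triple in sorted(triples):          # sorted for deterministic output
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


print(match(("ex:thomas", "ex:knows", "?friend")))  # [{'?friend': 'ex:robin'}]
```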

Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones. It is secure, with support for encryption at rest. Because Neptune is fully managed, you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.

Neo4j

Neo4j is an open-source, nonrelational, native graph database that provides an ACID-compliant transactional backend for your applications. It is a native graph database because it implements the property graph model efficiently all the way down to the storage level. Neo4j also provides full database characteristics, including cluster support and runtime failover, and it supports its own Cypher query language as well as Gremlin.

To get started using Neo4j, see the AWS Marketplace.
