DRAFT: Why proponents of marriage equality should love graph databases, Part 2: A Reply to 'The Database Engineering Perspective'
Written on 11:37:00 AM by S. Potter
So in Part 1 I provided an overview, based on my experiences, of the problems relational databases (RDBMS) have with modelling relationships, and with entities whose data sets are only partially structured.
This post will:
- introduce new concepts used in graph databases, plus terminology specific to the Neo family of databases
- review the most flexible schemas, those that accommodate same-sex and opposite-sex marriages alike, presented in the blog post I am responding to, 'Gay Marriage: The Database Engineering Perspective'
- describe how graph databases could overcome the relational database shortcomings
- finally, present a snippet of code (not meant for production use, merely a demonstration of how we can use Neo4J) that makes it more natural to represent richer relationships between entities (or nodes, as Neo4J calls them), that is, relationships that have substance to them (i.e. attributes in this case)
New Concepts in Graph Databases
Before I can explain how graph databases can, on a conceptual level, overcome most if not all (most of the time, anyway) of the relational database (RDBMS) shortcomings mentioned in the previous post, we need to introduce some new concepts and terminology.
There are two basic classes of objects in a graph database:
- Nodes: a node is basically an entity, as RDBMS people (probably like you and me) know it. A node could represent a person, customer, blog post, photograph, video or tweet. It isn't always accurate, nor is it a good idea, to think of graph database concepts only in terms of RDBMS concepts, but we could consider most tables that do not represent a relationship or association between other tables to be entities. This is a simplification, so use it cautiously and only in this initial learning phase. A node can have zero or more properties.
- Relationships: a relationship is an association between two nodes that needs to be represented in your data store somehow. A relationship can also have zero or more properties, just like a node.
The third concept in graph databases is that of a property. Because nodes and relationships can both carry properties, both are first-class citizens of the data store, which is quite unlike relational databases.
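The three concepts above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not Neo4J's API: the class names `Node` and `Relationship`, the `MARRIED_TO` relationship type and the property names are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """An entity in the graph; carries zero or more properties."""
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed association between two nodes; it carries properties too."""
    rel_type: str
    start: Node
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


# Each node sets only the properties it actually has (hypothetical data).
alice = Node({"name": "Alice"})
bob = Node({"name": "Bob", "occupation": "engineer"})

# The relationship is data in its own right, with its own attributes.
marriage = Relationship("MARRIED_TO", alice, bob, {"since": 2009})
```

Note that the relationship holds its attributes directly, rather than needing a separate table created just to carry them.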
In a relational database we either reduce an association between two entities/tables to a small handful of cases, which might use references (if supported by that particular vendor and version of RDBMS) or "join tables", or we force a natural relationship to become a table itself so that we can capture pertinent attributes for it. There is no other way of handling relationships. In many cases reducing the problem in these ways might not be an issue. However, as our entities become more and more interconnected through new types of relationships, it seems (to me) that this method of modelling data quickly becomes problematic.
The following are other useful terms that the Neo graph database family uses; conceptually, most of them have a synonym in other graph databases:
- Traversers: in graph databases "querying" for specific data is not done via declarative query statements and clauses like SQL. Rather we define traversers that are composed of the following elements:
  - the starting node
  - the relationship types needing to be traversed
  - the stop criteria
  - the selection criteria
  - the traversing order (e.g. breadth first, etc.)
- Indexers: in Neo4J there are several indexing utilities, backed by Lucene, that make it easy to index nodes, full text and timelines. This allows us to look up relevant starting nodes for traversing.
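To show how the five traverser elements fit together, here is a minimal sketch in plain Python. It is not Neo4J's traversal API: the `traverse` function, its parameters, the `GRAPH` dictionary and the names in it are all hypothetical. The lookup of the starting node by name stands in for what an indexer would do.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Hypothetical toy graph: node name -> list of (relationship type, target node).
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "alice": [("MARRIED_TO", "bob"), ("FRIEND_OF", "carol")],
    "bob": [("MARRIED_TO", "alice"), ("FRIEND_OF", "dave")],
    "carol": [("FRIEND_OF", "erin")],
    "dave": [],
    "erin": [],
}


def traverse(
    start: str,                          # the starting node
    rel_types: Set[str],                 # relationship types to follow
    stop: Callable[[str, int], bool],    # stop criteria: do not expand past this node
    select: Callable[[str, int], bool],  # selection criteria: return this node?
    breadth_first: bool = True,          # traversing order
) -> Iterator[str]:
    """Yield selected nodes reachable from `start`; each node is visited once."""
    frontier = deque([(start, 0)])
    visited = {start}
    while frontier:
        # A queue gives breadth-first order; a stack gives a roughly depth-first one.
        node, depth = frontier.popleft() if breadth_first else frontier.pop()
        if select(node, depth):
            yield node
        if stop(node, depth):
            continue
        for rel_type, target in GRAPH[node]:
            if rel_type in rel_types and target not in visited:
                visited.add(target)
                frontier.append((target, depth + 1))


# Everyone within two hops of alice, excluding alice herself.
within_two_hops = list(
    traverse(
        "alice",
        {"MARRIED_TO", "FRIEND_OF"},
        stop=lambda node, depth: depth >= 2,
        select=lambda node, depth: depth > 0,
    )
)
```

Changing only the relationship types or the stop criteria changes what is returned; the graph itself and the traversal code stay the same.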
How can graph databases overcome relational database shortcomings?
Now we will have a quick peek at how the new concepts and ideas from graph databases can resolve the complaints I had about relational databases:
- In response to 'A lot of real world data isn't highly structured (only partially)': with graph databases we only need to set the properties a specific node actually has. We do not need to fill in lots of nulls for attributes that aren't relevant, as we do in RDBMSes. Our graph database "model" interface might be able to add any constraints that are necessary. In situations where highly structured data sets really do occur, I prefer having two sets of constraints: those on the relational database level and those in the "model" (application tier). However, it can quickly become hard to manage when two sets of constraints that should be identical are defined in vastly different ways (languages).
- In response to 'Object to relational mapping (ORM) constraints and disjoints': because the graph database layer itself works conceptually with nodes and relationships, you need little if any translation between the stored data and the 'objects' that represent those nodes or relationships.
- In response to 'Weak and inefficient "traversal" support': this isn't a problem with graph databases. Traversing data is much more relevant when dealing with data that is naturally in a network or graph formation already. Relational databases cannot handle networks or graph-like data sets without a lot of workarounds.
- In response to 'Maintaining and evolving relational schemas': when the data sets you are dealing with are not highly structured and densely populated, there is no strict data schema to maintain and evolve. With Neo4J's flexible attribute setting/getting, data structures can be evolved flexibly as the data itself changes. This does raise another issue, that of application code being able to read the old attributes in the graph database, but I will talk about that in Part 3.
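The first and last points in the list above can be illustrated with a small comparison in plain Python. This is only a sketch with made-up data: the column names, property names and values are assumptions for the example, and dictionaries stand in for node properties.

```python
# Relational-style rows: every row carries every column, relevant or not.
COLUMNS = ("name", "maiden_name", "employer")
rows = [
    ("Alice", None, "Acme"),
    ("Bob", None, None),
]

# Graph-style nodes: each node stores only the properties it actually has.
nodes = [
    {"name": "Alice", "employer": "Acme"},
    {"name": "Bob"},
]

# Count the placeholder nulls the fixed columns force on us.
null_count = sum(value is None for row in rows for value in row)

# Evolving the data: a new property is set on just the node that needs it.
# No schema change is required and the other nodes are left untouched.
nodes[1]["nickname"] = "Bobby"
```

In the relational version, adding the nickname would mean a new column for every row, most of them null.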