Skip to main content

An Enterprise Architect's guide to database technologies

Understanding SQL, NoSQL, and graph databases makes you a more capable Enterprise Architect.
Image
Classic card catalog

Databases are an essential part of enterprise architecture. You really can't make a commercial application without one. But choosing just any database is not wise. There are a lot of databases out there. Knowing which one is best for your design is part of the expertise that goes with being an Enterprise Architect.

In order to help matters along, we've created a list of a few of the most popular databases being used today. The list serves as an overview resource you can use in your architectural design efforts. The descriptions provided will give you a sense of what each database is about and the use cases each satisfies. The list also serves as a guide for further investigation. Each entry provides a link to the website for the given database. Once you get a high-level sense of what the database is about, you can follow the links listed alphabetically below to get more details about those that are interesting.

Relational Databases NoSQL Databases Graph Databases
CockroachDB Cassandra AgangoDB
Firebird Couchbase Neo4J
IBM Db2 Elasticsearch OrientedDB
MariaDB InfluxDB
Microsoft SQL Server MongoDB
MS Access Redis
Oracle Riak
Postgres
SAP HANA
SQLite

Relational databases

A relational database is one in which information is segmented over a variety of tables in such a way as to ensure that there is no data redundancy in any table or among tables. Tables are related to each other by way of joins to create specific aggregations of information. (See Figure 1.)

Image
SQL joins connecting three tables
Figure 1: Contact information is an aggregation created by a join between the User and Company tables

The following is a list of some of the more popular relational databases in use today.

CockroachDB

  • Maintainer: Cockroach Labs
  • License: CCL and BSL
  • Year First Published: 2015
  • Last Revision Date: 4/5/2021
  • Product Website: www.cockroachlabs.com
  • DB-Engines Overall Ranking: 59

CockroachDB is a relatively recent arrival among database technologies. CockroachDB makes data highly available and highly consistent on a global scale. Under CockroachDB, data is stored among a cluster of machines. Data is partitioned regionally, thus allowing users to get the data closest to the geolocation from which a query is made. CockroachDB has extended the SQL query language to enable querying according to a given region. Also, under CockroachDB, region assignment can be made at the row level of granularity.

CockroachDB makes it possible to create clusters of machines from a variety of cloud providers. Also, Cockroach Labs provides CockroachDB as a cloud service.

Firebird

  • Maintainer: Firebird Project
  • License: IPL and IDPL
  • Year First Published: 2000
  • Last Revision Date: 10/20/2020
  • Product Website: firebirdsql.org
  • DB-Engines Overall Ranking: 31

Firebird is a fork from Borland's Interbase database. Micro Focus International absorbed Borland in 2009, yet the database lives on as Firebird. Firebase works with X86 macOS, Windows, and Linux/Unix operating systems.

Firebird licensing makes the product free to use. You can modify the source code. However, the licensing stipulates that Firebird or any code derived from the product must remain free to use.

Setting up Firebird to run on multiple servers is a difficult task. As the documentation states, "Setting this up is not a beginner's task though. If you need to run multiple servers on the same machine, the second and subsequent servers must be installed and configured manually."

Firebird supports stored procedures, transactions, triggers, and event emissions. Also, it supports multiple machine replication clusters. It does not support horizontal partitioning by table. Firebird ranks well below Oracle (#1), MySQL (#2), and Microsoft SQL Server (#3) in the db-engines.com comparison chart, coming in at #31.

IBM Db2

  • Maintainer: IBM
  • License: Proprietary
  • Year First Published: 1983
  • Last Revision Date: 6/27/2019
  • Product Website: www.ibm.com/products/db2-database
  • DB-Engines Overall Ranking: 6

Db2 is one of the oldest relational databases around. It comes in mainframe and x86 versions. Also, the database is offered as a cloud service under the IBM® Db2® on Cloud product.

Db2 evolved out of Edgar F. Codd's paper that described the concepts behind relational databases. Codd published the RDMS paper while an employee of IBM.

Db2 is a database that's intended to be used by big companies in a big way. Customers include Costco, Morgan, GEICO, Edward Jones, and the American Red Cross, to name a few. Db2 is used by over 36,000 companies worldwide.

MariaDB

  • Maintainer: MariaDB Corporation
  • License: GPL and LGPL
  • Year First Published: 2010
  • Last Revision Date: 2/22/2021
  • Product Website: mariadb.org
  • DB-Engines Overall Ranking: 12

MariaDB is a fork of MySQL, the preeminent relational database available under the GPL license. The originators of MySQL created MariaDB in response to MySQL being acquired by Oracle. MySQL AB, the company that created MySQL, was bought by Sun Microsystems for $1 billion in 2008. Oracle then went on to buy Sun Microsystems in 2009.

MariaDB is a fully compatible drop-in binary replacement to MySQL. Michael "Monty" Widenius, one of the lead developers of MySQL and then MariaDB, touts MariaDB as being notably faster than MySQL. Others report scaling is easier due to its ColumnStore technology. Also, MariaDB supports the RocksDB database. RockDB is a technology that is optimized for flash storage, thus improving overall performance.

Microsoft SQL Server

  • Maintainer: Microsoft
  • License: Proprietary
  • Year First Published: 1989
  • Last Revision Date: 11/4/2019
  • Product Website: microsoft.com/sqlserver
  • DB-Engines Overall Ranking: 3

Started as a port of Sybase SQL Server, Microsoft SQL Server has become the company's leading database intended for enterprise customers. Microsoft SQL Server runs on Windows and Linux systems. The company uses the SQL Server engine to power its abstract software service running on its Azure platform under the name Azure SQL Database.

Microsoft released the free SQL Server Express version in 2005. SQL Server Express, run on Windows only, is limited to using one processor, 1 GB of RAM, and 10 GB of disk storage.

Microsoft SQL Server ships with several additional services, for example, the data mining, business intelligence tool, Microsoft Analysis Services, as well as SQL Server Integration Services, which is used to import and integrate data from external sources.

MS Access

  • Maintainer: Microsoft
  • License: Proprietary
  • Year First Published: 1992
  • Last Revision Date: 9/24/2018
  • Product Website: office.microsoft.com/access
  • DB-Engines Overall Ranking: 10

MS Access was the company's first database intended for the Windows PC market. Its competitors at the time were Paradox, dBase, FoxPro, and, to some extent, FileMaker.

MS Access's heyday was when the Visual Basic programming language enjoyed a large segment of developer activity. Later, MS Access was added to the MS Office Suite of applications making it popular with desktop users who wanted to move beyond Excel. Needless to say, given the immense popularity of MS Access in the past, there is still a base of legacy installations. And, given its top ten ranking in DB-Engines, it's a popular database today.

MS Access supports transactions and stored procedures. MS Access ranks #10 overall on db-engines.com, placing it ahead of HBase (#23) and Firebird (#31), which are databases intended for enterprise use.

MySQL

  • Maintainer: Oracle Corporation
  • License: GPL and Proprietary when embedding binaries in commercial applications
  • Year First Published: 1995
  • Last Revision Date: 1/18/2021
  • Product Website: mysql.com
  • DB-Engines Overall Ranking: 2

MySQL is an open-source relational database that runs on a wide variety of machines and operating systems. There are versions for Windows, Linux, and macOS. Also, there is a version that runs on IBM/i, an operating system designed for IBM's PowerPC chip, and a version that runs on Digital Equipment Corporation's VAX minicomputers. The product has been around for a while and enjoys a wide user base.

Created by a group of developers at the Swedish company MySQL AB, the database was acquired by Sun Microsystems in 2008. Oracle bought Sun in 2010, making MySQL an Oracle Corporation property.

MySQL is the default database in many well-known applications, such as Drupal, Joomla, and WordPress.

MySQL provides all the features expected from an enterprise database, including but not limited to stored procedures, transactions, triggers, and event scheduling. MySQL Cluster GCE for distributed databases supports data sharding and replication over several machines. MySQL Cluster GCE is also free.

Oracle

  • Maintainer: Oracle Corporation
  • License: Proprietary
  • Year First Published: 1979
  • Last Revision Date: 2/13/2019
  • Product Website: oracle.com/database
  • DB-Engines Overall Ranking: 1

Founded in 1979 as Relational Software Inc. after co-founder Larry Ellison read Edward F. Codd's seminal paper about relational databases, the company, renamed Oracle Corporation, has become one of the mainstays of the IT industry. The company has since acquired technical stalwarts such as the Java programming language, competing database technology, MySQL, and enterprise resource planning software, PeopleSoft.

The Oracle database product is intended for large enterprises and has a price tag to prove it. The entry-level standard unit starts out at ~$17K per unit. It's used by large companies such as Netflix, Intuit, eBay, and Wells Fargo. The company employs over 100,000 people, making it an indelible presence on the database landscape.

The company is making a presence for itself on the cloud, now offering database products on a software-as-a-service basis.

Postgres

  • Maintainer: PostgreSQL Global Development Group
  • License: Postgres License
  • Year First Published: 1996
  • Last Revision Date: 2/11/2021
  • Product Website: postgresql.org
  • DB-Engines Overall Ranking: 4

Postgres has its genesis in the Ingres project out of UC Berkeley. Ingres is an early relational database that was adopted for use on Digital Equipment Corporation computers in the 1980s. For a while, Ingres was a significant competitor to Oracle. Microsoft SQL Server also has its roots in Ingres. Postgres was released as the next generation of Ingress, hence the naming Post Ingres. Postgres first appeared on a public website in 1996.

Postgres supports transactions, triggers, stored procedures, full-text search, and asynchronous event messaging. Stored procedures can be written in standard SQL, PL/pgSQL, and C by way of a shared library. Also, shared libraries allow developers to write stored procedures in Java, JavaScript, R, and Ruby, to name a few other languages.

Postgres has no independent presence in the cloud. But several major cloud providers such as Azure, Google Cloud, IBM Cloud, and AWS offer implementations. There are also smaller companies such as ElephantSQL that offer Postgres as a service.

Postgres enjoys the support of a wide range of prestigious sponsors, for example, Google, Fujitsu, IBM, Amazon Web Services, Rackspace, and VMware. Also, it's used by many companies operating at an enterprise scale. Some examples of companies that use Postgres are Uber, Netflix, Instagram, Accenture, Twitch, GitLab, and Reddit.

Postgres is open-source and intends to stay that way in perpetuity.

SAP HANA

  • Maintainer: SAP AG
  • License: Proprietary
  • Year First Published: 2010
  • Last Revision Date: 8/8/2019
  • Product Website: sap.com/products/hana.html
  • DB-Engines Overall Ranking: 18

HANA is an acronym for High Performance Analytic Appliance. SAP HANA is intended to provide all the data management capabilities any enterprise might require. According to Brian Knapp, Offering Manager at IBM, "HANA is the core of your digital enterprise." In addition to the underlying database engine, SAP HANA provides the capability to accommodate a variety of data use cases, for example, general business data, HTTP event streams, machine learning data, and Internet of Things information.

One of HANA's key features is that it stores data in-memory. In-memory storage makes response times lightning fast, which is critical for working with a database effectively. Also, to keep in step with the times, SAP HANA is available as a cloud service.

ERPs, by nature, are intended for very big businesses. So it's not surprising to see SAP HANA used by enterprises such as AirBus, City of Boston, GM, and VMWare, to name a few of the very big companies. SAP HANA does not make its pricing structure public. Rather, the prospective buyer needs to arrange a demonstration with a sales representative. However, as with any ERP, the cost of an SAP HANA installation can be quite expensive. Fortunately, the company does offer a free Express edition which can be used for small-scale deployment and educational purposes.

SQLite

  • Maintainer: D. Richard Hipp
  • License: Public Domain
  • Year First Published: 2000
  • Last Revision Date: 4/2/2021
  • Product Website: sqlite.org
  • DB-Engines Overall Ranking: 9

SQLite intends to be a database embedded into a program that runs on a local computer and other types of computing devices such as a cell phone and embedded technologies. While other databases run as a separate server, SQLite is serverless. Most of the major web browsers use SQLite to store data. The database also ships with Android, iOS, and Windows 10.

SQLite supports the full breadth of the SQL programming language. It also supports transactions. However, it lacks the distinct DATETIME and BOOLEAN types that are commonplace in other database technologies.

NoSQL databases

NoSQL databases, as the name implies, do not spread data across tables so that there is no duplicated data within or among a table. NoSQL database technologies typically take a document-centric approach to data storage.

Documents, usually in JSON format, are stored within an organizational unit called a collection. While it is conventional to organize information in a document according to a standard structure, each document in the collection can be arranged in any way. For example, one document can contain first_name and last_name information, while another document can contain first_name, last_name, and email information. A third document can contain product_name and quantity information. (See Figure 2.)

Image
docs stored in nosql
Figure 2: An example of documents stored in a NoSQL database

The benefit of NoSQL databases is they provide a great deal of versatility. The drawback is that they tend to accumulate a good deal of redundant information. As a result, data integrity can be compromised, and correlating data between collections can be difficult.

The following is a list of some of the more popular NoSQL databases.

Cassandra

  • Maintainer: Apache Software Foundation
  • License: Apache License 2.0
  • Year First Published: 2008
  • Last Revision Date: 2/1/2021
  • Product Website: cassandra.apache.org
  • DB-Engines Overall Ranking: 11

Originally created as Facebook in 2008 and then transferred to the Apache Software Foundation in 2010, Cassandra is a distributed, wide table, NoSQL database. Cassandra is a distributed database that can run under a cluster of computers that span multiple data centers.

While not exactly a document-centric database, it's not a pure relational database either. Rather, Cassandra is a column-oriented database.

Cassandra does very fast data writes. Fast writes along with its column-oriented architecture enable Cassandra to retrieve data quickly, making it useful for supporting time-series data and working with Internet of Things (IoT) devices that emit large amounts of data in very short intervals. Cassandra is well suited for performing data analytics tasks.

Cassandra supports masterless replication. Cassandra does not support transactions across partitions, nor does it support distributed joins, foreign key, or referential integrity.

Interaction with Cassandra is conducted using Cassandra Query Language (CQL). CQL is similar to SQL in terms of syntax but has added keywords, for example, KEYSPACE and WITH REPLICATION.

Many enterprises, large and small, use Cassandra. GoDaddy, CERN, Hulu, eBay, and Constant Contact, are just a few of the many businesses that have adopted the technology.

Couchbase

  • Maintainer: Couchbase Inc.
  • License: Apache License 2.0
  • Year First Published: 2012
  • Last Revision Date: 12/1/2020
  • Product Website: couchbase.com
  • DB-Engines Overall Ranking: 39

Couchbase is an open-source, NoSQL database that can run over multiple machines running within many data centers. Couchbase supports document (JSON) or key-value storage. It also takes advantage of in-memory caching to deliver data at microsecond intervals of performance.

Couchbase supports a special query language named N1QL (pronounced nickel). N1SQL is similar in syntax to SQL but has differences that are needed to support its document-centric approach to data storage.

Couchbase supports sharding data using a mechanism called vBuckets. Data is spread evenly across vBuckets according to the cluster. vBuckets exist both within in-memory storage and physical disc storage.

Couchbase is used by companies such as Cisco, eBay, LinkedIn, Equifax, and PayPal, to name a few.

Elasticsearch

Elasticsearch is a secure database/search engine technology built on Java and ships under a proprietary source-available license. Elasticsearch is based on the Lucene search technology. It can be installed in a cluster of computers to ensure high availability.

Elasticsearch can be used to store data in a searchable manner, and it can be used as a search engine that works in conjunction with other data storage technologies. Typically the Elasticsearch installation will be used as an index when used with another data storage technology.

Elasticsearch stores data as JSON documents, thus providing a high degree of flexibility in data management. You can use Elasticsearch to store and search a wide variety of data structures, such as logs and event messages.

ElasticSearch has become commonplace in the data analytics space. Elasticsearch is usually installed as part of the ELK Stack (Elasticsearch, Logstash, Kibana) to provide all the tools needed to gather, store and search log data emitted from enterprise applications operating at web scale.

Elasticsearch is very popular and is used by a number of companies operating at web scale. Some of these companies are Uber, GitHub, Shopify, Udemy, Slack, Instacart, Robinhood, Accenture, and Stack Overflow, to name a few.

InfluxDB

  • Maintainer: InfluxData
  • License: MIT
  • Year First Published: 2013
  • Last Revision Date: 2/4/2021
  • Product Website: influxdata.com
  • DB-Engines Overall Ranking: 28

InfluxDB is intended to be a time-series database. Time-series data is information that is reported over time, for example, the price of a stock as reported every 10 seconds during the opening and closing of the New York Stock Exchange. Thus, a time-series database such as InfluxDB needs to accept and store a large number of discrete information packets accurately and in sequence. InfluxDB's free, open-source version can do over a million writes per second.

InfluxDB is open source and is written in the Go programming language. It has no external dependencies. InfluxDB ships with Flux, a scripting language that can be used to query time-series data stored in a variety of databases. Also, InfluxDB ships with an administrative dashboard that allows developers and admins to perform activities such as user management, data monitoring, data exploration, database health checks, and job scheduling. Also, the InfluxDB dashboard provides the capability to create custom dashboard components that can monitor and manage various activities, both those relevant to InfluxDB and external tasks such as commits to GitHub.

To ensure fail-safety and support continuous availability, InfluxDB can be configured to run on a cluster of machines. An InfluxDB cluster is made up of two types of nodes, meta nodes, and data nodes. Meta nodes contain the information about all the nodes in a cluster, the retention policies, sharing information, users and their permissions, as well as continuous queries. Data nodes store the actual time-series data. As a rule, InfluxDB should have three meta nodes and an even number of data nodes.

InfluxDB is used by companies such as Trivago, Hulu, and Texas Instruments, to name a few.

MongoDB

  • Maintainer: MongoDB Inc.
  • License: SSPL
  • Year First Published: 2009
  • Last Revision Date: 7/2020
  • Product Website: mongodb.com
  • DB-Engines Overall Ranking: 5

MongoDB is one of the first open-source projects to establish a major presence in the document database space. MongoDB is written in C++. There are over 4000 forks of the project on GitHub. The project is very popular, with ratings of over 19,000 stars.

MongoDB supports ACID transactions, which is somewhat unusual given that, by nature, document-centric databases support free-form JSON and have no inherent concept of foreign keys. But multi-document transactions have been in force since the Version 4.0 release in August 2018. Distributed transactions have been available since Version 4.2 was released in October 2019.

MongoDB supports clusters and sharding. Also, MongoDB supports distributing replication sets among many data centers.

MongoDB is used by many companies operating at web scale, such as CraigsList, Lyft, Trello, MetLife, Fidelity Investments, and Intuit.

Redis

  • Maintainer: Redis Labs
  • License: 3-Clause BSD
  • Year First Published: 2009
  • Last Revision Date: 3/2/2021
  • Product Website: Redis.io
  • DB-Engines Overall Ranking: 7

Redis is both a message broker and a database. The database is a key-value store. Redis supports in-memory caching as the first layer of data access. Cached data is persisted to disk storage.

Redis is a single-threaded technology in terms of receiving the connections from clients, processing a connection's read/write events, parsing requests, processing commands, processing timer events, and synchronizing data. Thus, scaling can be tricky. Still, it's used by a number of big companies such as Uber, Airbnb, Twitter, Instagram, and Shopify. The message broker does not guarantee an ordered time sequence.

Riak

  • Maintainer: Basho Technologies
  • License: Apache License 2.0
  • Year First Published: 2009
  • Last Revision Date: 9/19/2020
  • Product Website: riak.com
  • DB-Engines Overall Ranking: 66

Riak is an open-source, key-value data storage technology, written in the Erlang programming language and originally created at Basho Technologies. Riak occupies the same space as Redis, Cassandra, and MongoDB. Riak is presently released in three versions. Each version is targeted to a specific use case: riakkv is for key-value storage, riakts is for time-series data, and riakcs is targeted at S3 compatible distributed storage.

Riak is discussed as an open-source implementation of Dynamo data storage technology according to the 2007 paper published about the pattern to data storage.

Riak is designed to be especially fault-tolerant at a massive scale. Thus, multiple node cluster installations are intrinsic to the technology. Riak implements leaderless, peer-coordinated replication that relies upon quorum reads and writes, which means that a majority of the nodes in an odd-numbered node cluster must agree to the validity and consistency of data before writes to disc or data publication takes place.

Riak supports full-text search as well as Map/Reduce operations.

Riak is used by companies such as AT&T, Comcast, GitHub, Best Buy, UK National Health Services (NHS), The Weather Channel, and Riot Games, to name a few public examples.

Graph databases

A graph database is one that not only stores information about entities but also the relationship(s) between entities. The structure that describes entities and their relationships is called a graph, hence the name graph database. (See Figure 3.)

Image
graph database structure
Figure 3: A graph database describes entities as well as the relationship(s) between entities

Graph databases are growing in popularity, particularly among social media sites that need information not only about entities using the site but also the relationship between entities. For example, a user on Facebook can be a friend to another user and a member of a group.

The following is a list of some of the more popular graph databases.

ArangoDB

  • Maintainer: ArangoDB Inc
  • License: Apache 2.0
  • Year First Published: 2012
  • Last Revision Date: 3/2021
  • Product Website: www.arangodb.com
  • DB-Engines Overall Ranking: 69

ArangoDB is an open-source multi-model database written in C++. It can be used as a graph database, a document-centric database, or a key-value store. AgangoDB supports clustering and ships with an administrative web UI that's embedded in the database installation.

ArangoDB has its own query language named AQL. AQL has a programming-like syntax. For example, the AQL statements ...

LET oneDocument = DOCUMENT("myusers/3456789")

RETURN oneDocument

… will return a JSON document according to key, 3456789 from the collection named, myusers.

ArangoDB graph features in-pattern matching and shortest paths according to node distribution and attribute binding in the graph. Also, ArangoDB supports community detection. You can think of a community as a group of nodes in a graph associated with a common interest or organization, such as a hobby or special interest group. In addition, ArangoDB has graph visualization tools for displaying data graphically. It also supports full-text search.

ArangoDB is used by companies such as Airbus, Barclays, Cisco, SAP, and Thomson Reuters, to name a few.

Neo4J

  • Maintainer: Neo4J
  • License: GPL, AGPL, and registerware
  • Year First Published: 2010
  • Last Revision Date: 4/9/2021
  • Product Website: neo4j.com
  • DB-Engines Overall Ranking: 20

One of the first object databases around.

Neo4J supports indexing. It stores data according to the based structures of a graph: An edge, which describes the relationship between entities in the graph; the node, which is the conventional term for data entity; and attributes, which are the properties that describe each node. Also, Neo4J supports the labeling of nodes and edges. Labeling enables a finer degree of granularity for searching and filtering the graph.

Neo4J is implemented in Java. It supports a special declarative language named, Cypher. Cypher is an open specification targeted at graph databases. Cypher supports the usual create, read, update and delete (CRUD) operations. The example below shows an example of a Cypher expression:

MATCH (tom:Person {name: Elvis Presley})

RETURN tom

The example above represents the query logic, Find Person nodes in the graph that have a name of Elvis Presley WHERE tom is a variable that contains the result of the query that is to be returned.

In addition to its built-in capabilities, Neo4J supports user-defined functions that allow developers to implement custom logic.

Neo4J is used by companies such as Allianz, Lyft, Comcast, Airbnb, and eBay, to name a few documented case studies.

OrientDB

  • Maintainer: CallidusCloud/SAP
  • License: Apache 2.0
  • Year First Published: 2010
  • Last Revision Date: 2/2021
  • Product Website: https://orientdb.org/
  • DB-Engines Overall Ranking: 80

OrientDB is an open-source graph database targeted at large-scale data sets. However, the database can also support key-value storage, although that's not its primary purpose.

Developers and users can work with the database using an augmented version of SQL that supports defining the relation between entities as well as the entities themselves. Also, OrientDB ships with a web dashboard that allows both users and developers to create complex graphs visually.

OrientDB ships with a feature named Teleporter, which allows relational databases to be imported into OrientDB. OrientDB provides client libraries for programming languages such as Java, .NET, Node.js, PHP, Go, Python and Ruby.

OrientDB is used by companies such as Popcorn, SYSNET International, and NewsSC, to name a few.

OrientDB was acquired by SAP in 2018.

General reference resources

The following resources are used for general background information about the databases reviewed in this piece. Particular references are noted as links in the details about a specific database.

What to read next

Topics:   Software   Data management  
Author’s photo

Bob Reselman

Bob Reselman is a nationally known software developer, system architect, industry analyst, and technical writer/journalist. More about me

Related Content

OUR BEST CONTENT, DELIVERED TO YOUR INBOX