Choosing the Right EC2 Instance for Hosting a Database

Choosing the right EC2 instance for your database is really important. Cloud services are selling like hot cakes, and AWS (Amazon Web Services) has by far the biggest slice of the pie. The public cloud gives you a lot of flexibility and lets you spin up infrastructure within minutes, rather than spending weeks or months working through corporate procurement. The three most common basic infrastructure needs that bring people to the cloud are:

  • Compute infrastructure
  • Storage
  • Databases

This article helps you deal with the first and the third of these on AWS.

The need for databases:

Databases have been the backbone of applications for years. Recently the need for data storage has grown manyfold, and so has the variety of databases. For the most part they fall into two broad categories:

  • Relational Databases
  • Non-Relational (NoSQL) Databases

Most cloud providers, AWS included, offer a plethora of managed database options where you do not have to manage the infrastructure for your database. A few of the database options offered by AWS:

| DB Type | AWS Offering | Purpose |
|---|---|---|
| Relational | Amazon Aurora, Amazon RDS, Amazon Redshift | Traditional OLTP and data warehouse systems |
| Key Value | DynamoDB | High traffic apps like web apps, gaming etc. |
| Document | Amazon DocumentDB | To store content of blogs etc. |
| Time Series | Amazon Timestream | Telemetry, logs or IoT data |
| Ledger | Amazon QLDB | Blockchain needs like banking, cryptocurrency etc. |
| Graph | Amazon Neptune | Social networking, fraud detection etc. |

The list above should cover most database needs with a managed service. So why would you need to host a database server on a self-managed virtual machine (EC2)?

Why use EC2 to host your database

Though AWS provides a variety of managed databases, there are cases where you may want to host your own database.

Applications that do not support RDS or other databases:

If you want to run an application that does not support RDS, you have no choice but to use a database hosted on an EC2 machine. For example, to run Power BI Report Server you have to host SQL Server on EC2.

When RDS does not support a DB feature:

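Though RDS offers the core database services, it may not offer every auxiliary service, or may offer it only with restrictions. For example, if you depend on a database scheduler, SSIS or SSAS for SQL Server, RDS may not cover everything you need. Support for these components has changed over time, so check the current RDS documentation for your engine and version before deciding.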

When you need fine-grained control:

Managed databases offer limited flexibility in terms of storage type, CPU, LDAP authentication, parameter settings etc. When you want full control over your database, RDS or other managed databases may not be the best choice.

Specialized Databases :

When you need to host a database that is not offered as a managed service, for example Cassandra, HBase, Exasol, Neo4j, SAP HANA etc.

Factors to keep in mind :

Before we jump into the exact instance types, there are a few factors to consider:

  • Always start with a small footprint and add resources as you need them. Everything in the cloud has a cost, and scaling up is easier than in a data center.
  • Never underestimate the IO: Disk IO is an important factor, and the storage you can attach depends on the instance type. For a detailed comparison of EC2 IO visit: https://www.aws-cost.com
  • Network bandwidth: This factor is often ignored for on-prem databases, where the network speed stays the same irrespective of the machine size. In the cloud it's a different story altogether: the network speed is also determined by instance type and size. For a detailed comparison of network speeds visit: https://www.aws-cost.com
  • CPU vs memory intensive workloads: EC2 instances are tied to a specific ratio of CPU to memory, unlike a server in the data center, where you have the flexibility to choose your own level of CPU and memory. For a detailed comparison of CPU and RAM for EC2 visit: https://www.aws-cost.com
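The fixed CPU-to-memory ratio is easy to see with a little arithmetic. The sketch below is illustrative Python, not an AWS API: the `SPECS` table and the `gib_per_vcpu` helper are names made up for this example, and the figures are hand-copied for the `large` size of three families, so verify them against the current AWS documentation before relying on them.

```python
# vCPUs and memory (GiB) for the "large" size of three instance families.
# Hand-copied figures for illustration only; verify against current AWS docs.
SPECS = {
    "c5.large": (2, 4),   # compute optimized
    "m5.large": (2, 8),   # general purpose
    "r5.large": (2, 16),  # memory optimized
}


def gib_per_vcpu(instance_type: str) -> float:
    """Return the memory-to-vCPU ratio for an instance type listed in SPECS."""
    vcpus, memory_gib = SPECS[instance_type]
    return memory_gib / vcpus


for name in SPECS:
    print(f"{name}: {gib_per_vcpu(name):.0f} GiB per vCPU")
```

Within a family the ratio generally stays the same as the size grows, so a database that needs more memory per core usually calls for a different family rather than a bigger size.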

Choosing the right EC2 for your database :

While selecting the right EC2 instance, a multitude of factors affect your decision:

  • Type of database: This is the most important factor in your choice of instance. An RDBMS may have different resource needs than an in-memory database or a NoSQL database.
  • Type of workload: IO vs CPU vs memory intensive workload. This is related to the type of database as well.

The following sections summarize the best instance type to use for your database. Refer to https://www.aws-cost.com

Memory Intensive Workloads :

Most RDBMS are memory intensive. For moderate to large databases use the r-family instances. The newer the generation (for example r5 over r4), the better it generally is in terms of cost and resources.

IO Intensive Workloads :

For a database that needs higher IO, for example a Cassandra or an HBase database, use the i-family instances.

In-Memory databases :

For in-memory databases like SAP HANA, Apache Ignite or Redis, use the r-family or x-family instances. An x1e.32xlarge, for example, has 3,904 GiB of RAM.
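For an in-memory database the first question is whether the data set fits in RAM with room to spare. Here is a minimal sketch of that check, assuming a hypothetical headroom factor of 1.5 and a hand-copied `MEMORY_GIB` table covering only a few instance types. Both the table and the `smallest_fit` function are illustrative names, and the figures should be verified against the current AWS documentation.

```python
from typing import Optional

# Memory in GiB for a few memory-optimized instance types.
# Hand-copied figures for illustration only; verify against current AWS docs.
MEMORY_GIB = {
    "r5.4xlarge": 128,
    "r5.12xlarge": 384,
    "r5.24xlarge": 768,
    "x1e.32xlarge": 3904,
}


def smallest_fit(dataset_gib: float, headroom: float = 1.5) -> Optional[str]:
    """Return the smallest listed instance whose RAM covers the data set
    multiplied by the headroom factor, or None if nothing listed is big enough."""
    required = dataset_gib * headroom
    # Walk the candidates from least to most memory and stop at the first fit.
    for name, memory in sorted(MEMORY_GIB.items(), key=lambda item: item[1]):
        if memory >= required:
            return name
    return None


print(smallest_fit(200))   # needs 300 GiB with the assumed headroom
print(smallest_fit(1000))  # needs 1500 GiB with the assumed headroom
```

The right headroom depends on the database engine and its working memory, so treat 1.5 as a placeholder to tune rather than a recommendation.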

Nimble Size Database :

For a small database use an m-family or c-family instance.
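The recommendations in the sections above can be condensed into a small lookup. This is only a sketch of the article's rules of thumb: the workload labels and the `suggest_families` function are hypothetical names chosen for this example, and a real decision should still weigh IO, network and cost.

```python
from typing import List

# Workload labels are hypothetical; the family letters follow the guidance above.
FAMILY_BY_WORKLOAD = {
    "memory_intensive": ["r"],   # most RDBMS, moderate to large size
    "io_intensive": ["i"],       # e.g. Cassandra, HBase
    "in_memory": ["r", "x"],     # e.g. SAP HANA, Apache Ignite, Redis
    "small": ["m", "c"],         # small databases
}


def suggest_families(workload: str) -> List[str]:
    """Return the EC2 instance families suggested for a workload label."""
    try:
        return FAMILY_BY_WORKLOAD[workload]
    except KeyError:
        known = ", ".join(sorted(FAMILY_BY_WORKLOAD))
        raise ValueError(f"unknown workload {workload!r}; expected one of: {known}")


print(suggest_families("io_intensive"))
print(suggest_families("in_memory"))
```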

Rakesh Ghodasara
Data Engineering, Data Architecture, Cloud & BigData

My work area includes Data Engineering, Big Data, Business Intelligence & Cloud Technologies