Ch2
Differences Between AWS Database Types🔗
The AWS Database Landscape🔗
In AWS three primary types of services used by Consumers : Compute, Storage, Database
There are two types of databases
- Relation (SQL)
- optimized around data storage
- based on Structured Query Language for data retrieval
- Non-Relational
- process unstructured and semi-structured data quickly
- often distributed on multiple nodes
There are nine primary categories of Databases available in AWS
- Relational
- Key-Value
- Document
- In-Memory
- Graph
- Columnar
- Time-Series
- Quantum Ledger
- Search
- Size, Shape and Computational Requirements of data gives idea to developers which kind of database is best for them.
- Cloud promises agility, ability to use and select appropriate database or multiple of them according to workload, rather than using one general purpose database.
- Scalability and Elasticity is another benefit from cloud
Types of Workloads
- Operational
- Online Transactional Processing (OLTP) : OLTP is centered around a set of common business processes that are : Regular, Repeatable and Durable.
- Examples includes : E-commerce, IT, CMS, etc.
- Analytical
- Online Analytics Processing (OLAP) : Run for Business Intelligence Workloads or Data Analytics
- Workloads are often : Retrospective(Company’s Quarter Analysis), Streaming(Real Time Processing) and Predictive(ML/AI)
Relational Databases🔗
Schema : structure that needs to defined before entering data into the database and designed based on reporting requirements.
- Relation DBs are highly structured and have been around since 60s.
- Schema changes are expensive in terms of time and compute power, with risk of corrupting data
- Data is stored in tables (Relation)
- Each Row is referred as Record and Each Column is referred as Key, and the value referenced is called as Attribute
- Each Table will have unique primary key that is used for relational purposes for other tables via foreign key.
- Data Integrity(data stored is reliable and accurate) is particular concern in RDBS : ACID transactions
- Atomicity : A single database transaction (which succeeds or fails completely)
- Consistency : Transaction must take database from one valid state to another valid state.
- Isolation : Two transactions don’t interfere with each other
- Durability : Data changes become permanent once the transaction is committed to database.
- Keys and Data Integrity
- Primary and Foreign Key are constraints
- Entity Integrity ensure primary key is unique to table and it has a value (NOT NULL)
- Referential Integrity ensures that foreign key is primary key to its originating table. Orphaned Data is BAD !
- API of relational database is SQL (Structured Query Language)
- Relation databases have Data Access Controls and features like : Authentication, Authorization and Audit Logging
**Data Normalization: ** Normalisation is a process where information is organised efficiently and consistently before storing it.
- Duplicate data is discarded
- Closely related fields are aggregated
Scaling and Sharding
- Relational Databases are not partition tolerant, data partitions are called as shards. Two shards needs to co-ordinate to validate data consistency and it becomes costly. Both shards will same number of Keys (Horizontal Scaling (adds copy of the db))
- Most of the times Relational databases are scaled vertically (growing server by adding memory, cpu and a disk volume).
AWS Relation Database Engines(6) : Aurora(native to AWS), MySQL, Postgres, MariaDB, Oracle and Microsoft SQL Server.