Introduction to Databases#
What types of storage do you know?#
What types of storage do you know?#
Types of Data#
Unstructured
Data does not conform to a rigid structure.
Semi-structured
Data conforms to a schema but deviations are possible
Structured
Schema and types of data are known and encoded in the system.
Unstructured Data#
Free text - ‘🎉What a wonderful day Deployed-Dill!’
Images
Audio files
Can be stored in “object store” - AWS S3


Semi-structured Data#
Stored in document databases, e.g. “MongoDB”
JSON - JavaScript object notation

Structured Data#
Standardized format with encoded and enforced structure which
can be processed efficiently by software and humans alike
can be stored and managed more easily
lends itself easily to mathematical analysis
Rows usually represent data points/instances
Columns usually represent data attributes
Definable attributes - same attributes for all records
Relational attributes - common values that link different datasets together
classroom_id |
classroom_capacity |
classroom_number |
bootcamp_id |
---|---|---|---|
1 |
10 |
100 |
A |
2 |
15 |
101 |
B |
3 |
20 |
102 |
C |
Databases#
Why learn about Databases?#
Why?
Most of (large) data is stored in databases
An analyst/data scientist needs to be able to connect to a database and access its data to unlock insights.
What?
Understand the basics of databases
Be able to connect to a database
How?
Learn about database types and how they are structured
Connect to a database and explore its content
What are Databases?#
A systematic collection of data
Data is either stored on disk or in-memory (faster)
Support electronic storage and manipulation of data
Notes:
Disk storage –> hard drive of a computer –> large amounts –> persistent –> data retrival slower
In-Memory RAM --> limited amount --> volatile --> data retrival faster
Scenario: E-comerce Website
disk --> store information of product --> stable in time
order history --> insight what more profitable
in-memory: rapid access/update user shopping card in realtime
Types of Databases#
RDBMS: Relational Database Management Systems
use SQL to query RDBMS
have a predefined schema
data is stored in tabular form of columns and rows
the relationship between data is relational
Examples: Postgres, MySql, Oracle, SQLite
NoSQL Databases
don’t use SQL as the primary language
have no predefined schema
Examples: Neo4j, MongoDB
RDBMS#
Many different types of databases exist and each uses a different flavour of SQL.
Their syntax can differ, but the core concepts are the same.
Some databases will implement a subset of the functionality.
Some DB will be optimized for speed of read, others for speed of write.

Database Structure#
A database consists of one/multiple schemas
Schemas consist of tables
Tables consist of columns and rows
A column is a variable and has a unique name
A row is an observation
Every cell is a single value

Entity-Relationship model#
Data consists of entities of type object, class, person or place
The property of an entity is described through their attribute(s)
Relationships describe the relation between entities
Different types of relationship exist

Entity-Relationship types#
One-to-one (1:1)
One-to-many (1:n) / Many-to-one (n:1)
Many-to-many (n:n)
Entity-Relationship types#
One-to-one (1:1)
One-to-many (1:n) / Many-to-one (n:1)
Many-to-many (n:n)

Entity-Relationship types#
One-to-one (1:1)
One-to-many (1:n) / Many-to-one (n:1)
Many-to-many (n:n)

Entity-Relationship types#
One-to-one (1:1)
One-to-many (1:n) / Many-to-one (n:1)
Many-to-many (n:n)

Relational Databases#
Tables are related via primary and foreign keys
Each table has one primary key that is unique for each record
A foreign key is a field (or collection of fields) in one table, that refers to the primary key in another table

ERM Exercise#
Group Work#
Split up in 4 groups
Open the Miro Board we send in Discord
Create between 4 and 8 entities
Write down properties (columns) of entities
Model dependencies (1:1, 1:m, m:n)
Present your final diagram and explain decisions
You have 30 Minutes
Topic |
---|
Airport |
Chocolat Factory |
Roller Coaster Park |
Restaurant |
Connecting to a Database#
SQL Client / Database IDE#
IDE = Integrated Development Environment
Powerful software that can be used to connect to a database and retrieve and visualise data (and more!)
Local or in the cloud
Collection of open-source, free and paid software available
Local SQL Clients vs. Cloud SQL Clients#
Local: Installed and run locally on your machine
Cloud: Deployed in the cloud and accessed via a web-interface
DBeaver#
In this course we will use DBeaver

Why?
Free
Easy to use
Works for many different types of databases
Cross platform (Windows, Linux, Mac OS, Solaris)
Setting up DBeaver#
Open DBeaver
> Settings
> Editors
Enable upper case:
SQL Editor
>Formatting
>Keyword Case
> Set to: UpperAdd line numbers:
Text Editors
>Show line numbers
> Tick box
Connect to a PostgreSQL database#
Click on “New Database Connection”
Search for and select PostgreSQL
Enter the connection details below
Host: ds-sql-playground.c8g8r1deus2v.eu-central-1.rds.amazonaws.com
Port: 5432
Database: postgres
Username Will be posted in Slack/Zoom Chat
Password Will be posted in Slack/Zoom Chat

Exploring the database#
Find the database connection in your “Database Navigator” pane on the left
Expand it to postgres
> postgres
> Schemas
> introduction
> Tables
Check out the introduction’s tables and answer the following questions:
What happens if you double click on a table?
What is the first and last entry listed in the
records
table?What data type is the column
record_date
in therecords
table?