Introduction to Databases#

What types of storage do you know?#

Storage like Tetris

What types of storage do you know?#

Data Storage Evolution

Types of Data#

Unstructured

  • Data does not conform to a rigid structure.

Semi-structured

  • Data conforms to a schema but deviations are possible

Structured

  • Schema and types of data are known and encoded in the system.

Unstructured Data#

  • Free text - ‘🎉What a wonderful day Deployed-Dill!’

  • Images

  • Audio files

  • Can be stored in “object store” - AWS S3

Semi-structured Data#

  • Stored in document databases, e.g. “MongoDB”

  • JSON - JavaScript object notation

Structured Data#

  • Standardized format with encoded and enforced structure which

    • can be processed efficiently by software and humans alike

    • can be stored and managed more easily

    • lends itself easily to mathematical analysis

  • Rows usually represent data points/instances

  • Columns usually represent data attributes

  • Definable attributes - same attributes for all records

  • Relational attributes - common values that link different datasets together

classroom_id

classroom_capacity

classroom_number

bootcamp_id

1

10

100

A

2

15

101

B

3

20

102

C

Databases#

Why learn about Databases?#

Why?

  • Most of (large) data is stored in databases

  • An analyst/data scientist needs to be able to connect to a database and access its data to unlock insights.

What?

  • Understand the basics of databases

  • Be able to connect to a database

How?

  • Learn about database types and how they are structured

  • Connect to a database and explore its content

What are Databases?#

  • A systematic collection of data

  • Data is either stored on disk or in-memory (faster)

  • Support electronic storage and manipulation of data

Notes:

Disk storage –> hard drive of a computer –> large amounts –> persistent –> data retrival slower

   In-Memory RAM --> limited amount --> volatile -->  data retrival faster
   
   Scenario: E-comerce Website
   
   disk --> store information of product --> stable in time
                        order history --> insight what more profitable
   
   in-memory: rapid access/update user shopping card in realtime

Types of Databases#

RDBMS: Relational Database Management Systems

  • use SQL to query RDBMS

  • have a predefined schema

  • data is stored in tabular form of columns and rows

  • the relationship between data is relational

  • Examples: Postgres, MySql, Oracle, SQLite

NoSQL Databases

  • don’t use SQL as the primary language

  • have no predefined schema

  • Examples: Neo4j, MongoDB

RDBMS#

  • Many different types of databases exist and each uses a different flavour of SQL.

  • Their syntax can differ, but the core concepts are the same.

  • Some databases will implement a subset of the functionality.

  • Some DB will be optimized for speed of read, others for speed of write.

Database Structure#

  • A database consists of one/multiple schemas

  • Schemas consist of tables

  • Tables consist of columns and rows

  • A column is a variable and has a unique name

  • A row is an observation

  • Every cell is a single value

Entity-Relationship model#

  • Data consists of entities of type object, class, person or place

  • The property of an entity is described through their attribute(s)

  • Relationships describe the relation between entities

  • Different types of relationship exist

Entity-Relationship types#

  • One-to-one (1:1)

  • One-to-many (1:n) / Many-to-one (n:1)

  • Many-to-many (n:n)

Entity-Relationship types#

  • One-to-one (1:1)

  • One-to-many (1:n) / Many-to-one (n:1)

  • Many-to-many (n:n)

Entity-Relationship types#

  • One-to-one (1:1)

  • One-to-many (1:n) / Many-to-one (n:1)

  • Many-to-many (n:n)

Entity-Relationship types#

  • One-to-one (1:1)

  • One-to-many (1:n) / Many-to-one (n:1)

  • Many-to-many (n:n)

Relational Databases#

  • Tables are related via primary and foreign keys

  • Each table has one primary key that is unique for each record

  • A foreign key is a field (or collection of fields) in one table, that refers to the primary key in another table

ERM Exercise#

Group Work#


  1. Split up in 4 groups

  2. Open the Miro Board we send in Discord

  3. Create between 4 and 8 entities

  4. Write down properties (columns) of entities

  5. Model dependencies (1:1, 1:m, m:n)

  6. Present your final diagram and explain decisions

  7. You have 30 Minutes


Topic

Airport

Chocolat Factory

Roller Coaster Park

Restaurant

Connecting to a Database#

SQL Client / Database IDE#

  • IDE = Integrated Development Environment

  • Powerful software that can be used to connect to a database and retrieve and visualise data (and more!)

  • Local or in the cloud

  • Collection of open-source, free and paid software available

Local SQL Clients vs. Cloud SQL Clients#

  • Local: Installed and run locally on your machine

  • Cloud: Deployed in the cloud and accessed via a web-interface



local_vs_cloud

DBeaver#

In this course we will use DBeaver

Why?

  • Free

  • Easy to use

  • Works for many different types of databases

  • Cross platform (Windows, Linux, Mac OS, Solaris)

Setting up DBeaver#

Open DBeaver > Settings > Editors

  • Enable upper case: SQL Editor > Formatting > Keyword Case > Set to: Upper

  • Add line numbers: Text Editors > Show line numbers > Tick box

Connect to a PostgreSQL database#

  1. Click on “New Database Connection”

  2. Search for and select PostgreSQL

  3. Enter the connection details below

Host: ds-sql-playground.c8g8r1deus2v.eu-central-1.rds.amazonaws.com
Port: 5432
Database: postgres
Username Will be posted in Slack/Zoom Chat
Password Will be posted in Slack/Zoom Chat

Exploring the database#

Find the database connection in your “Database Navigator” pane on the left

Expand it to postgres > postgres > Schemas > introduction > Tables

Check out the introduction’s tables and answer the following questions:

  1. What happens if you double click on a table?

  2. What is the first and last entry listed in the records table?

  3. What data type is the column record_date in the records table?