Database System

created : Sun, 18 Apr 2021 18:42:47 +0900
modified : Wed, 29 Jan 2025 02:53:20 +0900
lectures database

Chapter 12. Physical Storage Systems

Classification of Physical Storage Media

Storage Hierarchy

Storage Interfaces

Magnetic Disks

Magnetic Hard disk Mechanism

Performance Measures of Disks

Flash Storage

SSD Performance Metrics

Storage Class Memory


Summary

Chapter 13. Data Storage Structures

File Organization

Fixed-Length Records (고정 길이 레코드)

Variable-Length Records (가변 길이 레코드)

Variable-Length Records : Slotted Page Structure (가변 길이 레코드가 페이지에 저장되는 구조, Slotted Page structure)

Storing Large Objects

Organization of Records in Files

Heap File Organization

Sequential File Organization

Multitable Clustring File Organization

Partitioning

Data Dictionary Storage

Storage Access

Buffer Manger

Buffer-Replacement Policies

Column-Oriented Storage

Chatper 14. Indexing

Basic Concepts

Index Evaluation Metrics

Ordered Indicies

Dense Index Files

Sparse Index Files

Multilevel Index

B+-Tree Index Files

B+-Tree Node Structure

Leaf Nodes in B+-Trees

Non-Leaf Nodes in B+-Trees

Observations about B+-Trees

Queries on B+-Trees

funciton find(v)
1. C = root
2. while (C is not a leaf node)
  1. Let i be least number s.t V <= K_i,
  2. if there is no such number i then
  3.  Set C = last non-null pointer in C
  4. else if (v = C.K_i) Set C = P_{i+1}
  5. else set C= C.P_{i}
3. if for some i, K_i - V then return C.P_i
4. else return null /* no record with search-key value v exists. */
function findRange(lb, ub)
/* Returns all records with search key value V such that lb <= V <= ub */
  Set result Set = {};
  Set C = root node
  while (C is not a leaf node) begin
    Let i = smallest nubmer such that lb <= C.K_i
    if there is no such number i then begin
      Let P_m = last non-null pointer in the node
      Set C= C.P_m
    end
    else if (lb = C.K_i) then Set C = C.P_{i+1}
    else Set C = C.P_i /* lb < C.K_i */
  end
  /* C is a leaf node */
  Let i be the least value such that K_i >= lb
  if there is no such i
    then Set i = 1 + number of keys in C; /* To force move to next leaf */
  Set done = false;
  while (not done) begin
    Let n = nubmer of keys in C.
    if (i <= n and C.K_i <= ub) then begin
      Add C.P_i to resultSet
      Set i = i + 1
    end
    else if (i <= and C.K_i > ub)
      then Set done = true;
    else if (i > n and C.P_{n + 1} is not null)
      then Set C = C.P_{n + 1}, and i = 1 /* Move to next leaf */
    else Set done = true; /* No more leaves to the right */
  end
  return result Set;

Non-Unique Keys

Updates on B+-Trees: Insertion

procedure insert(value K, pointer P)
  if (tree is empty) create an empty leaf node L, which is also the root
  else Find the leaf node L that should contain key value K
  if (L has less than n - 1 key values)
    then insert_in_leaf(L, K, P)
    else begin /* L has n - 1 key values already, split it */
      Create node L`
      Copy L.P_1, ... L.K_{n-1} to a block of memory T that can hold n (pointer, key-value) pairs
      insert_in_leaf(T, K, P)
      Set L`.P_n = L.P_n; Set L.P_n = L`
      Erase L.P1 through L.K_{n-1} from L
      Copy T.P_1 through T.K_{\lceil n/2 \rceil} from T into L starting at L.P_1
      Copy T.P_{\lceil n/2 \rceil + 1} through T.K_n from T into L` strarting at L`.P_1
      Let K` be the smallest key-value in L`
      insert_in_parent(L, K`, L`)
    end

procedure insert_in_leaf(node L, value K, pointer P)
  if (K < L.K_1)
    then insert P, K into L just before L.P_1
    else begin
      Let K_i be the highest value in L that is less than or equal to K
      Insert P, K into L just after L.K_i
    end

procedure insert_in_parent(node N, value K`, node N`)
  if (N is the root of the tree)
    then begin
      Create a new node R containing N, K`, N` /* N and N` are pointers */
      Make R the root of the tree
      return
    end
  Let P = parent(N)
  if (P has less than n pointers)
    then insert (K`, N`) in P just after N
    else begin /* Split P */
      Copy P to a block of memory T that can hold P and (K`, N`)
      Insert (K`, N`) into T jsut after N
      Erase all entries from P; Create node P`
      Copy T.P_1 ... T.P_{\lceil (n+1)/2 \rceil} into P
      Let `` = T.K_{\lceil(n+1) / 2 \rceil}
      Copy T.P_{\lceil (n+1)/2 \rceil + 1} ... T.P_{n + 1} into P`
      insert_in_parent(P, K``, P`)
    end

Updates on B+-Trees: Deletion

Complexity of Updates

Non-Unique Search Keys

B+-Tree File Organization

Other Issues in Indexing


Indexing Strings

Bulk Loading and Bottom-Up Build

Static Hasing

Handling of Bucket Overflows

Deficiencies of Static Hasing

Dynamic Hasing

Comparison of Ordered Indexing and Hashing

Multiple-Key Access

Indicies on Multiple Keys

Indices on Multiple Attributes

Other Features

Creation of Indices

Index Definition in SQL

Write Optimized Indices


Bitmap Indices

An Introduction to DB Programming

DB API

Tracsactions in application program

JDBC

Dynamic SQL

public static void CSEE_Student_Name(String userid, String paswd) {
  try {
    Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/dbname?useUnicode=True", userid, passwd);
    Statement stmt = conn.createStatemet();
    ResultSet rset = stmt.executeQuery("select name from student where dept_name= 'test'");

    while (rset.next()) {
      System.out.println("Student name: " + rset.getString("name"));
    }
  } catch (SQLException sqle) {
    System.out.println("SQLException : " + sqle);
  }
}

ESQL

Dynamic SQL in ESQL

ODBC

Web/DB Interoperability:ASP/ADO

Web/DB Interoperability

Chapter 15. Query Processing

Basic Steps in Query Processing

  1. Parsing and translation:
  1. Optimization
  2. Evaluation:

Measures of Query Cost

Selection Operation

Selections Involving Comparisions

Implementation of Comple Selections

Bitmap Index Scan

Sorting

External Sort-Merge

Let M denote memory size
1. Create sorted runs. Let i be 0 initially.:
   Repeatedly do the following till the end of the relation:
   (a) Read M blocks of relation into memory
   (b) Sort the in-memory blocks
   (c) Write sorted data to run R_i : increment i.
Let the final value of i be N
1. Merge the runs (N-way merge). We assume (for now) that N < M:
  1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the ifrst block of each run into its buffer page
  2. repeat:
    1. Select the first record (in sort order) among all buffer pages
    2. Write the record to the output buffer. If the output buffer is full write it to disk
    3. Delete the record from tis input buffer page. If the buffer page becomes empty then read the next block (if any) of the run into the buffer.
  3. until all input buffer pages are empty

Join Operation

Nested-Loop Join

for each tutple t_r in r do begin
  for each tuple t_s in s do begin
    test pari(t_r, t_s) to see if they satisfy the join condition \theta
    if they do, add t_r * t_s to the result
  end
end

Block Nested-Loop Join

for each block B_r of r do begin
  for each block B_s of s do begin
    for each tuple t_r in B_r do begin
      for each tuple t_s in B_s do begin
        Check if (t_n, t_s) satisfy the join condition
        if they do, and t_r * t_s to the result
      end
    end
  end
end

Indexed Nested-Loop Join

Merge-Join

  1. Sort both relations on their join attribute (if not already sorted on the join attributes).
  2. Merge the sorted relations to join them:
  3. Join step is similar to the merge stage of the sort-merge algorithm.
  4. Main difference is handling of duplicate values in join attribute - every pari with same value on join attribute must be matched
pr := address of first tuple of r
ps := address of first tuple of s
while (ps != null and pr != null) do
  begin
    t_s = tuple to which ps points;
    S_s : = {t_s};
    set ps to point to next tuple of s;
    done := false;
    while (not done and ps != null) do
      begin
        t_s' := tuple to which ps points;
        if (t_s'[JoinAttrs] = t_s[JoinAttrs])
          then begin
              S_s := S_s \cup { t_s '}:
              set ps to point to next tuple of s;
          end
        else done := true;
      end
     t_r := tuple to which pr points;
     while (pr != null and t_r[JoinAttrs] < t_s[JoinAttrs]) do
       begin
         set pr to point to next tuple of r;
         t_r := tuple to which pr points;
       end
     while (pr != null and t_r[JoinAttrs] = t_s[JoinAttrs]) do
       begin
         for each t_s in S_s do
           begin
             add t_s \bowtie t_r to result;
           end
         set pr to point to next tuple of r;
         t_r := tuple to which pr points;
       end
    end

Hash-Join

Hadnling of Overflows

Cost of Hash-Join

Complex Joins

Joins over Spatial Data

Other Operations

Answering Keyword Queries

Outer Join

Evalution of Expressions

Materialization

Pipelining

Blocking Operations

Pipeline Stages

Evaluation Algorithms for Pipelining

Pipelining for Continuous-Stream Data

Query Processing In memoery

Cache Conscious Algorithms


생각보다 수업시간에 안다룬 내용이 많았는데 그냥 그런가보다 하고 정리했다. 어짜피 도움되는 내용이기도 하고, PPT만 안했지 말로는 설명한 부분들도 있었기 때문이다.

Chapter 16. Query Optimization

Viewing Query Evaluation Plans

Generting Equivalent Expressions

Transofrmation of Relational Expressions

Equivalence Rules

  1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.:
  1. Selection operations are commutative.:
  1. Only the last in a sequnce of projection operations is needed, theothers can be omitted.:
  1. Selections can be combined with Cartesian products and theta joins.:
  1. Theta-join operations (and natural joins) are commutative.:
  1. :
  1. The selection operation distributes over the theta join operation under the following two conditions:
  2. When all the attributes in $$\theta_0$$ involve only the attributes of one of the expressions ($$E_1$$) being joined.: * $$\sigma_{\theta_0} (E_1 \bowtie_{\thtea} E_2) \equiv (\sigma_{\theta_0} (E_1)) \bowtie_{\theta} E_2$$
  3. When $$\theta_1$$ involves only the attributes of $$E_1$$ and $$\theta_2$$ involves only the attributes of $$E_2$$: * $$\sigma_{\theta_1 \wedge \theat_2}(E_1 \bowtie_{\theta} E_2) \equiv (\sigma_{\theta_1} (E_1)) $$
  4. The projection operation distributes over the theta join operation as follows:
  5. if $$\theta$$ involves only attributes from $$L_1 \cup L_@$$: * $$\Pi_{L_1 \cup L_2} (E_1 \bowtie_{\theta} E_2) \equiv \Pi_{L_1} (E_1) \bowtie_{\theta} \Pi_{L_2} (E_2)$$
  6. In general, consider a join $$E_1 \bowtie_{\theta} E_2$$: * Let $$L_1$$ and $$L_2$$ be sets of attributes from $$E_1$$ and $$E_2$$, respectively. * Let $$L_3$$ be attributes of $$E_1$$ that are involved in join condition $$\theta$$, but are not in $$L_1 \cup L_2$$ * Let $$L_4$$ attributes of $$E_2$$ that are involved in join condition $$\theta$$, but are not in $$L_1 \cup L_2$$:
    • $$\Pi_{L_1 \cup L_2} (E_1 \bowtie_{\theta} E_2) \equiv \Pi_{L_1 \cup L_2} (\Pi {L_1 \cup L_3}(E_1) \bowtie{\theta} \Pi_{\L_2 \cup L_4 } (E_2))$$
    • Similar equivalences hold for outerjoin operations
  7. The set operations union and intersection are commutative:
  1. Set union and intersection are associative.:
  1. The selection operation distributes over $$\cup,\cap$$ and -:
  1. The projection operation distributes over union:
  1. Selection distributes over aggregation as below:
  1. Outerjoin is commutative?:
  1. Selection distributes over left and right outerjoins
  2. Outerjoins can be replaced by inner joins under some conditions

Enumeration of Equivalent Expressions

Implementing Transformation Based Optimization

Cost Estimation

Choice of Evaluation Plans

Cost-Based Optimization

Dynamic Programming in Optimization

Join Order Optimization Algorithm

procdedure findbestplan(S)
  if (bestplan[S].cost != \infty)
    return bestplan[S]
  // else bestplan[S] has not been computed earlier, compute it now
  if (S contains only 1 relation)
    set bestplan[S].plan and bestpaln[S].cost based on the best wy of accessing
    S using selections on S and indices (if any) on S else for each non-empty
    subset S1 of S such that S1 != S

    P1 = findbestplan(S1)
    P2 = findbestplan(S - S1)
    for each algorithm A for joining results of P1 and P2
      // For indexed-nested loops join, the outer could be P1 or P2
      // Similarly for hash-join, the build relation could be P1 or P2
      // We assume the alternatives are considered as separate algorithms
      if algorithm A is indexed nested loops
        Let P_i and P_0 denote inner and outer inputs
        if P_i has a single relation r_i and r_i has an index on the join attribute
          plan = "execute P_0 plan; join results of P_0 and r_i using A",
                 with any selection conditions on P_i performed as part of the join condition
          cost = P_0 cost + cost of A
        else cost = \infty; /* cannot use indexed nested loops join */
      else
        plan = "execute P1.plan; execute P2.plan;
                join results of P1 and P2 using A;"
        cost = P1.cost + P2.cost + cost of A

      if cost < bestplan[S].cost
        bestplan[S].cost = cost
        bestplan[S].plan = plan;
  return bestplan[S]

Left Deep Join Trees

Cost of Optimization

Interesting Sort orders

Cost Based Optimization with Equivalence Rules

Heuristic Optmization

Structure of Query Optimizers

Statistics for Cost Estimation

Statistical Information for Cost Estimation

Histograms

Selection Size Estimation

Size Estimation of Complex Selections

Estimation of the Size of Joins

Size Estimation for Other Operations

Estimation of Number of Distinct Values

Additional Optimization techinques

Materialized Views

Materialized View Maintenance

Incremental View Maintenance

Join Operations

Aggregation Operations

Other Operations

Handling Expressions

Query Optimization and Materialized Views

양이 너무 많다 ㅠ 학교 시험 범위는 아니라 읽어만 보고 정리는 생략했다. 수식을 입력하는데 시간이 너무 오래 걸린다. 사실 이렇게 오래동안 필기할 내용은 아니였는데 ㅠ

Chapter 17 Transaction

Transaction Concept

ACID Properties

Transaction State

Concurrent Executions

Schedules

Serializability

Simplified view of transactions

Confliciting Instructions

Conflict Serializability

View Serializability

Testing for Serializability

Test for View Serializability

Recoverable Schedules

Cascading Rollbacks

Cascadeless Schedules

Concurrency Control

Concurrency Control vs. Serializability Tests

Weak Levels of COnsistency

Levels of Consistency in SQL-92

Transaction Definition in SQL

Implementation of Isolation Levels


Chapter 18. Concurrency Control

Lock-Bassed Protocols

Schedule With Lock Grants

Deadlock

The Two-Phase Locking Protocol

Locking Protocols

Lock Conversaions

Automatic Acquisition of Locks

Implementation of Locking

Graph-Based Protocols

Tree Protocol

Deadlock Handling

More Deadlock Prevention Strategies

Deadlock Recovery

Multiple Granularity

Intention Lock Modes


Chapter 19. Recovery System

Failur Classification

Recovery Algorithms

Storage Structure

Stable-Storage Implementation

Protecting storage media from failure

Data Access

Reacovery and Atomicity

Log-Based Recovery

Immediate Database Modification

Transaction Commit

Concurrency Control and Recovery

Undo and Redo Operations

Recovering from Failure

Checkpoints

Recovery Algorithm

Log Record Buffering

Database Buffering

Buffer Management

Fuzzy Checkpointing

Failure with Loss of Nonvolatile Storage

Recovering from Failure of Non-Volatile Sotrage


생략

Chapter 20. Database System Architectures

Centralized Database Systems

Server System Architeture

Transaction Servers

Transaction Server Process Structure


Chapter 10 Big Data

Motivation

Querying Bit Data

Big Data Storage Systems

Distirbuted File Systems

Hadoop FIle System Architecture

Hadoop Distributed File System (HDFS)

Sharding

Key Value Storage Systems

Parallel and Distributed Databases

The MapReduce Paradigm

MapReduce Programming Model