Compiler

created : Mon, 01 Jan 0001 00:00:00 +0000
modified : Mon, 01 Jan 0001 00:00:00 +0000

layout : wiki title : 컴파일러 수업 정리 summary : 2022-1 컴파일러 수업 정리 date : 2022-03-25 22:26:53 +0900 lastmod : 2022-06-15 15:53:46 +0900 tags : [lecture] draft : false parent : lectures

Introduction

The Move to Higher-Level Languages

Using High-Level Languages is a Free Lunch?

The Major Role of Language Processors

Two Representative Strategies

CompilationInterpretation
What to translateAn entire source programOne statement of a source program
When to translateOnce before the program runsEvery time when the statement is executed
Translation ResultA target program (equivalent to the source program)Target code (equivalent to the statement)
ExamplesC, C++Javascript

Common Language-Processing Systems

Requirements for Designing Good Compilers

Structure of Modern Compilers

Lexical Analyzer (Scanner)

Syntax Analyzer (Parser)

Semantic Analyzer

Intermediate Code Generator

Code Optimization (Optional)

Code Generator

Lexical Analysis

Part1. Specification of Tokens

Overview

Definition: Tokens

Definition: Lexemes

Class of Tokens

Lexical Anlysis does

  1. Partitioning input strings into sub-strings (lexemes)
  2. Identifying the token of each lexeme

For the specifcation of tokens

Definition: Alphabet, String, and Language

Operations on Strings

Operations on Languages

Regurlar Expressions

Regular expressionExpressed regular language
$\epsilon$$L(\epsilon) = { \epsilon }$
$a$$L(a) = { a }$, where $a$ is a symbol in alphabet $\Sigma$
$r_1 \vert r_2$$L(r_1) \cup L(r_2)$, where $r_1$ and $r_2$ are regular expressions
$r_1 r_2$$L(r_1r_2) = L(r_1) L(r_2) = { s_1 s_2 \vert s_1 \in L(r_1) \text{ and } s_2 \in L(r_2)}$
$r^*$$L(r^*) = \Cup_{i \ge 0} L(r^i)$

Rules for Regular Expressions

Examples of Specifying Tokens

To Recognize Tokens

  1. Merge the regular expression of tokens $Merged = Keyword \vert Identifier \vert Comparison \vert Float \vert Whitespace \vert \cdots$
  2. When an input stream $a_1 a_2 a_3 \cdots a_n$ is given,
mIdx = 0;
for i <= i <= n
  if a_1 a_2 ... a_n \in L(Merged), mIdx = i;
end
partition and classify a_1 a_2 ... a_{mIdx}
  1. Do the step 2 for the remaning input stream

Summary

Part 2. Recognition of Tokens

Finite Automata

Deterministic vs. Non-deterministic

DFANFA
transitionsAll transitions are deterministicSome transitions could be non-deterministic
$\epsilon$-movexo
# paths for a given inputOnly oneOne or more
Accepting conditionFor a given input, its path must end in one of accepting statesFor a given input, there must be at least one path ending in one of accepting states
ProsFast to executeSimple to represent (easy to make/understand)
ConsComplex ->space problem (exponentially larger than NFA)Slow -> performance problem (several paths)

Procedures for Implementing Lexical Analyzers

Regular Expressions to NFAs

NFAs to DFAs

Summary

Syntax Analysis (Parser)

Part 1. Context-Free Grammars (CFG)

Syntax Analyzer

  1. Decides whether a given set of tokens is valid or not
  1. Creates a tree-like intermediate representation (e.g., parse tree) that depicts the grammatical structure of the token stream

Why don’t we use regular expressions?

Context-Free Grammars (CFG)

Derivations

Token Validation Test

  1. Decides whether a given set of tokens is valid or not
  1. Creates a tree-like intermediate representation (e.g., parse tree) that depicts the grammatical structure of the token stream

Semantic Analyzer

Part 1: Scope Checking

The Most-Closely Nested Scope Rule

Implementation of Scope Checking


Summary: Scope Checking

Part 2: Type Checking

Type, Type System, and Type Checking

Two Kinds of Languages

How Types are Used/Checked in Practice?

Rules of Inference

Problem #1: Free Variables

Solution: Adding Scope Information

Solution: Defining Well-Formedness Rules

How Types are Used/Checked in Practice?


Summary: Type checking

Compilers: Intermediate Code Generator

Tree Address Code (TAC)

Intermediate Code Generator

AST: Abstract Syntax Tree

TAC:Tree Address Code

Types of TAC:Variable Assignment

How to Represent TAC: Quadruples

AST to TAC


Summary: Intermediate Code Generator

Code Optimization

Local Optimization

Intermediate Code Optimizer

Basic Blocks

Control Flow Graphs

Types of Intermediate Code Optimizations

Local Optimizations

Implementation of Local Optimizations


Summary: Intermediate Code Optimizer

Global Optimization

Global Optimizations

Global Dead Code Elimination

Global Copy Propataion / Common Subexpressions Eliminzation

Summary: Global Optimizations

Compilers: Code Generation

Part 1: Runtiem Environment

Code Generator

Runtime Environment

Object Representations

Function Representations

Activations

Activation Records


Memory Layout


Summary: Runtime Envornment

Part 2: Code Generation

Assembly Languages

MIPS assembly

Better Register Allocation

Graph Coloring = Register ALlocation