Compilers (Dragon Book) 목차 훑어보기

들어가며

최근 JVM HotSpot에 기여해보고자 노력하고 있다. 코드를 읽고 패치를 작성하다 보면, 컴파일러에 대한 기본 지식이 부족하여 아쉬움을 느끼는 순간이 있다.

그래서 컴파일러의 기초부터 공부해보기로 했다. 선택한 책은 Compilers: Principles, Techniques, and Tools 2판, 일명 Dragon Book이다.

지금은 AI와 함께 학습할 수 있는 시대이다. 책을 읽다가 이해가 안 되는 부분이 있으면 바로 AI에게 물어보고, 개념을 다른 각도에서 설명받을 수 있다. 이 장점을 적극 활용하여 빠르게 컴파일러 지식을 쌓아보려 한다.

시작에 앞서, 목차를 훑어보며 전체적인 흐름을 파악해보자. 이 책은 크게 세 부분으로 나눌 수 있다.

프론트엔드 (1~5장): 소스 코드를 분석하여 중간 표현으로 변환하는 과정
런타임 환경과 중간 코드 (6~7장): 중간 코드 생성과 프로그램 실행 시 메모리 관리
백엔드와 최적화 (8~12장): 기계어 생성과 다양한 최적화 기법

프론트엔드: 소스 코드 분석 (1~5장)

Chapter 1. Introduction

컴파일러의 전반적인 구조와 각 단계(어휘 분석, 구문 분석, 의미 분석, 중간 코드 생성, 최적화, 코드 생성)를 개관한다. 프로그래밍 언어의 발전사와 컴파일러 기술이 어디에 응용되는지도 다룬다.

Language Processors
The Structure of a Compiler
The Evolution of Programming Languages
The Science of Building a Compiler
Applications of Compiler Technology
Programming Language Basics

Chapter 2. A Simple Syntax-Directed Translator

간단한 번역기를 직접 만들어보면서 컴파일러의 핵심 개념을 체험한다. 문법 정의, 파스 트리, 구문 지시 번역(Syntax-Directed Translation), 어휘 분석기, 심볼 테이블, 중간 코드 생성까지 작은 규모로 한 번 훑어보는 장이다.

Syntax Definition
Syntax-Directed Translation
Parsing
A Translator for Simple Expressions
Lexical Analysis
Symbol Tables
Intermediate Code Generation

Chapter 3. Lexical Analysis

소스 코드를 토큰 단위로 쪼개는 어휘 분석(Lexical Analysis)을 본격적으로 다룬다. 정규 표현식, 유한 오토마타(NFA/DFA), 그리고 Lex 같은 어휘 분석기 생성 도구의 원리를 배운다.

The Role of the Lexical Analyzer
Input Buffering
Specification of Tokens
Recognition of Tokens
The Lexical-Analyzer Generator Lex
Finite Automata
From Regular Expressions to Automata
Design of a Lexical-Analyzer Generator
Optimization of DFA-Based Pattern Matchers

Chapter 4. Syntax Analysis

토큰 스트림을 문법 구조로 파싱하는 구문 분석을 다룬다. 하향식(Top-Down) 파싱과 상향식(Bottom-Up) 파싱의 두 가지 접근법, 그리고 LR 파싱, LALR 파싱 같은 핵심 알고리즘을 배운다. Yacc 같은 파서 생성기도 소개된다.

Context-Free Grammars
Writing a Grammar
Top-Down Parsing
Bottom-Up Parsing
Introduction to LR Parsing: Simple LR
More Powerful LR Parsers
Using Ambiguous Grammars
Parser Generators

Chapter 5. Syntax-Directed Translation

파스 트리의 노드에 속성(attribute)을 부여하고 의미 규칙을 통해 번역을 수행하는 방법을 배운다. S-속성 정의, L-속성 정의, 번역 스킴(Translation Scheme) 등을 다루며, 파싱 과정에서 동시에 코드를 생성하는 기법을 학습한다.

Syntax-Directed Definitions
Evaluation Orders for SDD’s
Applications of Syntax-Directed Translation
Syntax-Directed Translation Schemes
Implementing L-Attributed SDD’s

런타임 환경과 중간 코드 (6~7장)

Chapter 6. Intermediate-Code Generation

소스 코드를 기계 독립적인 중간 표현(IR)으로 변환하는 방법을 다룬다. 구문 트리, DAG, Three-Address Code, SSA 형태의 중간 코드를 학습하고, 타입 검사, 제어 흐름, 백패칭(Backpatching) 같은 실전적인 주제도 배운다.

Variants of Syntax Trees
Three-Address Code
Types and Declarations
Translation of Expressions
Type Checking
Control Flow
Backpatching
Switch-Statements
Intermediate Code for Procedures

Chapter 7. Run-Time Environments

프로그램이 실행될 때 메모리가 어떻게 관리되는지 배운다. 스택 기반 메모리 할당, 활성화 레코드(Activation Record), 힙 관리, 그리고 가비지 컬렉션(Mark-and-Sweep, 복사 수집, 세대별 GC 등)까지 깊이 있게 다룬다.

Storage Organization
Stack Allocation of Space
Access to Nonlocal Data on the Stack
Heap Management
Introduction to Garbage Collection
Introduction to Trace-Based Collection
Short-Pause Garbage Collection
Advanced Topics in Garbage Collection

백엔드와 최적화 (8~12장)

Chapter 8. Code Generation

중간 코드를 실제 타겟 머신의 기계어로 변환하는 코드 생성 단계를 다룬다. 레지스터 할당, 명령어 선택, 피프홀 최적화, 트리 재작성을 통한 명령어 선택, 동적 프로그래밍 기반 코드 생성 등을 학습한다.

Issues in the Design of a Code Generator
The Target Language
Addresses in the Target Code
Basic Blocks and Flow Graphs
Optimization of Basic Blocks
A Simple Code Generator
Peephole Optimization
Register Allocation and Assignment
Instruction Selection by Tree Rewriting
Optimal Code Generation for Expressions
Dynamic Programming Code-Generation

Chapter 9. Machine-Independent Optimizations

기계에 독립적인 코드 최적화 기법들을 다룬다. 데이터 흐름 분석(Data-Flow Analysis)이 핵심 주제이며, 도달 정의(Reaching Definitions), 활성 변수(Live Variables), 상수 전파(Constant Propagation), 부분 중복 제거(Partial-Redundancy Elimination) 같은 고전적 최적화 기법을 학습한다.

The Principal Sources of Optimization
Introduction to Data-Flow Analysis
Foundations of Data-Flow Analysis
Constant Propagation
Partial-Redundancy Elimination
Loops in Flow Graphs
Region-Based Analysis
Symbolic Analysis

Chapter 10. Instruction-Level Parallelism

현대 프로세서의 명령어 수준 병렬성(ILP)을 활용하는 코드 스케줄링 기법을 다룬다. 파이프라인 실행, 데이터 의존성 분석, 기본 블록 스케줄링, 전역 코드 스케줄링, 소프트웨어 파이프라이닝 등을 배운다.

Processor Architectures
Code-Scheduling Constraints
Basic-Block Scheduling
Global Code Scheduling
Software Pipelining

Chapter 11. Optimizing for Parallelism and Locality

멀티프로세서 환경에서의 병렬 실행과 데이터 지역성(Locality) 최적화를 다룬다. 루프 병렬화, 어파인 변환(Affine Transform), 반복 공간(Iteration Space), 데이터 재사용 분석, 동기화 등 고급 최적화 주제를 학습한다.

Basic Concepts
Matrix Multiply: An In-Depth Example
Iteration Spaces
Affine Array Indexes
Data Reuse
Array Data-Dependence Analysis
Finding Synchronization-Free Parallelism
Synchronization Between Parallel Loops
Pipelining
Locality Optimizations
Other Uses of Affine Transforms

Chapter 12. Interprocedural Analysis

개별 함수를 넘어서 프로그램 전체를 대상으로 하는 프로시저 간 분석을 다룬다. 콜 그래프, 컨텍스트 민감 분석, 포인터 분석, Datalog를 이용한 데이터 흐름 표현, BDD를 활용한 구현 등을 학습한다.

Basic Concepts
Why Interprocedural Analysis?
A Logical Representation of Data Flow
A Simple Pointer-Analysis Algorithm
Context-Insensitive Interprocedural Analysis
Context-Sensitive Pointer Analysis
Datalog Implementation by BDD’s

부록

Appendix A. A Complete Front End

앞에서 배운 내용을 종합하여 완전한 컴파일러 프론트엔드를 구현한다. 어휘 분석기, 심볼 테이블, 중간 코드 생성, 파서까지 전체를 직접 만들어보는 실습이다.

Appendix B. Finding Linearly Independent Solutions

11장의 병렬화 및 지역성 최적화에서 사용되는 선형 독립 해를 구하는 수학적 방법을 다룬다.

마무리

전체적으로 보면 이 책은 소스 코드가 기계어로 변환되는 전 과정을 처음부터 끝까지 다루고 있다.

먼저 소스 코드를 토큰으로 쪼개고 (어휘 분석)
토큰을 문법 구조로 조립하고 (구문 분석)
의미를 분석하여 중간 코드로 변환하고 (의미 분석 & 중간 코드 생성)
실행 시 메모리를 어떻게 관리할지 결정하고 (런타임 환경)
중간 코드를 기계어로 변환하고 (코드 생성)
다양한 수준에서 최적화한다 (최적화)

May 7, 2026

categories: 스터디

tags: 컴파일러 , Dragon Book , JVM , HotSpot , 스터디