Large Scale C++: An Executive Summary of Chapter 0

Joseph Mariadassou
7 min read · Jan 11, 2024

Large-scale C++, meaning at least one million lines of code (LOC), requires discipline if we are to get economies of scale. This book by John Lakos addresses some issues in the development and maintenance of large-scale C++ source code. Although the ideas presented are worth considering, few people are in a position to effect the changes the book advocates.

The book is about organising large volumes of C++ code. The notion of physical design, which Lakos first put forth in his earlier book, is a running theme throughout the first volume. As stated in the book, software design is not independent of the language or tools used to develop the software. Hence this book will not cover how to organise the files for a large website, for example.

The book is split into three volumes:

  1. Volume I: Process and Architecture
  2. Volume II: Design and Implementation (Forthcoming)
  3. Volume III: Verification and Testing (Forthcoming)

True to the C/C++ convention the chapters are indexed from zero.

Chapter 0: Motivation

There are three dimensions to software development: Cost, Time and Functionality/Quality [Functionality and Quality should be two dimensions IMO]. As Karl Marx stated in the second law of Dialectical Materialism, “quantity changes quality”. To enable code reuse we must ensure that code is easily searchable. While good documentation helps, organising the source code also reduces dependencies, which in turn improves build and test times.

Chapter 1: Compilers, Linkers, and Components

Software design depends upon the language and tools that are used to create the software. Hence, to design in C++ it is necessary to understand how compilers, linkers and loaders work.

Chapter 2: Packaging and Design Rules

Usually a component consists of a unique header file and source file combination. Sometimes it may be necessary to break that rule. This chapter discusses the consequences of breaking the rule and when it may be acceptable to do so. Related components are then packaged into a single static library, or into a shared object (in POSIX parlance) or Dynamic-Link Library (in Windows parlance). Dependencies among packages should form a partial order (no cycles). The author claims that packages must be slotted into levels: a package at Level(N) should depend directly only upon packages at Level(N-1).
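To make the idea of levels concrete, here is a minimal sketch of how level numbers could be derived from a dependency graph. It is not code from the book: the names `DependencyGraph` and `computeLevels` are hypothetical, and the numbering convention (a package with no dependencies sits at level 1, every other package sits one level above its highest dependency) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Maps each package name to the packages it depends on directly.
// All names here are hypothetical and exist only for illustration.
using DependencyGraph = std::map<std::string, std::vector<std::string>>;

namespace detail {

// Depth-first walk. A package with no dependencies gets level 1 (an assumed
// convention); otherwise its level is one more than its highest dependency.
inline int levelOf(const std::string& package,
                   const DependencyGraph& graph,
                   std::map<std::string, int>& levels,
                   std::set<std::string>& inProgress)
{
    auto known = levels.find(package);
    if (known != levels.end()) {
        return known->second;
    }
    if (!inProgress.insert(package).second) {
        // We re-entered a package that is still being visited: a cycle.
        throw std::runtime_error("cyclic dependency involving " + package);
    }
    int level = 1;
    auto entry = graph.find(package);
    if (entry != graph.end()) {
        for (const std::string& dependency : entry->second) {
            level = std::max(
                level, 1 + levelOf(dependency, graph, levels, inProgress));
        }
    }
    inProgress.erase(package);
    levels[package] = level;
    return level;
}

}  // namespace detail

// Returns the level of every package mentioned in the graph, or throws
// std::runtime_error if the dependencies contain a cycle.
inline std::map<std::string, int> computeLevels(const DependencyGraph& graph)
{
    std::map<std::string, int> levels;
    std::set<std::string> inProgress;
    for (const auto& entry : graph) {
        detail::levelOf(entry.first, graph, levels, inProgress);
    }
    // Postcondition: every package listed as a key received a level.
    assert(levels.size() >= graph.size());
    return levels;
}
```

For a graph in which `app` depends on `net` and `core`, and `net` depends on `core`, this assigns `core` level 1, `net` level 2 and `app` level 3. Under the stricter rule described above, the direct dependency of `app` on `core` (two levels down) is exactly the kind of edge a reviewer would want to look at.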

Chapter 3: Physical Design and Factoring

The longest chapter of this volume, the third chapter is essentially where the author makes his case. Most of the chapter is concerned with levelisation: how to organise code such that packages are dependent upon other packages that are just one level below.

An Executive Summary of Chapter 0

  • The Goal: Faster, Better, Cheaper!

There are three aspects to software engineering: Budget, Schedule and Product. If the product is complex then both the budget and the time to complete it will increase. While one could extend the schedule to improve product quality and cost, in the commercial world every delay carries an opportunity cost. But every bug is also very expensive to fix after the product is released.

  • Application vs. Library Software

Any organisation developing software applications needs to make modules that are reusable. Otherwise, even within a single application, we would end up with a large amount of code duplication as the application grows more complex. To achieve good reuse, organisations must spend effort extracting reusable modules from existing applications. Every attempt must be made to use existing libraries. Hence the source code must be properly organised. The top-down design methodology will not work well in an organisation that already has a huge code base. To be effective, design should consider the existing code base and tools and try to put together applications that are based solely on existing modules. This notion of software tools dictating, or at least influencing, design is addressed at length in Chapter 2.

  • Collaborative vs. Reusable Software

Often refactoring code within an application makes the code more comprehensible. But breaking up large modules might result in a collection of collaborating components that are not reusable in other applications. This is OK, if the collaborative suite of modules can be used across multiple versions of the application.

  • Hierarchically Reusable Software

Before Object Oriented Programming (OOP) became popular, software engineers encouraged the use of layers or rings, with each layer dependent only upon the layer immediately next to it. Structured programming mandated keeping data and source code separate. However, despite a near reversal of the original practices with OOP, there is still a need to layer the software when the source code is large and complex.

  • Malleable vs. Stable Software

Software ideally should be both malleable and stable. Malleable means that it can be easily tweaked to address problems similar to the one it was first envisaged for. Stable essentially implies that the software is well understood and tested, so any change will require a complete overhaul, from specification and design through to implementation and testing. To overcome this dichotomy OOP recommends the Open-Closed principle: open to extension and closed to modification.
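A small sketch may help show what the Open-Closed principle looks like in C++. The classes below (`Shape`, `Rectangle`, `Triangle`) and the function `totalArea` are hypothetical names chosen for illustration and do not come from the book. The point is that the interface and the client code stay untouched while new behaviour arrives as a new derived class.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// The stable part: once clients depend on this interface it is treated as
// closed to modification.
class Shape {
  public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// Part of the original release.
class Rectangle : public Shape {
  public:
    Rectangle(double width, double height) : d_width(width), d_height(height)
    {
        assert(width >= 0.0 && height >= 0.0);
    }
    double area() const override { return d_width * d_height; }

  private:
    double d_width;
    double d_height;
};

// Client code written against the interface only. It never needs to change
// when new kinds of Shape are added.
inline double totalArea(const std::vector<std::unique_ptr<Shape>>& shapes)
{
    double sum = 0.0;
    for (const auto& shape : shapes) {
        sum += shape->area();
    }
    return sum;
}

// The malleable part: added later as an extension, without editing Shape,
// Rectangle or totalArea.
class Triangle : public Shape {
  public:
    Triangle(double base, double height) : d_base(base), d_height(height)
    {
        assert(base >= 0.0 && height >= 0.0);
    }
    double area() const override { return 0.5 * d_base * d_height; }

  private:
    double d_base;
    double d_height;
};
```

The trade-off is that the interface itself must be chosen carefully up front, since it is the one piece that cannot be tweaked cheaply later.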

  • The Key Role of Physical Design

Physical Design:
n. The arrangement of source code within files and files within libraries.
v.tr. To partition source code among files and libraries.

Over 60 years ago Peter Drucker claimed that managers spent more time looking for a document than reading it. This is probably still true today: although computerisation has improved searching, the number of documents is now so large that people still spend a lot of time searching. While there are tools that can present source code in ways that are easily searched, it is still better to organise the source code so that finding existing modules, and learning how to use them, is easy. One such organisational structure is discussed later.

  • Physically Uniform Software: The Component

The need for physical design and hierarchy having been established, this section deals with how components are organised.
A component is defined to be the atomic unit of physical design. In C++ it takes the form of a .h/.cpp pair. Associated with each component is a standalone test driver. Usually there is only one class per component. There are four basic types of classes: basic types such as Point and DateTime; generic containers such as Set, List and Vector; complex abstractions such as Graph and Schema; and facades such as XmlParser and RegularExpression, which put a simpler face on a collection of other classes. Components differ from classes in that a component is a physical entity, a collection of files, that can be ported from one platform to another without code change. A platform is a combination of hardware and software used to build the component.
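As a rough illustration of the .h/.cpp pair plus test driver, the sketch below shows the three pieces of one component. To keep the example self-contained it is written as a single listing, with comment banners marking where each file would begin. The file names (`geom_point.h`, `geom_point.cpp`, `geom_point.t.cpp`), the `geom` namespace and the function `runPointTestDriver` are assumptions made for the example, not names prescribed by this summary.

```cpp
#include <cassert>

// ---- geom_point.h : the component's interface (hypothetical file name) ----
#ifndef INCLUDED_GEOM_POINT
#define INCLUDED_GEOM_POINT

namespace geom {

// A basic value type: one class per component.
class Point {
  public:
    Point(int x, int y);
    int x() const;
    int y() const;

  private:
    int d_x;
    int d_y;
};

bool operator==(const Point& lhs, const Point& rhs);

}  // namespace geom

#endif

// ---- geom_point.cpp : the implementation ----
// As a separate file this would begin with: #include "geom_point.h"
namespace geom {

Point::Point(int x, int y) : d_x(x), d_y(y) {}

int Point::x() const { return d_x; }

int Point::y() const { return d_y; }

bool operator==(const Point& lhs, const Point& rhs)
{
    return lhs.x() == rhs.x() && lhs.y() == rhs.y();
}

}  // namespace geom

// ---- geom_point.t.cpp : the standalone test driver ----
// As a separate file this function would be the driver's main(); it depends
// on nothing except the component under test.
int runPointTestDriver()
{
    const geom::Point a(1, 2);
    const geom::Point b(1, 2);
    const geom::Point c(3, 4);

    assert(a.x() == 1);
    assert(a.y() == 2);
    assert(a == b);
    assert(!(a == c));
    return 0;  // 0 signals success, as a process exit code would
}
```

Because the driver exercises the component in isolation, it can be built and run on any platform the component is ported to.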

  • Quantifying Hierarchical Reuse: An Analogy

There are many ways to organise folders and files. There is no algorithm that can provide the optimal solution. This is partly because we don’t know which component is more likely to be reused. While we can put a lot of effort into making a component accessible, the fact remains that such effort may be wasted. The subtleties involved will become clearer in later chapters. However, two things are important: the software modules must have an acyclic dependency graph, and the software must be able to grow while its existing interfaces remain stable.

  • Software Capital

There is a huge advantage in being first to market in the software industry, and there are many examples of this in consumer software. Microsoft understood this better than others. Hence the common wisdom was: Don’t upgrade until Version 3. There is therefore a lot of pressure to get it out rather than get it right. As Fred Brooks pointed out, increasing the number of developers does not always speed up a project, because the more developers there are, the more time is spent on communication and decision making. A large number of people also leads to a lot of internal politics. Unless there is a strong push from those higher up, developers will rewrite components rather than reuse existing ones. Software should be treated as an asset that is depreciated rather than an expense that is consumed. Reusing a component gives much credit to the component developer but not much to the user, who often has to spend time understanding the code. Getting the balance right between reusing and rewriting takes a mixture of hard and soft skills.

  • Growing the Investment

Economies of scale in software development come from the reuse of existing code. If every project is independent of every other project then small organisations can do the job just as well. A piece of software can be considered reusable only if it is actually reused. Both the producer and the consumer must tango. Too often the library writers throw it over the wall, so to speak, and the consumer promptly throws it away. But then over-engineering a product to make it reusable is also not desirable. The compromise seems to be to refactor code in the first application before reusing it in the second application. Again, this is more art than science.

  • The Need for Vigilance

Like anything else, as we add more stuff we get closer to chaos. The need is thus to ensure that some discipline is maintained. Of course we can either ensure that the discipline is maintained throughout the project or ensure that standards are adhered to when the project reaches completion. This is because a large amount of software is just thrown away because the requirements have changed, new requirements have come up that change the entire architecture, or the requirements have been reduced.

Joseph Mariadassou

Software developer with interest in Politics, Philosophy and Economics