The Way to Clean Code [5] -- Objects and Data Structures

Posted by duall on Sat, 02 Oct 2021 00:10:13 +0200

To be frank, this chapter is hard to follow unless you read it twice.

In fact, once you understand the two concepts the author calls "object" and "data structure", the rest of the chapter falls into place. Here is my understanding:

"Object": exposes behavior (methods) and hides data (private members, no get/set).
"Data structure": exposes data (public members, or get/set) and has no meaningful behavior (methods).
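
As a minimal sketch of these two definitions (the class names `AccountData` and `Account` are hypothetical and do not come from the book), the same balance can be modeled either way:

```java
// Data structure: exposes its data and has no meaningful behavior.
class AccountData {
    public String owner;
    public long balanceInCents;
}

// Object: hides its data and exposes behavior only (no get/set for the field).
class Account {
    private long balanceInCents;

    void deposit(long cents) {
        if (cents <= 0) {
            throw new IllegalArgumentException("deposit must be positive");
        }
        balanceInCents += cents;
    }

    // Answers a question in domain terms instead of handing out the raw field.
    boolean canAfford(long priceInCents) {
        return balanceInCents >= priceInCents;
    }
}
```

A caller of `Account` never learns how the balance is stored, while a caller of `AccountData` depends on the field layout directly.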

Okay, on to the main points:

There is a reason to make variables private: we don't want anyone else to depend on them. Yet many programmers automatically add get/set methods to their objects, which exposes the private variables just as if they were public.

1. Data abstraction

For example, here are two pieces of code that represent the data structure of a Point.

public class Point {
    public double x;
    public double y;
}

public interface Point {
  double getX();
  double getY();
  void setCartesian(double x, double y);
  double getR();
  double getTheta();
  void setPolar(double r, double theta);
}

The beauty of the second piece of code is that you cannot tell whether the implementation uses rectangular (Cartesian) or polar coordinates, or some other coordinate system entirely. Yet the interface still clearly represents a Point data structure. The first piece of code forces us to manipulate the x and y coordinates directly, which exposes the internal structure of the Point. Even if the variables were made private, using them through get/set methods would still expose that structure.
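
To see why the interface hides the representation, here is one possible implementation that happens to store polar coordinates internally. This is only a sketch: the class name `PolarPoint` is an assumption, and the interface is repeated so that the example compiles on its own.

```java
interface Point {
    double getX();
    double getY();
    void setCartesian(double x, double y);
    double getR();
    double getTheta();
    void setPolar(double r, double theta);
}

// Stores r and theta; callers cannot tell this from a Cartesian implementation.
class PolarPoint implements Point {
    private double r;
    private double theta;

    public double getX() { return r * Math.cos(theta); }
    public double getY() { return r * Math.sin(theta); }

    public void setCartesian(double x, double y) {
        // Convert once on the way in; both coordinates are set atomically.
        r = Math.hypot(x, y);
        theta = Math.atan2(y, x);
    }

    public double getR() { return r; }
    public double getTheta() { return theta; }

    public void setPolar(double r, double theta) {
        this.r = r;
        this.theta = theta;
    }
}
```

Note that the setters take both coordinates at once, so a caller cannot change x alone as it could with a public field.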

Hiding implementation is not just a matter of putting a layer of functions (such as get/set methods) between the variables and the caller. Hiding implementation is about abstraction. A class does not simply push its variables out through getters and setters; it exposes abstract interfaces that let users manipulate the essence of the data without knowing how it is implemented.

To illustrate the paragraph above, suppose we need to report how much fuel a vehicle has left. Here are two versions of the interface:

public interface Vehicle {
  double getFuelTankCapacityInGallons();
  double getGallonsOfGasoline();
}

public interface Vehicle {
  double getPercentFuelRemaining();
}

The second one is better. The first exposes the vehicle's data directly: you can tell that its methods are just accessors for two fields. The second expresses the remaining fuel as a percentage, an abstraction that hides how the vehicle stores its data.
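
A short sketch of what the abstract version allows. The classes `GasolineCar` and `ElectricCar` and their fields are hypothetical; they only show that two very different internal representations can sit behind the same percentage method.

```java
interface Vehicle {
    double getPercentFuelRemaining();
}

// Tracks gallons internally; callers never see them.
class GasolineCar implements Vehicle {
    private final double tankCapacityInGallons;
    private final double gallonsOfGasoline;

    GasolineCar(double tankCapacityInGallons, double gallonsOfGasoline) {
        if (tankCapacityInGallons <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.tankCapacityInGallons = tankCapacityInGallons;
        this.gallonsOfGasoline = gallonsOfGasoline;
    }

    public double getPercentFuelRemaining() {
        return 100.0 * gallonsOfGasoline / tankCapacityInGallons;
    }
}

// Tracks kilowatt-hours instead; the interface does not change.
class ElectricCar implements Vehicle {
    private final double batteryCapacityInKwh;
    private final double chargeInKwh;

    ElectricCar(double batteryCapacityInKwh, double chargeInKwh) {
        if (batteryCapacityInKwh <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.batteryCapacityInKwh = batteryCapacityInKwh;
        this.chargeInKwh = chargeInKwh;
    }

    public double getPercentFuelRemaining() {
        return 100.0 * chargeInKwh / batteryCapacityInKwh;
    }
}
```

With the first interface, an electric vehicle would have no sensible answer to getGallonsOfGasoline(); with the second, it fits without any change to callers.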

When writing code, we would rather not expose the details of our data, but express it in abstract terms (such as the method above that returns the percentage of fuel remaining). Adding get/set methods without thinking is the worst option.

2. Antisymmetry of Data and Objects

Objects hide their data behind abstractions and expose functions that operate on that data. Data structures expose their data and provide no meaningful functions. Examples:

The following code is an example of procedural code. The Geometry class operates on three shape classes. Shape classes are simple data structures without any behavior (methods). All behavior is in the Geometry class.

public class Square {
	public Point topLeft;
	public double side;
}

public class Rectangle {
	public Point topLeft;
	public double height;
	public double width;
}

public class Circle {
	public Point center;
	public double radius;
}

public class Geometry {

	public final double PI = 3.141592653589793;

	public double area(Object shape) throws NoSuchShapeException {
		if (shape instanceof Square) {
			Square s = (Square) shape;
			return s.side * s.side;

		} else if (shape instanceof Rectangle) {
			Rectangle r = (Rectangle) shape;
			return r.height * r.width;

		} else if (shape instanceof Circle) {
			Circle c = (Circle) shape;
			return PI * c.radius * c.radius;
		}
		// Reached only when the shape is none of the known types.
		throw new NoSuchShapeException();
	}
}

Think about what happens if you add a perimeter() function to the Geometry class. The existing shape classes are not affected at all. On the other hand, if you add a new shape, you have to modify every function in the Geometry class to handle it!
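
As a sketch of the first half of that claim, here is what adding a perimeter function to the procedural version might look like. The shape classes are trimmed to the fields the function needs and the exception class is declared locally so that the example compiles on its own; none of the shape classes has to change.

```java
class NoSuchShapeException extends Exception {}

// Plain data structures, unchanged by the new function.
class Square { public double side; }
class Rectangle { public double height; public double width; }
class Circle { public double radius; }

class Geometry {
    // New behavior is added here only; the shapes stay as they are.
    public double perimeter(Object shape) throws NoSuchShapeException {
        if (shape instanceof Square) {
            Square s = (Square) shape;
            return 4 * s.side;
        } else if (shape instanceof Rectangle) {
            Rectangle r = (Rectangle) shape;
            return 2 * (r.height + r.width);
        } else if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return 2 * Math.PI * c.radius;
        }
        throw new NoSuchShapeException();
    }
}
```

The cost shows up on the other axis: a new Triangle class would need a new branch in perimeter(), in area(), and in every other function of Geometry.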

Now look at the object-oriented solution below, where the area() method is polymorphic and no Geometry class is needed. Adding a new shape class does not affect any existing function, but adding a new function means every shape class has to be modified.

// Assumed: the Shape interface is not shown in the original listing.
public interface Shape {
	double area();
}

public class Square implements Shape {
	private Point topLeft;
	private double side;

	public double area() {
		return side * side;
	}
}

public class Rectangle implements Shape {
	private Point topLeft;
	private double height;
	private double width;

	public double area() {
		return height * width;
	}
}

public class Circle implements Shape {
	private Point center;
	private double radius;
	public final double PI = 3.141592653589793;

	public double area() {
		return PI * radius * radius;
	}
}

The dichotomy between objects and data structures: procedural code (code that uses data structures) makes it easy to add new functions without changing the existing data structures, while object-oriented code makes it easy to add new classes without changing existing functions.

The converse also holds: procedural code makes it hard to add new data structures, because all the functions must change, and object-oriented code makes it hard to add new functions, because all the classes must change. So what is hard for object-oriented code is easy for procedural code, and vice versa. This is the antisymmetry of data and objects.
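
The object-oriented side of the trade-off can be sketched as follows. `RightTriangle` and `AreaReport` are hypothetical names introduced for illustration, and the shapes take constructor arguments (unlike the listing above) so that the example can run.

```java
import java.util.List;

interface Shape {
    double area();
}

class Square implements Shape {
    private final double side;

    Square(double side) { this.side = side; }

    public double area() { return side * side; }
}

// New shape: added without touching Square or any code that calls area().
class RightTriangle implements Shape {
    private final double base;
    private final double height;

    RightTriangle(double base, double height) {
        this.base = base;
        this.height = height;
    }

    public double area() { return 0.5 * base * height; }
}

class AreaReport {
    // Existing function: works for shapes written after it, unchanged.
    static double totalArea(List<Shape> shapes) {
        double total = 0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }
}
```

Adding a perimeter() method, by contrast, would mean changing the Shape interface and every class that implements it.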

In any complex system there will be times when you need to add new data types rather than new functions. Objects and object-oriented code are the better fit then. At other times you will want to add new functions rather than data types, and procedural code and data structures are more appropriate. Choose whichever suits the problem.

3. The Law of Demeter

The Law of Demeter, also known as the principle of least knowledge, is abbreviated as LoD. The law holds that the less a class knows about other classes, the better. That is, an object should know as little as possible about other objects, communicate only with its friends, and not talk to strangers.
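
A small sketch of "friends" versus "strangers". The `Driver`, `Car`, and `Engine` classes are hypothetical, and the sketch assumes the common reading of the law, in which a method may call methods on its own object, on its parameters, on objects it creates, and on objects held in its instance variables.

```java
class Engine {
    private boolean running;

    void start() { running = true; }

    boolean isRunning() { return running; }
}

class Car {
    // Held in an instance variable, so Engine is a friend of Car.
    private final Engine engine = new Engine();

    // Car offers the operation itself; callers never reach into the Engine.
    void start() { engine.start(); }

    boolean isRunning() { return engine.isRunning(); }
}

class Driver {
    // 'car' is a parameter, so it is a friend of this method.
    void drive(Car car) {
        car.start();
        // A call such as car.getEngine().start() would talk to a stranger:
        // Driver would then depend on how Car is built internally.
    }
}
```

Because `Car` has no getEngine() method, replacing the engine with some other mechanism later would not affect `Driver` at all.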

As mentioned in the previous section, objects hide data and expose operations. This also means that objects should not expose their internal structure through get/set methods, because doing so exposes structure rather than hiding it.

For instance:

This code appears to violate the Law of Demeter, because it calls getScratchDir() on the return value of getOptions(), and then getAbsolutePath() on the return value of getScratchDir().

final String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();

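The accessor methods are what complicate the question. Whether the code really violates the Law of Demeter depends on whether ctxt, its options, and the scratch directory are objects or data structures. If they are plain data structures with public fields, written as below, the law does not apply, because data structures expose their internal structure by nature.

final String outputDir = ctxt.options.scratchDir.absolutePath;

So here's a quick summary: if data structures simply had public variables and no functions, and objects had private variables and public methods, it would be easy to tell whether the Law of Demeter applies, and the two would not be easy to confuse.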
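
The two clean alternatives can be sketched side by side. All class names here are hypothetical, and plain strings stand in for whatever type the real scratch directory has.

```java
// Data-structure style: public fields, no behavior. Chained field access
// is fine here because nothing is being hidden.
class ScratchDir { public String absolutePath; }
class Options { public ScratchDir scratchDir; }
class ContextData { public Options options; }

// Object style: the structure is hidden and the object offers the
// operation the caller needs instead of handing out its parts.
class BuildContext {
    private final String scratchDirPath;

    BuildContext(String scratchDirPath) {
        this.scratchDirPath = scratchDirPath;
    }

    String scratchFilePath(String fileName) {
        return scratchDirPath + "/" + fileName;
    }
}

class DemeterDemo {
    public static void main(String[] args) {
        ContextData ctxt = new ContextData();
        ctxt.options = new Options();
        ctxt.options.scratchDir = new ScratchDir();
        ctxt.options.scratchDir.absolutePath = "/tmp/scratch";
        System.out.println(ctxt.options.scratchDir.absolutePath);

        BuildContext context = new BuildContext("/tmp/scratch");
        System.out.println(context.scratchFilePath("Example.class"));
    }
}
```

The confusing case is the mix of the two: private fields wrapped in getters that hand out further objects, which is what the chained get calls above look like.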

Unfortunately, some code ends up as a hybrid: half object, half data structure. Hybrids make it hard to add new functions and also hard to add new data structures, so try to avoid code with this structure.

4. Data Transfer Object (DTO)

The purest form of data structure is a class with only public variables and no functions. It is sometimes called a Data Transfer Object (DTO). DTOs are very useful, especially when communicating with databases, where they serve as a step in converting raw database data into objects in the application code.
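
A minimal sketch of a DTO used as a translation step. `AddressDto` and `AddressMapper` are hypothetical names, and a `Map` stands in for a raw database row so that the example needs no database.

```java
import java.util.HashMap;
import java.util.Map;

// DTO: public variables only, no behavior.
class AddressDto {
    public String street;
    public String city;
    public String zip;
}

class AddressMapper {
    // Copies a raw row (column name -> value) into a typed DTO.
    static AddressDto fromRow(Map<String, Object> row) {
        AddressDto dto = new AddressDto();
        dto.street = (String) row.get("street");
        dto.city = (String) row.get("city");
        dto.zip = (String) row.get("zip");
        return dto;
    }
}

class DtoDemo {
    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("street", "1 Main St");
        row.put("city", "Springfield");
        row.put("zip", "00000");

        AddressDto dto = AddressMapper.fromRow(row);
        System.out.println(dto.street + ", " + dto.city + " " + dto.zip);
    }
}
```

The mapping logic lives outside the DTO on purpose, so the DTO itself stays a pure data structure.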

Active Records are a special form of DTO. They are data structures with public variables, but they usually also have navigational methods such as save and find. Active Record is a domain model pattern in which a model class corresponds to a table in a relational database and an instance of the class corresponds to one row of that table. Such data structures should not be crammed with business methods; doing so blurs data structures and objects and creates the hybrid problem mentioned earlier.
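
One way to keep business rules out of an Active Record, sketched under some assumptions: `EmployeeRecord` and `Employee` are hypothetical names, and an in-memory map stands in for the database table. The record stays a data structure with save and find, while a separate object owns the business rules and hides the record.

```java
import java.util.HashMap;
import java.util.Map;

// Active Record sketch: public data plus save/find, and nothing else.
class EmployeeRecord {
    // Stand-in for a database table so the example runs without a database.
    private static final Map<Integer, EmployeeRecord> TABLE = new HashMap<>();

    public int id;
    public String name;
    public long monthlySalaryInCents;

    void save() { TABLE.put(id, this); }

    static EmployeeRecord find(int id) { return TABLE.get(id); }
}

// Business rules live in a separate object that hides the record.
class Employee {
    private final EmployeeRecord record;

    Employee(EmployeeRecord record) { this.record = record; }

    long annualSalaryInCents() {
        return 12 * record.monthlySalaryInCents;
    }

    void raiseByPercent(int percent) {
        record.monthlySalaryInCents += record.monthlySalaryInCents * percent / 100;
        record.save();
    }
}
```

If annualSalaryInCents() and raiseByPercent() were moved onto `EmployeeRecord`, it would become exactly the half-object, half-data-structure hybrid this section warns about.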

References
Clean Code (Robert C. Martin), Chapter 6: Objects and Data Structures

Topics: Code Style