Detailed explanation of serialization and deserialization of Java objects

Posted by swasheck on Sun, 16 Jan 2022 18:23:45 +0100

What do serialization and deserialization of Java objects mean?

1. What is serialization for?
The original purpose of serialization is to "transform" a Java object into a byte sequence, so that it can be persisted to disk and does not vanish from memory once the program exits. A byte sequence is also easier to transmit over a network. Conceptually, then:

Serialization: converts Java objects into byte sequences.
Deserialization: restores the byte sequence to the original Java object.
In a sense, the serialization mechanism also smooths over some platform differences: the resulting byte stream can be deserialized on another platform to recover the object.
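The two definitions above can be sketched as a round trip that stays entirely in memory. This is only an illustration: the class name RoundTripDemo and its helper methods are made up for this example, and a String is used as the object because String is known to be Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper class, not part of the original article.
class RoundTripDemo {

    // Serialization: Java object -> byte sequence.
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    // Deserialization: byte sequence -> Java object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes("CodeSheep");
        Object restored = fromBytes(bytes);
        System.out.println(bytes.length + " bytes, restored: " + restored);
    }
}
```

The same byte array could just as well be written to a file or a socket; only the stream wrapped by ObjectOutputStream changes.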

2. How to serialize objects?
In Java, for an object to be serializable, its class must implement one of the following two interfaces:

Serializable interface
Externalizable interface

How do these two interfaces work? What is the relationship between the two? We will introduce them separately.

2.1 Serializable interface
For an object to be serialized, its class must implement this interface or one of its subinterfaces.

All instance fields of the object (including private fields and the objects they reference) are serialized and deserialized so that they can be saved and passed around. Fields that should not be serialized can be marked transient.

Because a Serializable object is reconstructed entirely from its stored bytes, none of its own constructors is called, so a Serializable class does not need a no-argument constructor. However, when a parent class of the Serializable class does not itself implement Serializable, deserialization calls the no-argument constructor of that parent class, so the parent class must have an accessible no-argument constructor; otherwise an exception is thrown.
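A small experiment can make the constructor behavior visible. The sketch below is an assumption-laden illustration: Parent, Child and the two counters are hypothetical names, and the counters exist only to show which constructors run during deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes for illustration; none of these names come from the article.
class ConstructorDemo {
    static int parentCalls = 0;
    static int childCalls = 0;

    // Not Serializable: its fields are not written to the stream, and its
    // no-argument constructor runs during deserialization.
    static class Parent {
        int parentValue;

        Parent() {
            parentCalls++;
            parentValue = 7;
        }
    }

    // Serializable subclass that has no no-argument constructor of its own.
    static class Child extends Parent implements Serializable {
        private static final long serialVersionUID = 1L;
        int childValue;

        Child(int childValue) {
            childCalls++;
            this.childValue = childValue;
            this.parentValue = 99;
        }
    }

    static Child roundTrip(Child child) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(child);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Child) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Child original = new Child(42);
        parentCalls = 0;   // only count what happens during the round trip
        childCalls = 0;
        Child copy = roundTrip(original);
        System.out.println("childValue  = " + copy.childValue);   // restored from the stream
        System.out.println("parentValue = " + copy.parentValue);  // set again by Parent()
        System.out.println("Parent() calls = " + parentCalls + ", Child(int) calls = " + childCalls);
    }
}
```

Note that the parent's field is not restored from the stream: it ends up with whatever the parent's no-argument constructor assigns.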

Using the transient keyword to block serialization is simple and convenient, but a field marked this way is completely cut off from the serialization mechanism, so its value cannot be recovered during deserialization. By adding writeObject() and readObject() methods to the class being serialized, you can control how each field is serialized: some fields can be left out entirely, and others can be encrypted.
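As a rough sketch of this idea, the hypothetical Account class below keeps its password field transient but still stores it through custom writeObject() and readObject() methods. Base64 is used purely as a stand-in transformation and is not encryption; a real application would substitute an actual cipher. All class and field names here are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical example class, not taken from the article.
class CustomHooksDemo {

    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient String password;   // skipped by the default mechanism

        Account(String user, String password) {
            this.user = user;
            this.password = password;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();   // writes the non-transient field "user"
            // Placeholder transformation only: Base64 is NOT encryption.
            byte[] raw = password.getBytes(StandardCharsets.UTF_8);
            out.writeObject(Base64.getEncoder().encodeToString(raw));
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();     // restores "user"
            String encoded = (String) in.readObject();
            password = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        }
    }

    static byte[] toBytes(Account account) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(account);
        }
        return buffer.toByteArray();
    }

    static Account fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account copy = fromBytes(toBytes(new Account("sheep", "pa55word!")));
        System.out.println(copy.user + " / " + copy.password);
    }
}
```

The reads in readObject() must mirror the writes in writeObject(), in the same order.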

2.2 Externalizable interface
It is a subinterface of the Serializable interface. The writeExternal() and readExternal() methods, which the user must implement, determine how the object is serialized and deserialized.

Because you implement the serialization and deserialization methods yourself, you decide which fields are serialized, and transient has no effect here.

When an Externalizable object is deserialized, the no-argument constructor of the class is called first, which differs from the default deserialization mechanism. If you delete the no-argument constructor, or give it private, default (package-private) or protected access, a java.io.InvalidClassException: no valid constructor exception is thrown. Therefore, an Externalizable class must have a public no-argument constructor.
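The failure described above can be reproduced with a deliberately broken class. In this sketch, Broken is a hypothetical Externalizable class that only has a parameterized constructor; writing it still works, and the problem only surfaces when it is read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example, not taken from the article.
class NoArgConstructorDemo {

    static class Broken implements Externalizable {
        private String name;

        // Only a parameterized constructor: the public no-argument one is missing.
        public Broken(String name) {
            this.name = name;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeObject(name);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            name = (String) in.readObject();
        }
    }

    // Returns the exception raised while deserializing, or null if none was raised.
    static InvalidClassException deserializeBroken() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Broken("CodeSheep"));   // serialization itself succeeds
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            in.readObject();
            return null;
        } catch (InvalidClassException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Deserialization failed with: " + deserializeBroken());
    }
}
```

Adding a public no-argument constructor to Broken is enough to make the round trip succeed.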

2.3 comparison
Suppose you only want to hide one field, such as the password pwd of a user object. With Externalizable you would have to write every field except pwd in the writeExternal() method, which is tedious. It is easier to implement Serializable and put transient in front of the pwd field. If you need a lot of special handling, Externalizable is the better fit.

This raises a question. The writeObject() and readObject() methods of a Serializable class allow custom serialization, and so do the writeExternal() and readExternal() methods of Externalizable. What are the similarities and differences between them?

Externalizable declares two methods, readExternal() and writeExternal(). Apart from their signatures, which differ from those of readObject() and writeObject(), they are written in essentially the same way.
Note that when an object is deserialized through the Externalizable mechanism, the program first creates an instance with the public no-argument constructor and then calls readExternal() on it. A class that implements Externalizable must therefore provide a public no-argument constructor.
Although implementing the Externalizable interface can bring some performance improvement, it also increases programming complexity, so most of the time serialization is implemented with the Serializable interface.

3. How does Serializable serialize objects?

3.1 Serializable demo
Java currently has no keyword for directly declaring a so-called "persistent" object.

Persisting and restoring objects relies on the programmer explicitly serializing and deserializing them in code.

For example, suppose we want to serialize a Student object to a file named student.txt, and then deserialize that file back into a Student object:

1. Student class definition

import java.io.Serializable;

public class Student implements Serializable {

    private String name;
    private Integer age;
    private Integer score;
 
    @Override
    public String toString() {
        return "Student:" + '\n' +
        "name = " + this.name + '\n' +
        "age = " + this.age + '\n' +
        "score = " + this.score + '\n'
        ;
    }
 
    // ... getters and setters omitted
}

2. Serialization

public static void serialize(  ) throws IOException {

    Student student = new Student();
    student.setName("CodeSheep");
    student.setAge( 18 );
    student.setScore( 1000 );

    ObjectOutputStream objectOutputStream = 
        new ObjectOutputStream( new FileOutputStream( new File("student.txt") ) );
    objectOutputStream.writeObject( student );
    objectOutputStream.close();
 
    System.out.println("Serialization succeeded! Already generated student.txt file");
    System.out.println("==============================================");
}

3. Deserialization

public static void deserialize(  ) throws IOException, ClassNotFoundException {
    ObjectInputStream objectInputStream = 
        new ObjectInputStream( new FileInputStream( new File("student.txt") ) );
    Student student = (Student) objectInputStream.readObject();
    objectInputStream.close();
 
    System.out.println("The deserialization result is:");
    System.out.println( student );
}

4. Results

Console output:

Serialization succeeded! Already generated student.txt file
==============================================

The deserialization result is:
Student:
name = CodeSheep
age = 18
score = 1000

3.2 What is the use of the Serializable interface?

When defining the Student class above, we implemented the Serializable interface. However, when we open the Serializable interface itself, we find that it is an empty interface that declares no methods at all!

Imagine what happens if you forget to add implements Serializable when defining the Student class above?

The experimental result is that the program reports an error and throws a NotSerializableException.

Following the stack trace down to the writeObject0() method of ObjectOutputStream makes the reason clear:

If an object is not a string, array or enumeration, and does not implement the Serializable interface, a NotSerializableException will be thrown during serialization!

**So the Serializable interface is only used as a marker.** It tells the code that any class implementing Serializable may be serialized! The actual serialization work, however, is not done by the interface itself.
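The marker behavior can be checked with two otherwise identical classes. In the sketch below, Plain and Marked are hypothetical names; the only difference between them is the implements Serializable clause.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example, not taken from the article.
class MarkerDemo {

    // Same content, but without the marker interface.
    static class Plain {
        String name = "CodeSheep";
    }

    // Same content, with the marker interface.
    static class Marked implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "CodeSheep";
    }

    // Returns true if the object can be written, false if the stream rejects it.
    static boolean canSerialize(Object object) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(object);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Plain:  " + canSerialize(new Plain()));
        System.out.println("Marked: " + canSerialize(new Marked()));
        System.out.println("String: " + canSerialize("text"));
        System.out.println("int[]:  " + canSerialize(new int[] {1, 2, 3}));
    }
}
```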

3.3 What is the use of the serialVersionUID number?

I'm sure you will often see the following code lines defined in some classes, that is, a field named serialVersionUID is defined:

private static final long serialVersionUID = -4392658638228508589L;

Do you know the meaning of this statement? Why do you want a serial number called serialVersionUID?

Let's continue to do a simple experiment. Take the Student class above as an example. We don't explicitly declare a serialVersionUID field in it.

We first call the serialize() method above to serialize a Student object to the student.txt file on the local disk.

Next, let's change the Student class. For example, add a field named id to hold the student ID:

public class Student implements Serializable {
    private String name;
    private Integer age;
    private Integer score;
    private Integer id;

    // ... rest of the class unchanged
}

Now we take the student.txt file that was just written and deserialize it with the deserialize() method above, trying to restore the Student object.

At run time this fails with an InvalidClassException.

The message is very clear: the serialVersionUID in the serialized data and the serialVersionUID of the current class are incompatible!

At least two important points can be drawn from this:

1. serialVersionUID is the identifier that must match before and after serialization

2. By default, if no serialVersionUID has been explicitly declared, one is generated for the class automatically!

Point 1: serialVersionUID can be regarded as a "secret code" shared by serialization and deserialization. During deserialization, the JVM compares the ID in the byte stream with the ID of the local class. Only when the two match can the object be deserialized; otherwise an exception is thrown and deserialization stops.

Point 2: if a serializable class does not explicitly declare a serialVersionUID, the Java runtime computes a default serialVersionUID for it from various aspects of the class (its name, interfaces, fields, methods and so on). Once the class structure changes as above, the computed serialVersionUID changes too!

Therefore, to keep serialVersionUID stable, it is recommended to explicitly declare a serialVersionUID value in every class that implements Serializable!
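To see which serialVersionUID the runtime associates with a class, you can ask ObjectStreamClass. The sketch below uses two hypothetical classes, one with an explicit ID and one without; the value computed for the second one depends on the class details, so no specific number is claimed here.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical example, not taken from the article.
class SerialVersionDemo {

    static class WithExplicitId implements Serializable {
        private static final long serialVersionUID = -4392658638228508589L;
        String name;
    }

    // No explicit ID: the runtime computes one from the class structure.
    static class WithDefaultId implements Serializable {
        String name;
    }

    // Looks up the serialVersionUID that serialization would use for this class.
    static long idOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + idOf(WithExplicitId.class));
        System.out.println("computed: " + idOf(WithDefaultId.class));
    }
}
```

Adding or removing a field in WithDefaultId and running the program again is a quick way to watch the computed value change, while the explicit one stays fixed.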

Of course, if you don't want to assign a value by hand, you can let the IDE add it. In IntelliJ IDEA, for example, pressing Alt + Enter on the class can generate and add the serialVersionUID field (this may require enabling the corresponding inspection first), which is very convenient.

Two special cases
1. Fields modified by static will not be serialized

2. Fields modified by the transient modifier will not be serialized

For the first point, because serialization saves the state of the object rather than the state of the class, it is natural to ignore the static field.

For the second point, we need to understand the role of the transient modifier.

If you do not want a field to be serialized with the object (for example, because it holds a sensitive value such as a password), you can mark the field with the transient modifier.

For example, if a password field is added to the Student class defined earlier, but you do not want it written to the file, you can do this:

public class Student implements Serializable {
    private static final long serialVersionUID = -4392658638228508589L;
    private String name;
    private Integer age;
    private Integer score;
    private transient String passwd;   // excluded from the serialized form

    // ... rest of the class unchanged
}

This way, the passwd field is skipped when a Student object is serialized, and after deserialization it holds the default value null. The serialization code only needs to set the extra field:

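public static void serialize() throws IOException {

    Student student = new Student();
    student.setName("CodeSheep");
    student.setAge(18);
    student.setScore(1000);
    student.setPasswd("123");

    // Same as before: write the object to student.txt
    ObjectOutputStream objectOutputStream =
        new ObjectOutputStream( new FileOutputStream( new File("student.txt") ) );
    objectOutputStream.writeObject( student );
    objectOutputStream.close();
}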
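Both special cases can be observed in one small in-memory experiment. The nested Student class below is a simplified, hypothetical variant of the article's class (package-private fields instead of getters and setters, plus a static school field added only for this demonstration).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example; a simplified variant of the article's Student class.
class SpecialCasesDemo {

    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        static String school = "A";   // class state, not object state
        String name;
        transient String passwd;      // excluded from the serialized form
    }

    static byte[] toBytes(Student student) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(student);
        }
        return buffer.toByteArray();
    }

    static Student fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Student) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Student student = new Student();
        student.name = "CodeSheep";
        student.passwd = "123";
        byte[] bytes = toBytes(student);

        Student.school = "B";         // change the static field after serializing
        Student copy = fromBytes(bytes);

        System.out.println("name   = " + copy.name);       // restored
        System.out.println("passwd = " + copy.passwd);     // null: transient
        System.out.println("school = " + Student.school);  // still "B": not restored from the stream
    }
}
```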

4. Implement Externalizable

// Fragment of a UserInfo class that implements Externalizable
// (the declarations of userName, usePass and userAge are omitted here).
public UserInfo() {
    userAge = 20; // lets a later test check whether deserialization goes through the constructor
}

public void writeExternal(ObjectOutput out) throws IOException {
    // Choose the properties to write when serializing; userAge is deliberately not written
    out.writeObject(userName);
    out.writeObject(usePass);
}

public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    // Read the properties back in exactly the order they were written.
    // The stream is ordered, so if the two reads were swapped, each field
    // would receive the value that was written in the other position.
    userName = (String) in.readObject();
    usePass = (String) in.readObject();
}

This class implements the Externalizable interface and decides in writeExternal() which properties are serialized and which are not. Only the properties written there end up in the file; the others are ignored. During deserialization, readExternal() is called automatically, reads the values one by one in the order in which they were written, and fills in the object, which is then returned to the test class to complete the deserialization.
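For a complete picture, the fragment can be embedded in a runnable round trip. This is a reconstruction under assumptions: the field types are inferred from the casts in readExternal(), and the wrapper class, the serialVersionUID and the sample values are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical reconstruction; field types are assumed from the fragment above.
class ExternalizableDemo {

    static class UserInfo implements Externalizable {
        private static final long serialVersionUID = 1L;
        String userName;
        String usePass;
        int userAge;

        // Required: public no-argument constructor, called during deserialization.
        public UserInfo() {
            userAge = 20;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeObject(userName);   // userAge is deliberately not written
            out.writeObject(usePass);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            userName = (String) in.readObject();   // same order as in writeExternal()
            usePass = (String) in.readObject();
        }
    }

    static UserInfo roundTrip(UserInfo user) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(user);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (UserInfo) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        UserInfo user = new UserInfo();
        user.userName = "sheep";
        user.usePass = "123";
        user.userAge = 35;

        UserInfo copy = roundTrip(user);
        System.out.println(copy.userName + " / " + copy.usePass + " / " + copy.userAge);
    }
}
```

The age of the copy comes from the constructor rather than from the original object, which confirms both that userAge was never written and that deserialization went through the no-argument constructor.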

Only the identity of the class of an Externalizable instance is written to the serialization stream; the class itself is responsible for saving and restoring the contents of its instances. If a class wants full control over the stream format and contents for an object and its supertypes, it implements the writeExternal and readExternal methods of the Externalizable interface. These methods must explicitly coordinate with the supertype to save its state, and they supersede any customized writeObject and readObject methods.

writeExternal(ObjectOutput out)
The object implements writeExternal to save its contents. It saves primitive values by calling the methods of DataOutput, and saves objects, strings and arrays by calling the writeObject method of ObjectOutput.
readExternal(ObjectInput in)
The object implements readExternal to restore its contents. It restores primitive values by calling the methods of DataInput, and restores objects, strings and arrays by calling readObject.

The difference between Externalizable and Serializable:

1. Implementing the Serializable interface serializes all fields by default; fields that should not be serialized are marked transient. The Externalizable interface is a subinterface of Serializable. To implement it, you must provide the writeExternal and readExternal methods, which specify which properties are serialized and how they are read back from the serialized data.

2. An object whose class implements Serializable is deserialized without calling that class's constructor: the persisted state is loaded directly into a new instance. An object whose class implements Externalizable is deserialized by first calling the public no-argument constructor to create the object, and then calling readExternal to read the contents of the stream and assign the corresponding properties.

5. Controlled and enhanced serialization
5.1 Enforcing constraints
From the process above we can see that serialization and deserialization leave a gap, because there is an intermediate stage between the two. If someone obtains the intermediate byte stream and forges or tampers with it, the deserialized object carries a certain risk.

After all, deserialization is equivalent to an "implicit" object construction, so we want to keep it under control.

How can it be controlled?

The answer is: write your own readObject() method for the class, so that deserialization is subject to your constraints.

Since you write readObject() yourself, you can do many things in it, such as all kinds of validity checks.

Take the Student class above as an example again. Generally speaking, a student's score should be between 0 and 100. To prevent the score from being tampered with into an absurd value before deserialization, we can write our own readObject() method to control deserialization:

private void readObject( ObjectInputStream objectInputStream ) throws IOException, ClassNotFoundException {

    // Call the default deserialization function
    objectInputStream.defaultReadObject();

    // Manually check the validity of students' scores after deserialization. If any problem is found, terminate the operation!
    if( 0 > score || 100 < score ) {
        throw new IllegalArgumentException("Student scores can only be between 0 and 100!");
    }
}

For example, if I deliberately change the student's score to 101, deserialization is terminated immediately and an error is reported.
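The experiment can be repeated in memory without editing a file. In this sketch the nested Student class is a cut-down, hypothetical version of the article's class whose constructor performs no validation, so that an out-of-range score can reach the stream and play the role of tampered data. A null check is added as an extra precaution that the original method does not have.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example; a cut-down variant of the article's Student class.
class ValidationDemo {

    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
        private Integer score;

        // No validation here, so an invalid score can end up in the stream.
        Student(String name, Integer score) {
            this.name = name;
            this.score = score;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            // Reject values outside 0..100 (and a missing score) after reading.
            if (score == null || score < 0 || score > 100) {
                throw new IllegalArgumentException("Student scores can only be between 0 and 100!");
            }
        }
    }

    // Returns true if a student with this score can be serialized and read back.
    static boolean survivesRoundTrip(int score) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Student("CodeSheep", score));
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            in.readObject();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("score 100: " + survivesRoundTrip(100));
        System.out.println("score 101: " + survivesRoundTrip(101));
    }
}
```

The sketch keeps the article's IllegalArgumentException. Throwing java.io.InvalidObjectException instead is another common choice for validation failures inside readObject().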

Why is the custom private readObject() method in the code above called automatically? Following the underlying source code gives the answer: the ObjectStreamClass class uses reflection under the hood. In Java, it seems almost anything can be "reflected": even a private method defined in a class can be looked up and executed.
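The lookup can be imitated with a few lines of reflection. The following is a simplified sketch of the idea, not the JDK's actual code: findPrivateReadObject and the two nested classes are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical example; a simplified imitation of the reflective lookup.
class ReflectionLookupDemo {

    static class WithHook implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
        }
    }

    static class WithoutHook implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Finds a private, non-static "void readObject(ObjectInputStream)" method
    // declared directly in the given class, or returns null if there is none.
    static Method findPrivateReadObject(Class<?> type) {
        try {
            Method method = type.getDeclaredMethod("readObject", ObjectInputStream.class);
            int mods = method.getModifiers();
            boolean matches = method.getReturnType() == void.class
                    && Modifier.isPrivate(mods)
                    && !Modifier.isStatic(mods);
            if (!matches) {
                return null;
            }
            method.setAccessible(true);   // allows invoking the private method reflectively
            return method;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("WithHook:    " + findPrivateReadObject(WithHook.class));
        System.out.println("WithoutHook: " + findPrivateReadObject(WithoutHook.class));
    }
}
```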

5.2 Singleton pattern enhancement
An easily overlooked problem is that a serializable singleton class may no longer be a singleton!

A small code example makes this clear.

For example, here is a common Java singleton implementation based on a static nested holder class:

public class Singleton implements Serializable {

    private static final long serialVersionUID = -1576643344804979563L;

    private Singleton() {
    }

    private static class SingletonHolder {
        private static final Singleton singleton = new Singleton();
    }

    public static synchronized Singleton getSingleton() {
        return SingletonHolder.singleton;
    }
}

Then write a validation main function:

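import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class Test2 {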

    public static void main(String[] args) throws IOException, ClassNotFoundException {

        ObjectOutputStream objectOutputStream =
                new ObjectOutputStream(
                    new FileOutputStream( new File("singleton.txt") )
                );
        // Serialize the singleton object to the text file singleton.txt
        objectOutputStream.writeObject( Singleton.getSingleton() );
        objectOutputStream.close();

        ObjectInputStream objectInputStream =
                new ObjectInputStream(
                    new FileInputStream( new File("singleton.txt") )
                );
        // Deserialize the object stored in singleton.txt into singleton1
        Singleton singleton1 = (Singleton) objectInputStream.readObject();
        objectInputStream.close();

        Singleton singleton2 = Singleton.getSingleton();

        // The running result actually prints false!
        System.out.println( singleton1 == singleton2 );
    }

}

After running it, we find that the deserialized singleton object is not the same object as the original singleton, which clearly defeats the purpose.

The solution is to add a readResolve() method to the singleton class and return the singleton object directly:

private Object readResolve() {
    return SingletonHolder.singleton;
}

The complete class then looks like this:

package serialize.test;

import java.io.Serializable;

public class Singleton implements Serializable {

    private static final long serialVersionUID = -1576643344804979563L;

    private Singleton() {
    }

    private static class SingletonHolder {
        private static final Singleton singleton = new Singleton();
    }

    public static synchronized Singleton getSingleton() {
        return SingletonHolder.singleton;
    }
 
    private Object readResolve() {
        return SingletonHolder.singleton;
    }
}

This way, after an object has been read from the stream, readResolve() is called and the object it returns replaces the newly deserialized one.
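The effect can be verified by comparing two nearly identical singletons in memory, one with readResolve() and one without. Both nested classes below are hypothetical and use a plain static INSTANCE field instead of the holder class, only to keep the sketch short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example, not taken from the article.
class ReadResolveDemo {

    // Singleton without readResolve(): deserialization creates a second instance.
    static class Plain implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Plain INSTANCE = new Plain();

        private Plain() {
        }
    }

    // Singleton with readResolve(): the deserialized copy is replaced by INSTANCE.
    static class Resolved implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Resolved INSTANCE = new Resolved();

        private Resolved() {
        }

        private Object readResolve() {
            return INSTANCE;
        }
    }

    static Object roundTrip(Object object) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without readResolve: " + (roundTrip(Plain.INSTANCE) == Plain.INSTANCE));
        System.out.println("with readResolve:    " + (roundTrip(Resolved.INSTANCE) == Resolved.INSTANCE));
    }
}
```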

Topics: Java