Serialization / deserialization

Posted by NateDawg on Fri, 08 Oct 2021 05:00:19 +0200

CodeSheep.

A sheep who loves technology wants to make sharing a habit!

Tool man

A while back, a reader left a comment saying they were still a bit fuzzy on object serialization and deserialization, and asked whether I could sort the topic out in one go.

Coincidentally, I have the same intention.

After getting this request, I took the time to pick up my dusty copy of Thinking in Java and re-examine the key points of "serialization and deserialization".

For a long time, my understanding of Java serialization went no further than "implement the Serializable interface", until...

What is serialization for?

The original purpose of serialization is to turn a Java object into a byte sequence, so that it can be persisted to disk and does not vanish from memory once the program exits. A byte sequence is also much easier to send and pass around over a network. Conceptually, then, it is easy to understand:

  • Serialization: converts Java objects into byte sequences.

  • Deserialization: restores the byte sequence to the original Java object.

In a sense, the serialization mechanism also smooths over some platform differences: a byte stream produced on one platform can be deserialized on another to recover the object.

That is all there is to it. It looks simple, but there is quite a lot going on behind the scenes. Read on.
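To make "byte sequence" a bit more concrete, here is a minimal sketch that serializes to an in-memory byte array instead of a file. The class name ByteRoundTripDemo and its helper methods are my own names for illustration, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class ByteRoundTripDemo {

    // Serialization: object -> byte sequence
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Deserialization: byte sequence -> object
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes("CodeSheep");
        // A Java serialization stream starts with the magic number 0xACED
        System.out.printf("first two bytes: %02X %02X%n", bytes[0], bytes[1]); // first two bytes: AC ED
        System.out.println("restored: " + fromBytes(bytes));                   // restored: CodeSheep
    }
}
```

The same bytes could just as well be written to a file or a socket; the in-memory buffer only keeps the sketch self-contained.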

How are objects serialized?

Java currently has no keyword for directly declaring a so-called "persistent" object.

Persisting an object and restoring it both rely on the programmer explicitly serializing and deserializing it in code.

For example, suppose we want to serialize a Student object into a file named student.txt, and then deserialize that file back into a Student object:

1. Student class definition

import java.io.Serializable;

public class Student implements Serializable {

    private String name;
    private Integer age;
    private Integer score;

    @Override
    public String toString() {
        return "Student:" + '\n' +
                "name = " + this.name + '\n' +
                "age = " + this.age + '\n' +
                "score = " + this.score + '\n';
    }

    // ... getters and setters omitted ...
}

2. Serialization

public static void serialize() throws IOException {

    Student student = new Student();
    student.setName("CodeSheep");
    student.setAge( 18 );
    student.setScore( 1000 );

    // try-with-resources closes the stream even if writeObject() throws
    try ( ObjectOutputStream objectOutputStream =
            new ObjectOutputStream( new FileOutputStream( new File("student.txt") ) ) ) {
        objectOutputStream.writeObject( student );
    }

    System.out.println("Serialization succeeded! Already generated student.txt file");
    System.out.println("==============================================");
}

3. Deserialization

public static void deserialize() throws IOException, ClassNotFoundException {
    Student student;

    // try-with-resources closes the stream even if readObject() throws
    try ( ObjectInputStream objectInputStream =
            new ObjectInputStream( new FileInputStream( new File("student.txt") ) ) ) {
        student = (Student) objectInputStream.readObject();
    }

    System.out.println("The deserialization result is:");
    System.out.println( student );
}

4. Results

Console output:

Serialization succeeded! Already generated student.txt file
==============================================
The deserialization result is:
Student:
name = CodeSheep
age = 18
score = 1000

What is the use of the Serializable interface?

When defining the Student class above, we implemented the Serializable interface. Yet if we open Serializable itself, we find an empty interface that declares no methods at all!

So what happens if you forget to add implements Serializable when defining the Student class?

Try it, and the program fails with a NotSerializableException.

Following the stack trace down into the writeObject0() method of ObjectOutputStream makes everything clear:

If an object is not a String, an array or an enum, and does not implement the Serializable interface, a NotSerializableException is thrown during serialization!

Oh, I see!

So the Serializable interface is nothing more than a marker!!!

It tells the serialization code that any class implementing it may be serialized. The actual serialization work, however, is not done by the interface itself.
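Here is a small sketch of the marker in action. Marked and Unmarked are hypothetical classes I made up for this demo; they are identical except for implements Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MarkerInterfaceDemo {

    // Hypothetical classes: identical except for the marker interface
    static class Marked implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "CodeSheep";
    }

    static class Unmarked {
        String name = "CodeSheep";
    }

    // Returns true if writeObject() accepts the object, false on NotSerializableException
    static boolean canSerialize(Object obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new Marked()));    // true
        System.out.println(canSerialize(new Unmarked()));  // false
        System.out.println(canSerialize("a string"));      // true
        System.out.println(canSerialize(new int[] {1}));   // true
    }
}
```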

What is serialVersionUID for?

You have surely seen a line like the following in some classes, declaring a field named serialVersionUID:

private static final long serialVersionUID = -4392658638228508589L;

Do you know what this statement means? Why does a class need a version number called serialVersionUID?

Let's run another simple experiment. Take the Student class above, in which we did not explicitly declare a serialVersionUID field.

First, we call the serialize() method to write a Student object to the student.txt file on the local disk:

public static void serialize() throws IOException {

    Student student = new Student();
    student.setName("CodeSheep");
    student.setAge( 18 );
    student.setScore( 100 );

    // try-with-resources closes the stream even if writeObject() throws
    try ( ObjectOutputStream objectOutputStream =
            new ObjectOutputStream( new FileOutputStream( new File("student.txt") ) ) ) {
        objectOutputStream.writeObject( student );
    }
}

Next, let's tinker with the Student class. For example, add a field named studentID to hold the student number.

Now we take the student.txt file that was just written to disk, deserialize it with the following code, and try to restore the Student object:

public static void deserialize() throws IOException, ClassNotFoundException {
    Student student;

    // try-with-resources closes the stream even if readObject() throws
    try ( ObjectInputStream objectInputStream =
            new ObjectInputStream( new FileInputStream( new File("student.txt") ) ) ) {
        student = (Student) objectInputStream.readObject();
    }

    System.out.println("The deserialization result is:");
    System.out.println( student );
}

Running this fails with an InvalidClassException.

The exception message is very clear: the serialVersionUID recorded in the serialized stream does not match the serialVersionUID of the current class!
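The mismatch can also be reproduced in a single file. A word of caution: this sketch does not edit the class between runs as I did above. Instead it simulates an "old" stream by flipping one bit of the serialVersionUID recorded in the bytes. The byte offset relies on my reading of the stream layout (header, object and class-descriptor tags, class name, then the UID) and assumes an ASCII class name; UidMismatchDemo and its helpers are names invented for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class UidMismatchDemo {

    // Hypothetical stand-in for the article's Student class
    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "CodeSheep";
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Returns a copy of the stream whose recorded serialVersionUID differs by one bit.
    // Assumed layout (ASCII class name, object written first in the stream):
    // 4-byte header, TC_OBJECT, TC_CLASSDESC, 2-byte name length, name, 8-byte UID.
    static byte[] withDifferentUid(byte[] stream, Class<?> type) {
        byte[] copy = stream.clone();
        int uidOffset = 4 + 1 + 1 + 2 + type.getName().length();
        copy[uidOffset + 7] ^= 0x01; // flip the lowest bit of the UID
        return copy;
    }

    // Returns "ok" on success, or a description of the InvalidClassException
    static String tryDeserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return "ok";
        } catch (InvalidClassException e) {
            return "InvalidClassException: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new Student());
        System.out.println(tryDeserialize(bytes));
        System.out.println(tryDeserialize(withDifferentUid(bytes, Student.class)));
    }
}
```

The first call should succeed, while the second should be rejected because the UID in the stream no longer matches the one declared in the class.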

At least two important things can be learned from this:

  • 1. serialVersionUID identifies the version of a class, and it must be the same before and after serialization

  • 2. If no serialVersionUID is declared explicitly, the serialization runtime automatically generates one for the class!

Point 1: The serialVersionUID can be regarded as a kind of "secret handshake" between serialization and deserialization. During deserialization, the JVM compares the ID stored in the byte stream with the ID of the local class. Only when the two match can deserialization proceed; otherwise an exception is thrown and deserialization is aborted.

Point 2: If a serializable class does not explicitly declare a serialVersionUID, the Java runtime computes a default one from various details of the class. As soon as the class structure changes, as it did above, the computed serialVersionUID changes too!

Therefore, to keep the serialVersionUID stable, it is recommended to explicitly declare a serialVersionUID value in every class that implements Serializable!

Of course, if you don't want to pick a value by hand, you can let your IDE add it. I use IntelliJ IDEA, for example: with the corresponding inspection enabled, pressing alt + enter on the class generates and adds the serialVersionUID field automatically, which is very convenient.
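If you are curious which value a class ends up with, ObjectStreamClass.lookup() can tell you. A minimal sketch follows; WithExplicitUid and WithDefaultUid are hypothetical classes used only for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {

    // Declares its own version number
    static class WithExplicitUid implements Serializable {
        private static final long serialVersionUID = 42L;
        private String name;
    }

    // Declares nothing, so the runtime computes a default from the class details
    static class WithDefaultUid implements Serializable {
        private String name;
    }

    // Returns the serialVersionUID that serialization would use for the given class
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(WithExplicitUid.class)); // 42
        // The computed value depends on the class details, so no fixed output is shown here
        System.out.println(uidOf(WithDefaultUid.class));
    }
}
```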

Two special cases

  • 1. Fields declared static are not serialized

  • 2. Fields declared transient are not serialized

On the first point: serialization saves the state of an object, not the state of its class, so it naturally ignores static fields.

On the second point, we need to understand what the transient modifier does.

If you do not want a particular field to be written out when an object is serialized (for example, because it holds sensitive data such as a password), mark that field with the transient modifier.

For example, if a password field is added to the Student class defined earlier and you do not want it written to the txt file, declare that field as transient.

That way, the password field is skipped when a Student object is serialized, and after deserialization it simply holds its default value, null.
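Both special cases can be checked with a short sketch. The Student class here is a trimmed-down stand-in for the one in this article, and the school field is something I added purely to demonstrate the static case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {

    // Trimmed-down stand-in for the article's Student class
    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;

        static String school = "School A"; // class state: never written to the stream
        String name;                       // ordinary field: serialized
        transient String password;         // transient field: skipped

        Student(String name, String password) {
            this.name = name;
            this.password = password;
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    static Student deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Student) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new Student("CodeSheep", "123456"));
        Student.school = "School B"; // changed after serialization

        Student restored = deserialize(bytes);
        System.out.println(restored.name);     // CodeSheep
        System.out.println(restored.password); // null
        System.out.println(Student.school);    // School B (the stream did not restore "School A")
    }
}
```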

Controlled and enhanced serialization

Adding constraints

From the process above, we can see that serialization and deserialization leave a gap open: there is an intermediate stage between the two. If someone gets hold of the byte stream in between and forges or tampers with it, the deserialized object becomes a risk.

After all, deserialization is effectively an "implicit" way of constructing an object, so we want it to happen under controlled conditions.

How do we make it controlled?

The answer: write our own readObject() method to take part in the deserialization of the object, and enforce constraints there.

Since you write readObject() yourself, you can do all sorts of controlled things in it, such as validation checks.

Take the Student class again. Generally speaking, a student's score should be between 0 and 100. To prevent someone from tampering with the score and slipping in an absurd value during deserialization, we can write our own readObject() method to control deserialization:

private void readObject( ObjectInputStream objectInputStream ) throws IOException, ClassNotFoundException {

    // Run the default deserialization first so that all fields are populated
    objectInputStream.defaultReadObject();

    // Validate the restored score; abort deserialization if it is missing or out of range
    if( score == null || score < 0 || score > 100 ) {
        throw new IllegalArgumentException("Student scores can only be between 0 and 100!");
    }
}

For example, if I deliberately change the student's score to 101, deserialization is aborted immediately and an error is reported.

Some of you may wonder why a custom private readObject() method gets called automatically. To find out, you have to dig into the underlying source code. I traced it down into the ObjectStreamClass class for you, and one look there should make it click.

It's the reflection mechanism at work again! Yes, in Java, sure enough, everything can be "reflected". Even the private methods defined in a class can be pulled out and executed, which is rather satisfying.
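The experiment can be packed into one runnable sketch. To keep it short I use a cut-down Student with a single int score and a constructor that deliberately skips validation, so that a bad stream is easy to produce; both are my simplifications, not the class from earlier in this article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidatedReadObjectDemo {

    // Cut-down stand-in for the article's Student class
    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        private int score;

        // Deliberately unchecked, so the demo can produce a stream with a bad score
        Student(int score) {
            this.score = score;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // populate the fields first
            if (score < 0 || score > 100) {
                throw new IllegalArgumentException("Student scores can only be between 0 and 100!");
            }
        }
    }

    // Serializes a Student with the given score, then reports whether deserialization accepts it
    static String roundTrip(int score) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Student(score));
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            in.readObject();
            return "accepted";
        } catch (IllegalArgumentException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(100)); // accepted
        System.out.println(roundTrip(101)); // rejected: Student scores can only be between 0 and 100!
    }
}
```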
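To get a feel for the trick, here is a simplified sketch of calling a private method through reflection. This is not the JDK's actual code, only the same idea in miniature; Greeter and callPrivate are hypothetical names.

```java
import java.lang.reflect.Method;

class PrivateMethodReflectionDemo {

    // Hypothetical class with a method nobody outside is supposed to call
    static class Greeter {
        private String greet(String who) {
            return "hello, " + who;
        }
    }

    // Looks up a private one-argument method by name and invokes it
    static Object callPrivate(Object target, String methodName, Class<?> paramType, Object arg)
            throws ReflectiveOperationException {
        Method method = target.getClass().getDeclaredMethod(methodName, paramType);
        method.setAccessible(true); // lift the "private" restriction for this Method object
        return method.invoke(target, arg);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callPrivate(new Greeter(), "greet", String.class, "CodeSheep")); // hello, CodeSheep
    }
}
```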

Singleton pattern enhancement

An easily overlooked problem is that a serializable singleton class may no longer be a singleton!

A small code example makes this clear.

Here we first write a common Java singleton implementation, the "static inner class" variant:

import java.io.Serializable;

public class Singleton implements Serializable {

    private static final long serialVersionUID = -1576643344804979563L;

    private Singleton() {
    }

    // Loaded lazily, on the first call to getSingleton()
    private static class SingletonHolder {
        private static final Singleton singleton = new Singleton();
    }

    // No synchronized needed: class initialization is already thread-safe
    public static Singleton getSingleton() {
        return SingletonHolder.singleton;
    }
}

Then write a main method to verify it:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class Test2 {

    public static void main(String[] args) throws IOException, ClassNotFoundException {

        //  Serialize the singleton object into the file singleton.txt
        try ( ObjectOutputStream objectOutputStream =
                new ObjectOutputStream( new FileOutputStream( new File("singleton.txt") ) ) ) {
            objectOutputStream.writeObject( Singleton.getSingleton() );
        }

        //  Deserialize the object in the file singleton.txt to singleton1
        Singleton singleton1;
        try ( ObjectInputStream objectInputStream =
                new ObjectInputStream( new FileInputStream( new File("singleton.txt") ) ) ) {
            singleton1 = (Singleton) objectInputStream.readObject();
        }

        Singleton singleton2 = Singleton.getSingleton();

        //  The running result actually prints false!
        System.out.println( singleton1 == singleton2 );
    }

}

After running it, we find that the deserialized singleton object is not the same instance as the original singleton object, which clearly defeats our purpose.

The solution is to add a readResolve() method to the singleton class and have it return the singleton instance directly:

private Object readResolve() {
    return SingletonHolder.singleton;
}

This way, whenever an object of this class is read from the stream, readResolve() is called and the freshly deserialized object is replaced with the object it returns.
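For a side-by-side comparison, here is a self-contained sketch with two hypothetical singletons, one without and one with readResolve(). To keep it short I use a plain static final instance rather than the holder class shown above, and an in-memory buffer instead of singleton.txt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ReadResolveDemo {

    // Hypothetical singleton without readResolve(): deserialization creates a second instance
    static class PlainSingleton implements Serializable {
        private static final long serialVersionUID = 1L;
        static final PlainSingleton INSTANCE = new PlainSingleton();
        private PlainSingleton() { }
    }

    // Hypothetical singleton with readResolve(): the deserialized copy is swapped for INSTANCE
    static class ResolvedSingleton implements Serializable {
        private static final long serialVersionUID = 1L;
        static final ResolvedSingleton INSTANCE = new ResolvedSingleton();
        private ResolvedSingleton() { }

        private Object readResolve() {
            return INSTANCE;
        }
    }

    // Serializes the object and reports whether deserialization gives back the very same instance
    static boolean sameAfterRoundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject() == obj;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameAfterRoundTrip(PlainSingleton.INSTANCE));    // false
        System.out.println(sameAfterRoundTrip(ResolvedSingleton.INSTANCE)); // true
    }
}
```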

Topics: Java