An interesting memory leak case

Posted by Ironphp on Mon, 01 Nov 2021 02:36:01 +0100

In a previous article, On how to directly output front-end pages with 14W (140,000) lines of code at pixel level, I described an SSR project, and I thought everything would be smooth and happy from then on.

Unexpectedly, after the project went online, first-screen rendering became slower and slower as the number of requests grew. Moreover, right after each release (that is, after the containers were rebuilt), the time consumed dropped sharply again.

A memory leak was therefore the natural suspect. A look at the STKE monitoring panel confirmed it: memory usage really did rise and fall like waves.

1. Reproduction

Knowing that it is a memory leak, we need to find the leak point. Because the online environment cannot be tampered with casually, and the online code is minified, we need to build a local environment that is convenient for debugging. Here, we can write a script that fires requests at the locally started server to simulate the online environment. (As everyone who has read the previous article knows, we also have a skeleton screen mode, which skips the CGI requests and greatly reduces the time of a single request, so results show up within a few seconds.)

We can use the heapdump package to write heap snapshots to a local file. The basic usage of heapdump is as follows:

const heapdump = require('heapdump');

heapdump.writeSnapshot('./test.heapsnapshot');


Then you can load the snapshot file into the Memory panel of Chrome DevTools for analysis. Here, I chose to write a snapshot after 1 run, after 50 runs, after 100 runs, and once more (labelled 101) after waiting a few seconds for garbage collection. You can see that the snapshot files get larger and larger, from 35 MB to 249 MB.

Select two snapshots and compare them. A trick here is to sort by memory size and look for many objects of the same size: such objects have probably been referenced many times, and the leak point may be among them. That is how I found that the problem may lie in the console object.

2. Analyzing the problem

Normal use of the console object does not cause a memory leak, so the question is whether something unusual was done to console. After searching the code and excluding ordinary calls, I found an assignment similar to the following:

const nativeError = console.error;

console.error = (...argv) => {
    // Omit some operations
    nativeError(...argv);
};


This pattern is actually quite common in front-end development. For example, to prepend the time to every log line automatically:

const nativeError = console.error;

console.error = (...argv) => {
    nativeError(`[${(new Date()).toTimeString()}]`, ...argv);
};

console.error('Test');
// [20:58:17 GMT+0800 (China Standard Time)] Test


Another, even more common scenario is that we need to silence most log output in the production environment, while keeping a reference to the log function so that some key information can still be printed to the browser console. In that case, the code looks like this:

// Keep a reference so that we can still log when necessary
const logger = console.log;

// The replacement must be a function, so that the many existing console.log('...') calls do not throw
console.log = () => {};

logger('Browser terminal AlloyTeam recruitment information');


But in our environment, the original client code is compiled and run repeatedly in the vm. What problems does that cause?

Here is some code; interested readers can run it:

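const vm = require('vm');
const heapdump = require('heapdump');

const total = 5000;

const writeSnapshot = (count) => {
    heapdump.writeSnapshot(`./${count}-${total}.heapsnapshot`);
};

// Code executed inside the vm: it wraps console.error on every run
const code = `
    const nativeError = console.error;

    console.error = (...argv) => {
        nativeError(...argv);
    }
`;

const script = new vm.Script(code);

for (let i = 1; i <= total; i++) {
    // Every run gets a new context but shares the host's console object
    script.runInNewContext({
        console,
    });

    console.log(`${i}/${total}`);

    // Write a snapshot after the first run, at the halfway point, and after the last run
    switch (i) {
        case 1:
        case Math.floor(total * 0.5):
        case total:
            writeSnapshot(i);
    }
}

// Write one more snapshot after giving garbage collection a few seconds
setTimeout(() => {
    writeSnapshot(total + 1);
}, 3000);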

This small piece of code occupies more than 1 GB of memory after running 5000 times, and shows no sign of being reclaimed.

Let's first consider what is different about the vm environment:

  1. The console object used inside the vm is not the vm's own; it is passed in by the host environment, so modifications made to console inside the vm are also reflected on the host's console object;
  2. When the same piece of code is executed multiple times, all of these executions share that console object. In the browser, by contrast, refreshing the page also executes the code again, but each execution gets an independent environment;
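
The first point can be checked with a minimal sketch. The names hostObj and shared are hypothetical and only serve the illustration; a plain object stands in for console so the real one is left alone.

```javascript
const vm = require('vm');

// Hypothetical host object standing in for console
const hostObj = { greet: () => 'host' };

// The vm code only sees the object under the name "shared",
// but it is the very same object that lives in the host environment
vm.runInNewContext(`shared.greet = () => 'vm';`, { shared: hostObj });

const result = hostObj.greet();
console.log(result); // vm
```

The assignment made inside the vm is visible on the host side, because only the global scope is new in each context; objects passed in are shared by reference.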

This is where our problem appears, as shown in the figure above:

  1. On the host environment, console.error initially points to the native error method;
  2. When the vm executes for the first time (call the function assigned in this run Func1), it first reads console.error, so it holds a reference to the native error method; the assignment then makes console.error on the host environment point to Func1;
  3. When the vm executes for the second time, it again reads console.error first, but this time it gets the Func1 set in step 2, so Func2 references Func1. At the same time, console.error is set to Func2;
  4. Similarly, Func3 references Func2, and console.error points to Func3;

Sharp-eyed readers will have spotted the problem: this forms a chain of references. None of the objects on the chain can be garbage collected; they are firmly tied together.
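
The chain can be made visible with a small sketch. It assumes a stand-in object (fakeConsole) instead of the real console, and it exposes the captured reference through a hypothetical prev property purely so that the chain length can be counted; the real leaking code has no such property.

```javascript
const vm = require('vm');

// Hypothetical stand-in for the host console
const fakeConsole = { error: function nativeError() {} };

const script = new vm.Script(`
    const prev = console.error;
    const wrapper = (...argv) => prev(...argv);
    wrapper.prev = prev; // expose the captured reference so the chain can be measured
    console.error = wrapper;
`);

// Follow the prev links until the native function (which has no prev) is reached
const chainLength = (fn) => {
    let depth = 0;
    while (fn && fn.prev) {
        depth++;
        fn = fn.prev;
    }
    return depth;
};

for (let i = 0; i < 3; i++) {
    script.runInNewContext({ console: fakeConsole });
}

const depth = chainLength(fakeConsole.error);
console.log(depth); // 3
```

Every run adds one link, so after N runs the host object keeps N wrappers alive, each of which in turn keeps its own vm context alive.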

If we want to solve this problem, what should the ideal reference model look like?

In an ideal model, no matter how many times the vm code is executed, the get and set operations should behave as follows:

  1. A get performed before any set always returns the original error method, because if it returned the function assigned by the previous run, a reference relationship would be created;
  2. A set must not modify the console object of the host environment, because that would affect the global console object seen by the other vm runs;
  3. A get performed after a set must return the newly assigned function, so that the user-defined logic is executed;

In effect, this requires us to isolate not only the vm context itself, but also the host-environment reference objects passed in through the context that the vm creates.

3. Solving the problem

Is there a simple solution? Assuming we clearly understand the execution environment (multiple executions sharing host objects), a flag that prevents the rewrite from being applied more than once is enough:

const nativeError = console.error;

// Only wrap console.error if an earlier run has not already done so
if (!nativeError.hasBeenRewrite) {
    console.error = (...argv) => {
        nativeError(...argv);
    };
    console.error.hasBeenRewrite = true;
}
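
As a quick check of the flag approach, the sketch below runs the guarded code many times against a stand-in object (fakeConsole is a hypothetical name, used so the real console is not modified) and verifies that the wrapper installed by the first run is never replaced.

```javascript
const vm = require('vm');

// Hypothetical stand-in for the host console
const fakeConsole = { error: function nativeError() {} };

const guarded = new vm.Script(`
    const nativeError = console.error;

    if (!nativeError.hasBeenRewrite) {
        console.error = (...argv) => {
            nativeError(...argv);
        };
        console.error.hasBeenRewrite = true;
    }
`);

guarded.runInNewContext({ console: fakeConsole });
const afterFirstRun = fakeConsole.error;

for (let i = 0; i < 100; i++) {
    guarded.runInNewContext({ console: fakeConsole });
}

// Later runs see the flag and leave the first wrapper in place
const sameWrapper = fakeConsole.error === afterFirstRun;
console.log(sameWrapper); // true
```

Note that the wrapper from the first run still keeps that one context alive; the flag caps the chain at a single link rather than removing the shared state.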

But would code originally written for the client ever be written like this? If it were, I would assume the author had either hit this problem before or, failing that, I can only say: excellent, to have had this awareness from the very beginning!

So when we build a basic runtime, can we make it so that business code does not need to care about such details? That is, can we isolate the host objects passed into a context together with the context itself? Several conditions support doing so:

  1. The host-environment reference objects that we pass to the vm are actually very limited in number, so isolating them is feasible;
  2. The objects we need to isolate live and die with the context created by the vm;

Going back to the ideal model described above, here is the code first, followed by an explanation of the whole scheme:

const vm = require('vm');
const heapdump = require('heapdump');

const total = 5000;

const writeSnapshot = (count) => {
    heapdump.writeSnapshot(`./${count}-${total}.heapsnapshot`);
};

const code = `
    const nativeError = console.error;

    console.error = (...argv) => {
        nativeError(...argv);
    }
`;

const script = new vm.Script(code);

// Wrap a host object in a Proxy so that writes from the vm stay in a per-context store
const vmProxy = (context, obj, name) => {
    const proxyStore = {};

    const proxyObj = new Proxy(obj, {
        get: function (target, propKey) {
            // Prefer a value assigned inside this context, if there is one
            if (proxyStore[name] && proxyStore[name][propKey]) {
                return proxyStore[name][propKey];
            }

            return target[propKey];
        },
        set: function (target, propKey, value) {
            if (!proxyStore[name]) {
                proxyStore[name] = {};
            }

            // Only reference values are stored; the host object itself is never modified
            const defineObj = proxyStore[name];
            if ((typeof value === 'function' || typeof value === 'object') && value !== null) {
                defineObj[propKey] = value;
            }

            // Report success so that the assignment does not throw in strict mode
            return true;
        },
    });

    context[name] = proxyObj;
    context.proxyStore = proxyStore;
    return context;
};

for (let i = 1; i <= total; i++) {
    const context = vmProxy({}, console, 'console');

    script.runInNewContext(context);

    console.log(`${i}/${total}`);

    switch (i) {
        case 1:
        case Math.floor(total * 0.5):
        case total:
            writeSnapshot(i);
    }
}

setTimeout(() => {
    writeSnapshot(total + 1);
}, 3000);

Here are some key points:

  1. Use a Proxy to intercept property get operations on console;
  2. We attach a proxyStore object to the vm context object to hold the values assigned by set operations; proxyStore is reclaimed together with the context;
  3. A set on console is not applied to the real console, so the host environment's reference object is unaffected, but the value still has to be stored;

Step by step:

  1. When console.error is read, we check whether the current environment has already set it in proxyStore. If not, the get returns the original error method;

  2. When Func1 is assigned to console.error, we see that proxyStore holds no assignment for this property yet, so we store Func1 in proxyStore. Note that we must not set Func1 on the real console.error;

  3. On subsequent calls to console.error, the get is intercepted again. This time proxyStore already contains Func1, so Func1 is returned, and calling console.error becomes calling Func1;

Through the above operations, the host's console.error always points to the native error method. Every reference captured by the vm is therefore a reference to the native error method, not to the function set by the previous run.

With that, the memory leak is solved.
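
The behaviour described in the steps above can be verified with a reduced sketch. It is not the article's vmProxy: isolate, overrides and fakeConsole are hypothetical names, a Map replaces the proxyStore object, and every assigned value is stored (not only reference values) to keep the example short.

```javascript
const vm = require('vm');

// Simplified isolation proxy: writes go to a per-context Map, reads prefer that Map
const isolate = (hostObj) => {
    const overrides = new Map();
    return new Proxy(hostObj, {
        get: (target, key) => (overrides.has(key) ? overrides.get(key) : target[key]),
        set: (target, key, value) => {
            overrides.set(key, value);
            return true; // report success without touching the host object
        },
    });
};

// Hypothetical stand-in for the host console that records what it is asked to print
const calls = [];
const fakeConsole = { error: (...argv) => calls.push(argv.join(' ')) };
const original = fakeConsole.error;

const script = new vm.Script(`
    const nativeError = console.error;
    console.error = (...argv) => nativeError('[wrapped]', ...argv);
    console.error('hello');
`);

for (let i = 0; i < 3; i++) {
    script.runInNewContext({ console: isolate(fakeConsole) });
}

console.log(fakeConsole.error === original); // true
console.log(calls);
```

Each run wraps the original function exactly once, so every recorded call carries a single '[wrapped]' prefix, and the host object still holds its original method after all runs.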

4. Avoiding the problem

I was rather pleased with myself for solving the problem in such a clever way.

But let's think again: what problems might Proxy bring? Will there be performance problems?

The proof is in the practice. Let's compare the performance of the two solutions above:

const vm = require('vm');

const total = 10000;

const vmProxy = (context, obj, name) => {
    const proxyStore = {};

    const proxyObj = new Proxy(obj, {
        get: function (target, propKey) {
            if (proxyStore[name] && proxyStore[name][propKey]) {
                return proxyStore[name][propKey];
            }

            return target[propKey];
        },
        set: function (target, propKey, value) {
            if (!proxyStore[name]) {
                proxyStore[name] = {};
            }

            const defineObj = proxyStore[name];
            if ((typeof value === 'function' || typeof value === 'object') && value !== null) {
                defineObj[propKey] = value;
            }

            // Report success so that the assignment does not throw in strict mode
            return true;
        },
    });

    context[name] = proxyObj;
    context.proxyStore = proxyStore;
    return context;
};

// Solution 1: isolate console with a Proxy for every context
(() => {
    const code = `
        const nativeError = console.error;

        console.error = (...argv) => {
            nativeError(...argv);
        }
    `;

    const script = new vm.Script(code);

    console.time('proxy');
    for (let i = 1; i <= total; i++) {
        const context = vmProxy({}, console, 'console');

        script.runInNewContext(context);
    }
    console.timeEnd('proxy');
})();

// Solution 2: share console and guard the rewrite with a flag
(() => {
    const code = `
        const nativeError = console.error;

        if (!nativeError.hasBeenRewrite) {
            console.error = (...argv) => {
                nativeError(...argv);
            };
            console.error.hasBeenRewrite = true;
        }
    `;

    const script = new vm.Script(code);

    console.time('flag');
    for (let i = 1; i <= total; i++) {
        script.runInNewContext({
            console,
        });
    }
    console.timeEnd('flag');
})();

There seems to be little performance difference.

However, Proxy has a this binding problem. Proxy is not a transparent proxy: inside methods called through the proxy, this points to the Proxy instance rather than the target object. For an example as simple as this one that is fine, but proxying complex objects in production still calls for care (nested objects inside the proxied object need to be considered as well).
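
A short sketch of the this problem, using hypothetical names (target, whoAmI). The Map case is a well-known example of an object whose methods depend on internal state that the Proxy instance does not have.

```javascript
// A method that simply reports what "this" is bound to
const target = {
    whoAmI() {
        return this;
    },
};
const proxy = new Proxy(target, {});

const thisIsTarget = target.whoAmI() === target;
const thisIsProxy = proxy.whoAmI() === proxy;
console.log(thisIsTarget, thisIsProxy); // true true

// Built-ins with internal slots break when "this" is the proxy
let mapError = null;
try {
    new Proxy(new Map(), {}).get('key');
} catch (e) {
    mapError = e;
}
console.log(mapError instanceof TypeError); // true
```

For objects like this, a get trap usually has to bind functions to the target before returning them, which is one more detail to handle when proxying anything more complex than console.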

Is it possible to catch this kind of memory leak during development instead of waiting until after release?

Of course; I would not raise the question if I had not thought of a way. I had spent a whole afternoon on this problem before: it seemed too complicated, and I tried many approaches without success. Let's first clear one thing up: is the leak caused by calling the stored function nativeError inside the newly assigned function? That is actually irrelevant: even if you comment out nativeError(...argv), the memory leak is still there.

const nativeError = console.error;

console.error = (...argv) => {
    nativeError(...argv);
}

The reason is that as long as both a get and a set are performed on the same key of a host-environment reference object within the same vm, memory leaks. Let's consider whether each of the following three cases leaks:

Same key:

const nativeError = console.undefined;

console.undefined = (...argv) => {
    nativeError(argv);
}

Different keys:

const nativeError = console.undefined;

console.notExist = (...argv) => {
    nativeError(argv);
}

The value being set is not a reference type:

const nativeError = console.error;

console.error = 'AlloyTeam';

The answer is that the first case leaks memory, while the second and third do not. Curious readers can verify this with the example code above.

Let's simplify the problem and look at the detection scheme. As usual, we start with the code:

const { workerData, Worker, isMainThread } = require('worker_threads');
const vm = require('vm');
const log = console.log;

// Maps each proxied host object to the set of property keys that have been read.
// A WeakMap keeps distinct objects apart; a plain object would coerce them all to one string key.
const memoryCheckStore = new WeakMap();

const isReferenced = value => (value !== null && typeof value === 'object') || typeof value === 'function';

const vmProxy = (context, obj, name) => {
    const proxyObj = new Proxy(obj, {
        get: function (target, propKey) {
            if (!memoryCheckStore.has(obj)) {
                memoryCheckStore.set(obj, new Set());
            }
            // todo: arrays and nested child objects also need to be handled
            memoryCheckStore.get(obj).add(propKey);

            return target[propKey];
        },
        set: function (target, propKey, value) {
            // Warn when a reference value is assigned to a key that was read earlier
            const readKeys = memoryCheckStore.get(obj);
            if (isReferenced(value) && readKeys && readKeys.has(propKey)) {
                log(new Error('[warning] There may be a memory leak'));
            }

            target[propKey] = value;
            return true;
        },
    });

    context[name] = proxyObj;
    return context;
};

const code1 = `
    const nativeError = console.undefined;

    // leak
    console.undefined = (...argv) => {}
`;

const code2 = `
    const nativeError = console.undefined;

    // No leakage
    console.notExist = (...argv) => {}
`;

const code3 = `
    const nativeError = console.undefined;

    // No leakage
    console.error = 'AlloyTeam';
`;

const code4 = `
    const nativeError = console.error;

    // leak
    console.error = (...argv) => {}
`;

const codes = [code1, code2, code3, code4];

if (isMainThread) {
    // Run each snippet in its own worker so that the snippets cannot affect one another
    for (let i = 1; i <= codes.length; i++) {
        new Worker(__filename, {
            workerData: {
                code: codes[i - 1],
                flag: i,
            },
        });
    }
} else {
    const { code, flag } = workerData;

    const script = new vm.Script(code, {
        filename: `code${flag}`,
    });

    const context = vmProxy({}, console, 'console');

    script.runInNewContext(context);
}

After only one run we already know that code1 and code4 may leak memory:

Scheme diagram 1, get stage:

  1. Initially, console.error points to the native error method;
  2. We set up a global GlobalGetStore object (memoryCheckStore in the code above) to record each referenced object and the property names read from it;
  3. On the first run, if the intercepted get finds that the object is not yet in the store, it records the object together with the key that was read;

Scheme diagram 2, set stage:

  1. In the intercepted set, we find that the object is already in the store and that the key being assigned has been read before. We therefore conclude that memory may leak in an environment that executes the code many times, such as the vm, and print a warning;

In this way, we can deploy the memory detection code during development (the demo code still needs to handle arrays and object properties that are themselves reference types) and remove or disable it in production.

Of course, for an excellent project, there are two more related things to do before and after going online:

  1. Automated testing: simulate multiple user requests, monitor memory changes, and detect possible memory leaks before going online;
  2. Alerting: set an alarm policy that fires when memory exceeds a limit, then inspect the memory trend to confirm whether there is a leak;
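
The automated check in the first item could start from something as small as the sketch below. Everything in it is an assumption for illustration: measureHeapGrowth, leakyTask and the 1 MB threshold are hypothetical, and a real test would replace leakyTask with a simulated request against the server.

```javascript
// Hypothetical helper: run `task` `times` times and return the heap growth in bytes
const measureHeapGrowth = (task, times) => {
    // global.gc only exists when Node is started with --expose-gc
    if (global.gc) global.gc();
    const before = process.memoryUsage().heapUsed;

    for (let i = 0; i < times; i++) {
        task(i);
    }

    if (global.gc) global.gc();
    return process.memoryUsage().heapUsed - before;
};

// A deliberately leaky task: every call retains a new array in a long-lived list
const retained = [];
const leakyTask = (i) => {
    retained.push(new Array(10000).fill(i));
};

const threshold = 1024 * 1024; // assumed limit of 1 MB for this demo
const growth = measureHeapGrowth(leakyTask, 200);

console.log(growth > threshold ? 'possible leak' : 'ok', growth);
```

Heap measurements are noisy, so in practice it helps to run with --expose-gc, repeat the measurement several times, and look at the trend rather than at a single number.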

5. Postscript

Running into a problem like this is actually quite interesting. Although it is a small point, working through it produced a fairly complete line of reasoning, which I hope offers some reference and ideas to readers solving related problems.

Topics: Front-end