First experience of front-end recording and playback

Posted by simongcc on Sat, 19 Feb 2022 03:34:45 +0100

1. Problem background

1.1 what is front-end recording and playback?

As the name suggests, it means recording the user's operations on a web page so that they can be played back at any time.

1.2 why?

To explain the need, consider a classic scenario. The front end usually does exception monitoring and error reporting: through a self-developed or third-party SDK, it collects JavaScript errors and other relevant data produced while users interact with the site and reports them. This is commonly called event tracking (instrumentation).

With traditional event tracking, the source file, line and column of an error can be located via the SourceMap. That is enough to diagnose most problems, but some errors are hard to reproduce. These are the cases behind one of the programmer's classic lines when arguing with testers: "It doesn't fail on my machine, is something wrong with your computer?"

If only we could record the operations that led to the error, it would be easy to reproduce the scene and keep the evidence, even if it feels like digging a hole for ourselves.

Reposter's note: this article is a repost. Our requirement is to replay the user's whole insurance process (our company is an insurance company).

1.3 how to achieve?

Can the front end record video? My first reaction was doubt, but after a round of Googling I found that there is a feasible solution.

Before searching, I thought of setting a timer to take screenshots of the viewport. Screenshots can be produced with html2canvas, but this approach would obviously cause performance problems, so I rejected it immediately.

Here is what I learned about the solution from my search. If you spot any mistakes, please correct me.

2. Initial ideas

A web page is essentially a tree of DOM nodes rendered by the browser. Can we save the DOM in some way and keep recording its state at different points in time, then restore that data into DOM nodes and render them to achieve playback?

2.1 operation record

document.documentElement.cloneNode(true) clones the DOM into a data object. This object cannot be sent directly to the back end through an API; it first needs to be preprocessed into a format that is convenient to transmit and store. The simplest way is to serialize it, that is, convert it to JSON.

let docJSON = {
  "type": "Document",
  "childNodes": [
    {
      "type": "Element",
      "tagName": "html",
      "attributes": {},
      "childNodes": [
        {
          "type": "Element",
          "tagName": "head",
          "attributes": {},
          "childNodes": []
        }
      ]
    }
  ]
}
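
As a rough illustration of how a structure like the one above could be produced, here is a minimal sketch. It is not rrweb's serializer: the function names serializeNode and serializeChildren and the fakeDocument stand-in are assumptions made for this example. It only relies on the standard nodeType values (1 = Element, 3 = Text, 9 = Document), so in a browser it could be called as serializeNode(document).

```javascript
// Minimal sketch (not rrweb's implementation): turn a DOM-like node into a
// plain object that JSON.stringify can handle.
function serializeChildren(node) {
  // Nodes this sketch does not understand serialize to null and are dropped.
  return Array.from(node.childNodes).map(serializeNode).filter(Boolean);
}

function serializeNode(node) {
  switch (node.nodeType) {
    case 9: // Document
      return { type: 'Document', childNodes: serializeChildren(node) };
    case 1: { // Element
      const attributes = {};
      for (const { name, value } of Array.from(node.attributes)) {
        attributes[name] = value;
      }
      return {
        type: 'Element',
        tagName: node.tagName.toLowerCase(),
        attributes,
        childNodes: serializeChildren(node),
      };
    }
    case 3: // Text
      return { type: 'Text', textContent: node.textContent };
    default:
      // Comments, doctypes and so on are ignored in this sketch.
      return null;
  }
}

// Hypothetical stand-in for a real document so the sketch also runs outside
// a browser.
const fakeDocument = {
  nodeType: 9,
  childNodes: [
    {
      nodeType: 1,
      tagName: 'HTML',
      attributes: [{ name: 'lang', value: 'en' }],
      childNodes: [
        { nodeType: 3, textContent: 'hi' },
        { nodeType: 8, textContent: 'a comment, skipped' },
      ],
    },
  ],
};

console.log(JSON.stringify(serializeNode(fakeDocument), null, 2));
```

A real serializer also has to deal with stylesheets, form state, relative URLs and so on, which is where rrweb's extra handling comes in.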

Once you have the complete DOM data, you also need to listen for DOM changes and record the node information of each change. For this you can use MutationObserver, an API that monitors DOM changes.

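const observer = new MutationObserver(mutationsList => {
    console.log(mutationsList); // Array of MutationRecord objects describing the changes
});
// Start observing the whole document. At least one of childList, attributes
// or characterData must be true, otherwise observe() throws a TypeError
observer.observe(document, {
    childList: true,
    attributes: true,
    characterData: true,
    subtree: true,
});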

In addition to monitoring DOM changes, there is also event monitoring. Users interact with web pages mostly through input devices such as mouse and keyboard. Behind these interactions is JavaScript event monitoring. Event monitoring can be completed by binding system events, which also need to be recorded. Take mouse movement as an example:

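// Mouse movement
const positions = [];            // Recorded mouse positions
const timeBaseline = Date.now(); // Moment at which recording started
document.addEventListener('mousemove', e => {
  // Record where the pointer is and how long after the baseline it got there
  positions.push({
    x: e.clientX,
    y: e.clientY,
    timeOffset: Date.now() - timeBaseline,
  });
});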

Note: addEventListener can bind multiple listeners to the same event, so recording does not interfere with the developer's own event bindings.

2.2 playback operation

The data already exists, and then there is playback. Playback essentially restores JSON data into DOM nodes and renders them. It's not so easy to restore data!

2.3 rendering environment

First of all, to keep the replayed code isolated, a sandbox environment is required. This can be achieved with the iframe tag, which provides a sandbox attribute for configuring the sandbox. The sandbox ensures that the replayed code runs safely and without interference.

<iframe sandbox srcdoc></iframe>

The sandbox attribute turns the iframe into a sandbox; see the documentation of the iframe element for the available options.

The srcdoc attribute can be set directly to a piece of HTML code.

2.4 data restore

Snapshot reorganization is mainly the reorganization of DOM nodes, which is a bit like the process of transforming virtual DOM into real document nodes, but event type snapshots do not need reorganization.

2.5 timer

With the data and the environment in place, a timer is also needed. Rendering the DOM continuously on a timer is essentially the effect of playing a video, and requestAnimationFrame is the most appropriate tool.

requestAnimationFrame runs its callback before each repaint, which avoids piling up work between frames and makes it a better choice than setTimeout.

So far this is only a general idea, still some distance from a working implementation. Thanks to open source, we can look on GitHub for a suitable wheel to copy (learn from), and there happens to be a ready-made framework, rrweb, so let's take a look at it together.

3. rrweb framework

rrweb is a front-end recording and playback framework. Its full name is record and replay the web. As its name suggests, it can record and replay the operations in the web interface. Its core principle is the scheme introduced above.

3.1 composition of rrweb

rrweb consists of three parts:

  • rrweb-snapshot: handles the serialization and reorganization of the DOM structure;
  • rrweb: provides the two main functions, recording and playback;
  • rrweb-player: a video-player-style UI for playback

3.2 rrweb usage

Install it with npm as usual, then import or require it; nothing special here.

3.2.1 recording

Record the page with the rrweb.record method; the emit callback receives the recorded data.

// 1. Recording
let events = []; // Record snapshot

rrweb.record({
  emit(event) {
    // Store event into the events array
    events.push(event);
  },
});

3.2.2 playback

rrweb.Replayer plays back the recording; pass it the recorded data.

// 2. Playback
const replayer = new rrweb.Replayer(events);
replayer.play();

4. rrweb source code

According to the above ideas, I will analyze some of the key codes. Of course, it's just some analysis based on my personal understanding. In fact, rrweb source code is far more than that.

The core part consists of three blocks: record, replay and snapshot.

4.1 Record recording

After the DOM is loaded, record will do a complete DOM serialization. We call it full snapshot, which records the whole HTML data structure.

In record.ts, find the definition of the key entry function init. It calls takeFullSnapshot and observe(document) once the document has loaded, that is, when readyState is interactive or complete.

const init = () => {
    takeFullSnapshot(); // Generate full snapshot
    handlers.push(observe(document)); // Start the listeners
};
if (
    document.readyState === 'interactive' ||
    document.readyState === 'complete'
) {
    init();
} else {
    //...
    handlers.push(on('load', () => { init(); }, window));
}

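document.readyState has three possible states:

1. loading: the document is still loading;

2. interactive: the document has been parsed, but sub-resources such as images and stylesheets are still loading;

3. complete: the document and all sub-resources have finished loading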

As the name suggests, takeFullSnapshot generates a "complete" snapshot: it serializes the document into one complete data structure, which is called the full snapshot.

All serialization related operations are completed using snapshot, which accepts a dom object and a configuration object, passes document, and serializes the whole page to get the completed snapshot data.

// Generate full snapshot
takeFullSnapshot = (isCheckout = false) => {
    //...
    const [node, idNodeMap] = snapshot(document, {
        //... Some configuration items
    });
    //...
}

idNodeMap is a key-value object whose keys are ids and whose values are DOM objects.
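
To make the role of such an id lookup more concrete, here is a small sketch of a two-way mapping between ids and nodes. The createMirror helper and its method names are hypothetical and only loosely modelled on the idea described above; returning -1 for unknown nodes is a choice made for this sketch.

```javascript
// Hypothetical helper: a two-way lookup between serialized ids and nodes.
function createMirror() {
  const idNodeMap = new Map();     // id -> node
  const nodeIdMap = new WeakMap(); // node -> id, does not keep nodes alive
  let nextId = 1;

  return {
    // Register a node and return its id; registering twice is a no-op.
    add(node) {
      if (nodeIdMap.has(node)) {
        return nodeIdMap.get(node);
      }
      const id = nextId++;
      idNodeMap.set(id, node);
      nodeIdMap.set(node, id);
      return id;
    },
    // Look up the id of a node, or -1 if it was never registered.
    getId(node) {
      return nodeIdMap.has(node) ? nodeIdMap.get(node) : -1;
    },
    // Look up a node by id, or null if the id is unknown.
    getNode(id) {
      return idNodeMap.has(id) ? idNodeMap.get(id) : null;
    },
    // Forget a node, for example after it was removed from the page.
    remove(node) {
      const id = this.getId(node);
      if (id !== -1) {
        idNodeMap.delete(id);
        nodeIdMap.delete(node);
      }
    },
  };
}

// Usage: plain objects stand in for DOM nodes here.
const mirror = createMirror();
const someNode = { tagName: 'div' };
const someId = mirror.add(someNode);
console.log(someId, mirror.getNode(someId) === someNode);
```

With ids in place, later incremental events only need to carry an id instead of a whole node, and playback can resolve the id back to the rebuilt node.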

observe(document) is the initialization of some listeners. It also passes the whole document object to listen, and initializes some listeners by calling initObservers.

const observe = (doc: Document) => {
    return initObservers(/* ... */);
}

The initObservers function is defined in observer.ts. It initializes 11 listeners, which fall into three categories: DOM, event and media:

export function initObservers(/* ... */) {
    // dom
    const mutationObserver = initMutationObserver();
    const mousemoveHandler = initMoveObserver();
    const mouseInteractionHandler = initMouseInteractionObserver();
    const scrollHandler = initScrollObserver();
    const viewportResizeHandler = initViewportResizeObserver();
    // ...
}

DOM change listener mainly includes DOM change (addition, deletion and modification) and style change. The core is realized through MutationObserver

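let mutationObserverCtor = window.MutationObserver;

const observer = new mutationObserverCtor(
    // Handling changing data
    mutationBuffer.processMutations.bind(mutationBuffer),
);
// An empty options object would make observe() throw, so list what to watch
observer.observe(doc, {
    attributes: true,
    attributeOldValue: true,
    characterData: true,
    characterDataOldValue: true,
    childList: true,
    subtree: true,
});
return observer;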

Interactive monitoring - Take mouse movement initMoveObserver as an example

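// Mouse movement record (simplified)
function initMoveObserver() {
    const updatePosition = throttle<MouseEvent | TouchEvent>(
        (evt) => {
            // Touch events carry their coordinates in changedTouches
            const { clientX, clientY } = isTouchEvent(evt)
                ? evt.changedTouches[0]
                : evt;
            positions.push({
                x: clientX,
                y: clientY,
            });
        },
        threshold, // Throttle interval in milliseconds
    );
    const handlers = [
        on('mousemove', updatePosition, doc),
        on('touchmove', updatePosition, doc),
    ];
    // ...
}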
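
The listener above is wrapped in a throttle helper so that fast mouse movement does not flood the recording. As a hedged sketch of what such a helper can look like (this is a simple leading-edge throttle written for this article, not rrweb's own throttle; the optional now parameter is an assumption added to make the clock replaceable):

```javascript
// Leading-edge throttle sketch: the wrapped function runs at most once every
// `wait` milliseconds, and calls that arrive too early are simply dropped.
function throttle(fn, wait, now = Date.now) {
  let lastRun = -Infinity;
  return function throttled(...args) {
    const current = now();
    if (current - lastRun >= wait) {
      lastRun = current;
      fn.apply(this, args);
    }
  };
}

// Usage with a fake clock, so the behaviour is easy to follow.
let fakeTime = 0;
const recorded = [];
const record = throttle((pos) => recorded.push(pos), 50, () => fakeTime);

record({ x: 1, y: 1 }); // runs: first call
fakeTime = 10;
record({ x: 2, y: 2 }); // dropped: only 10 ms since the last run
fakeTime = 50;
record({ x: 3, y: 3 }); // runs: 50 ms have passed
console.log(recorded.length);
```

Dropping intermediate positions loses some precision, which is usually acceptable for a mouse trail; the trade-off is controlled by the wait interval.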

The media type listener includes canvas / video / audio. Taking video as an example, it essentially records the play and pause status, and mediaInteractionCb calls back the play / pause status.

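function initMediaInteractionObserver(mediaInteractionCb): listenerHandler {
    // Build a handler for the given media event type
    const handler = (type: 'play' | 'pause') => (event: Event) => {
        const { target } = event;
        mediaInteractionCb({
            type: type === 'play' ? MediaInteractions.Play : MediaInteractions.Pause,
            id: mirror.getId(target as INode),
        });
    };
    const handlers = [on('play', handler('play')), on('pause', handler('pause'))];
    return () => handlers.forEach((h) => h());
}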

5. Snapshot

snapshot is responsible for serialization and reorganization. DOM serialization is handled by serializeNodeWithId, and DOM reorganization by the buildNodeWithSN function.

The serializeNodeWithId function is responsible for serialization and mainly does three things:

  • Call serializeNode to serialize Node;
  • Generate a unique ID through genId() and bind it to the Node;
  • The recursive implementation serializes the child nodes and finally returns an object with ID

// Serialize a DOM with ID
export function serializeNodeWithId(n, options) {
  // 1. serializeNode, the serialization core function
  const _serializedNode = serializeNode(n, options);
  if (!_serializedNode) {
    return null;
  }
  // 2. Generate unique ID
  const id = genId();
  // Binding ID
  const serializedNode = Object.assign(_serializedNode, { id });

  // 3. Child node serialization recursion
  for (const childN of Array.from(n.childNodes)) {
    const serializedChildNode = serializeNodeWithId(childN, options);
    if (serializedChildNode) {
      serializedNode.childNodes.push(serializedChildNode);
    }
  }
  // Return the serialized node, now carrying its ID and children
  return serializedNode;
}

The core of serializeNodeWithId is to serialize DOM through serializeNode, and do some special processing for different nodes.

Processing of node attributes:

for (const { name, value } of Array.from((n as HTMLElement).attributes)) {
    attributes[name] = transformAttribute(doc, tagName, name, value);
}

Handle the external css style, get the specific style code through getCssRulesString, and store it in attributes.

const cssText = getCssRulesString(stylesheet as CSSStyleSheet);
if (cssText) {
    attributes._cssText = absoluteToStylesheet(
        cssText,
        stylesheet!.href!,
    );
}

To process the form, the logic is to save the selected state and do some security processing, such as replacing the content of the password box with *.

if (
    attributes.type !== 'radio' &&
    attributes.type !== 'checkbox' &&
    // ...
) {
    attributes.value = maskInputOptions[tagName]
        ? '*'.repeat(value.length)
        : value;
} else if (n.checked) {
    attributes.checked = n.checked;
}
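
The masking step can be isolated into a small pure function. The sketch below is an assumption-laden illustration: the maskInputValue name and the shape of the options object (keys such as password or textarea mapped to booleans) are made up for this example and are not taken from the rrweb source.

```javascript
// Sketch: replace a form value with asterisks of the same length when the
// field is configured as sensitive. The option keys are hypothetical.
function maskInputValue({ tagName, type, value }, maskInputOptions = {}) {
  const byTag = Boolean(maskInputOptions[String(tagName).toLowerCase()]);
  const byType = Boolean(type && maskInputOptions[type]);
  return byTag || byType ? '*'.repeat(value.length) : value;
}

// Usage: mask password inputs but leave ordinary text inputs readable.
const maskOptions = { password: true };
console.log(
  maskInputValue({ tagName: 'INPUT', type: 'password', value: 'secret' }, maskOptions),
);
console.log(
  maskInputValue({ tagName: 'INPUT', type: 'text', value: 'hello' }, maskOptions),
);
```

Keeping the length of the original value means the replay still looks natural while the actual content never leaves the user's browser.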

To save the canvas state, the canvas data is saved through toDataURL:

attributes.rr_dataURL = (n as HTMLCanvasElement).toDataURL();

rebuild is responsible for rebuilding the DOM:

  • Reorganize nodes through the buildNodeWithSN function;
  • Call it recursively to reorganize child nodes

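export function buildNodeWithSN(n, options) {
  const { doc, hackCss, afterAppend } = options;
  // DOM reorganization core function buildNode
  const node = buildNode(n, { doc, hackCss });
  if (!node) {
    return null;
  }
  // Child node reconstruction and appendChild
  for (const childN of n.childNodes || []) {
    const childNode = buildNodeWithSN(childN, options);
    if (!childNode) {
      continue;
    }
    node.appendChild(childNode);
    if (afterAppend) {
      afterAppend(childNode);
    }
  }
  return node;
}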
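
For comparison, here is a self-contained sketch of the same rebuild idea, applied to the simple JSON shape used in section 2.1 (type, tagName, attributes, childNodes). The rebuildNode function and the fakeDoc stand-in are assumptions for illustration; in a browser you would pass a real document, for example the one inside the playback iframe.

```javascript
// Sketch: rebuild nodes from serialized data using only createElement,
// createTextNode, setAttribute and appendChild on the given document.
function rebuildNode(sn, doc) {
  switch (sn.type) {
    case 'Element': {
      const el = doc.createElement(sn.tagName);
      for (const [name, value] of Object.entries(sn.attributes || {})) {
        el.setAttribute(name, value);
      }
      for (const child of sn.childNodes || []) {
        const childNode = rebuildNode(child, doc);
        if (childNode) {
          el.appendChild(childNode);
        }
      }
      return el;
    }
    case 'Text':
      return doc.createTextNode(sn.textContent);
    default:
      // Unknown node types are skipped in this sketch.
      return null;
  }
}

// Hypothetical stand-in for a document so the sketch runs outside a browser.
const fakeDoc = {
  createElement: (tagName) => ({
    tagName,
    attributes: {},
    children: [],
    setAttribute(name, value) { this.attributes[name] = value; },
    appendChild(child) { this.children.push(child); return child; },
  }),
  createTextNode: (text) => ({ text }),
};

const rebuilt = rebuildNode(
  {
    type: 'Element',
    tagName: 'p',
    attributes: { class: 'greeting' },
    childNodes: [{ type: 'Text', textContent: 'hello' }],
  },
  fakeDoc,
);
console.log(rebuilt.tagName, rebuilt.attributes.class, rebuilt.children[0].text);
```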

6. Replay playback

The playback part is in the replay.ts file. It first creates the sandbox environment and then rebuilds the full snapshot of the document. Incremental snapshots are played by a timer built on requestAnimationFrame.

The constructor of Replayer receives two parameters: the snapshot data events and the configuration object config.

export class Replayer {
    constructor(events, config) {
        // 1. Create sandbox environment
        this.setupDom();
        // 2. Timer
        const timer = new Timer();
        // 3. Playback service
        this.service = createPlayerService(events, timer);
        this.service.start();
    }
}

The core three steps in the constructor are to create a sandbox environment, a timer, and initialize the player and start it. Player creation depends on events and timer. In essence, it still uses timer to play.

6.1 sandbox environment

First, this.setupDom can be found in the constructor in replay.ts. The core of setupDom is to create a sandbox environment through an iframe.

private setupDom() {
  // Create iframe
  this.iframe = document.createElement('iframe');
  // Sandbox permissions granted to the replayed page
  const attributes = ['allow-same-origin'];
  this.iframe.style.display = 'none';
  this.iframe.setAttribute('sandbox', attributes.join(' '));
}

6.2 playback service

Also in the replay.ts constructor, the createPlayerService function is called to create the player service. It is defined in machine.ts in the same directory. The core idea is to add the snapshot actions to the timer and call timer.start() to start playing back the snapshots.

export function createPlayerService() {
    //...
    play(ctx) {
        // Get the doAction function executed by each event
        for (const event of needEvents) {
            //..
            const castFn = getCastFn(event);
            actions.push({
                doAction: () => {
                    castFn();
                }
            })
            //..
         }
         // Add to timer queue
         timer.addActions(actions);
         // Start timer to play video
         timer.start();
    },
    //...
}

The playback service uses the third-party library @xstate/fsm, a state machine, to control the various states (playback, pause, live broadcast).

timer.ts is in the same directory. Its core is to implement the timer with requestAnimationFrame and play back the snapshots: the snapshot actions to be played are stored in a queue, and start repeatedly calls action.doAction to restore the snapshot for the corresponding point in time.

export class Timer {
    // Add queue
    public addActions(actions: actionWithDelay[]) {
        this.actions = this.actions.concat(actions);
    }
    // Play queue
    public start() {
        const { actions } = this;
        const self = this;
        function check() {
            // ...
            // Loop call doAction in actions, that is, the castFn function
            while (actions.length) {
                const action = actions[0];
                actions.shift();
                // doAction will play back the snapshot and perform different actions for different snapshots
                action.doAction();
            }
            if (actions.length > 0 || self.liveMode) {
                self.raf = requestAnimationFrame(check);
            }
        }
        this.raf = requestAnimationFrame(check);
    }
}
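
The excerpt above elides how the timer decides when an action is due. One plausible way to do it, sketched here under my own assumptions, is to give each action a delay (milliseconds since the start of the recording) and on every frame run only the actions whose delay has already elapsed. The names runDueActions and startTimer, and the injectable schedule and now parameters, are hypothetical and exist only to keep the sketch runnable outside a browser.

```javascript
// Run every queued action whose delay has elapsed. `actions` must be sorted
// by ascending delay and is mutated in place. Returns how many actions ran.
function runDueActions(actions, timeOffset) {
  let ran = 0;
  while (actions.length > 0 && actions[0].delay <= timeOffset) {
    const action = actions.shift();
    action.doAction();
    ran += 1;
  }
  return ran;
}

// Keep checking the queue on every tick until it is empty.
// In a browser: startTimer(actions, (cb) => requestAnimationFrame(cb),
//                          () => performance.now());
function startTimer(actions, schedule, now) {
  const startedAt = now();
  function check() {
    runDueActions(actions, now() - startedAt);
    if (actions.length > 0) {
      schedule(check);
    }
  }
  schedule(check);
}

// Usage with a manual scheduler and a fake clock instead of real frames.
let clock = 0;
const pendingFrames = [];
const played = [];
startTimer(
  [
    { delay: 0, doAction: () => played.push('full snapshot') },
    { delay: 100, doAction: () => played.push('mouse move') },
  ],
  (cb) => pendingFrames.push(cb),
  () => clock,
);
pendingFrames.shift()(); // frame at 0 ms: plays the first action
clock = 100;
pendingFrames.shift()(); // frame at 100 ms: plays the second action
console.log(played);
```

Because the loop stops at the first action that is not yet due, playback speed is decoupled from the frame rate: a slow frame simply plays several actions at once.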

doAction performs different actions for different types of snapshots. In the playback service, the function that doAction runs is obtained from getCastFn, which switches on the event type:

private getCastFn(event: eventWithTime, isSync = false) {
    let castFn;
    switch (event.type) {
        case EventType.DomContentLoaded: // DOM load and parsing completed
            // ...
            break;
        case EventType.FullSnapshot: // Full snapshot
            // ...
            break;
        case EventType.IncrementalSnapshot: // Incremental snapshot
            castFn = () => {
                this.applyIncremental(event, isSync);
            };
            break;
    }
    return castFn;
}

The applyIncremental function handles each kind of incremental snapshot differently, including DOM changes, mouse interaction, page scrolling and so on. Taking the DOM incremental snapshot as an example, it eventually reaches applyMutation:

private applyIncremental(e, isSync: boolean) {
  const { data: d } = e;
  switch (d.source) {
      case IncrementalSource.Mutation: {
        this.applyMutation(d, isSync); // DOM change
        break;
      }
      case IncrementalSource.MouseMove: // Mouse movement
      case IncrementalSource.MouseInteraction: // Mouse click event
      //...
  }
}

applyMutation is the place where the DOM restore operation is finally performed, including the addition, deletion and modification steps of DOM:

private applyMutation(d: mutationData, useVirtualParent: boolean) {
    d.removes.forEach((mutation) => {
        //.. Remove dom
    });
    const appendNode = (mutation: addedNodeMutation) => {
        // Add dom to specific node
    };
    d.adds.forEach((mutation) => {
        // add to
        appendNode(mutation);
    });
    d.texts.forEach((mutation) => {
        //... text processing 
    });
    d.attributes.forEach((mutation) => {
        //... Attribute processing
    });
}
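
To show the four steps end to end, here is a sketch that applies a mutation to a tree of plain objects instead of real DOM nodes. The field names (removes, adds, texts, attributes, parentId, id, value) follow the excerpt above where they appear there; the rest of the data shape, the applyMutationToTree name and the nodesById map are assumptions for this example.

```javascript
// Sketch: apply removes, adds, text changes and attribute changes to a tree
// of plain objects. `nodesById` is a Map from id to node.
function applyMutationToTree(nodesById, d) {
  for (const { parentId, id } of d.removes) {
    const parent = nodesById.get(parentId);
    const target = nodesById.get(id);
    if (!parent || !target) continue;
    parent.childNodes = parent.childNodes.filter((c) => c !== target);
    nodesById.delete(id); // descendants are not cleaned up in this sketch
  }
  for (const { parentId, node } of d.adds) {
    const parent = nodesById.get(parentId);
    if (!parent) continue;
    parent.childNodes.push(node); // always appended; ordering is ignored here
    nodesById.set(node.id, node);
  }
  for (const { id, value } of d.texts) {
    const target = nodesById.get(id);
    if (target) target.textContent = value;
  }
  for (const { id, attributes } of d.attributes) {
    const target = nodesById.get(id);
    if (!target) continue;
    for (const [name, value] of Object.entries(attributes)) {
      // In this sketch a null value means "remove the attribute".
      if (value === null) delete target.attributes[name];
      else target.attributes[name] = value;
    }
  }
}

// Usage: a <div class="a"> containing one text node.
const textNode = { id: 2, type: 'Text', textContent: 'old', childNodes: [] };
const rootNode = {
  id: 1, type: 'Element', tagName: 'div',
  attributes: { class: 'a' }, childNodes: [textNode],
};
const treeIndex = new Map([[1, rootNode], [2, textNode]]);

applyMutationToTree(treeIndex, {
  removes: [],
  adds: [{ parentId: 1, node: { id: 3, type: 'Element', tagName: 'p', attributes: {}, childNodes: [] } }],
  texts: [{ id: 2, value: 'new' }],
  attributes: [{ id: 1, attributes: { class: null, title: 't' } }],
});
console.log(JSON.stringify(rootNode));
```

Removals are processed first so that a node moved within the page (recorded as a remove plus an add) ends up in its new position rather than being deleted.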

The above is the key process implementation code of playback. rrweb not only does this, but also includes data compression, mobile terminal processing, privacy issues and other details. If you are interested, you can check the source code yourself.

Finally

This idea of recording and playback is really worth learning, and reading the rrweb source code was also very rewarding. The way the source uses data structures such as doubly linked lists, queues and trees is worth studying too.

Reference articles
  • rrweb-io/rrweb
  • rrweb: open the black box of web page recording and playback