Front End Performance Optimization - Offline Cache for Static File Clients _20191110

Posted by creekriot on Sun, 10 Nov 2019 08:06:14 +0100


1. Preface

The previous article described how to optimize a project as far as possible during the webpack packaging phase. Article link: Optimize webpack packaging to the extreme _20180619

Front-end optimization never ends. This article summarizes my exploration of, and experience with, project optimization in front-end engineering.

2. Explore business bottlenecks

For H5 pages, network factors can account for almost 80% of the performance bottleneck. Whether you reduce the size of the output files or adopt HTTP/2, PWA, and so on, the goal is the same: to reduce the impact of the network on H5 page loading.

Our products are mainly used in Latin American countries. Latin American users have two distinct characteristics:

  1. The network environment is poor (2G and 3G users still exist);
  2. Phones are mainly Android and tend to be older models (which means older WebViews);

The following figure is the information queried from the company's h5 performance monitoring platform.

From the figure above, you can see that most users are actually in a good network environment. However, front-line colleagues often report that our H5 pages open slowly on users' phones. After discussion, we had two guesses:

  1. Now that file sizes have been optimized, do our page domain name (the html domain name) and static domain name (static files) take a long time to resolve through the local CDN?
  2. Although the CDN's Nginx sets HTTP caching (fields such as max-age) for static files, for some reason the files are not cached on the user's phone for as long as expected.

3. Exploration of Network Factors

With these guesses in mind, we investigated the questions above by technical means. The theory the investigation relies on is as follows:

The image above shows the entire life cycle of a page load, starting from the moment we press Enter in the browser address bar. A browser (or WebView) provides the Performance API, which lets us read the start time of each phase.

Details are in the documentation. We wrote a script that reports data based on the Performance API (see the script link).

Below is the main part of the code, along with the things to watch out for when using it:


// In practice, some older Android models do not provide performance.getEntries, so fall back to an empty list
const entryList = (window.performance && performance.getEntries && performance.getEntries()) || [];

// CSS normally goes in the head tag and JS at the bottom of the body tag. If static files are served from a separate CDN domain name,
// the first entry whose initiatorType is 'link' is the first CSS file loaded
    let linkPerformance = null;
    for (let i = 0; i < entryList.length; i++) {
        const obj = entryList[i];
        if (obj.initiatorType === 'link') {
            linkPerformance = obj;
            break;
        }
    }
    
    // Read the start and end timestamps of each phase (0 when no link entry was found)
    const cssDomainlookStart = linkPerformance ? linkPerformance.domainLookupStart : 0;
    const cssDomainlookEnd = linkPerformance ? linkPerformance.domainLookupEnd : 0;

    const connectStart = linkPerformance ? linkPerformance.connectStart : 0;
    const connectEnd = linkPerformance ? linkPerformance.connectEnd : 0;

    const requestStart = linkPerformance ? linkPerformance.requestStart : 0;
    const responseStart = linkPerformance ? linkPerformance.responseStart : 0;
    const responseEnd = linkPerformance ? linkPerformance.responseEnd : 0;


// Omega is the company's unified data-reporting script. Each value reported here is the difference between the end and the start of a phase
    Omega && Omega.trackEvent && Omega.trackEvent('static_domain_timing', '', {
        country: country,
        lookupTiming: cssDomainlookEnd - cssDomainlookStart,
        connectTiming: connectEnd - connectStart,
        requestTiming: responseStart - requestStart,
        responseTiming: responseEnd - responseStart,
        host: 'static.didiglobal.com',
        cityid
    });
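
The subtraction above can be wrapped in a small helper so that it is easy to check outside the browser. This is only a sketch: `computePhaseTimings` and `findFirstLinkEntry` are hypothetical names that are not part of the original script, and the sample numbers are made up. One caveat worth checking in your own setup: for cross-origin resources, such as files on a separate CDN domain name, browsers report most of these timestamps as 0 unless the response carries a Timing-Allow-Origin header, so all-zero durations may point to a missing header rather than a fast network.

```javascript
// Hypothetical helper: turns a PerformanceResourceTiming-like object into
// per-phase durations in milliseconds. Returns null when no entry was found.
function computePhaseTimings(entry) {
    if (!entry) {
        return null;
    }
    return {
        lookupTiming: entry.domainLookupEnd - entry.domainLookupStart,
        connectTiming: entry.connectEnd - entry.connectStart,
        requestTiming: entry.responseStart - entry.requestStart,
        responseTiming: entry.responseEnd - entry.responseStart
    };
}

// Hypothetical helper: finds the first resource loaded through a <link> tag.
function findFirstLinkEntry(entryList) {
    for (let i = 0; i < entryList.length; i++) {
        if (entryList[i].initiatorType === 'link') {
            return entryList[i];
        }
    }
    return null;
}

// Example with hand-written numbers (not real measurements).
const sampleEntries = [
    { initiatorType: 'script', domainLookupStart: 1, domainLookupEnd: 2 },
    {
        initiatorType: 'link',
        domainLookupStart: 10, domainLookupEnd: 25,
        connectStart: 25, connectEnd: 60,
        requestStart: 61, responseStart: 140, responseEnd: 180
    }
];
const sampleTimings = computePhaseTimings(findFirstLinkEntry(sampleEntries));
console.log(sampleTimings);
```

Keeping the calculation in a pure function means the reporting call only has to pass the result along, and the "no link entry" case is handled in one place.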

3.1 Data research results on html domain names

Because the data is uploaded by the script, written to a database, and then queried with SQL, the results are all presented in tabular form.

3.2 Data research results on CDN domain names

  • _c0 DNS Resolution Time
  • _c1 TCP Handshake Time
  • _c2 Time from request start to response start
  • _c3 Time from response start to response end
  • _c4 City ID
  • _c5 Country

Summary

From these statistics, we concluded the following:

Overall, in the major Latin American cities, the network environment for our page and static domain names is good. Slow cases exist, but they are negligible compared with the total.

4. Problem guessing and optimization scheme

This conclusion does not mean there is no room to optimize the network environment of our business. It means that, weighing the cost and urgency of optimization, the current state is good enough. That said, we have since pushed the operations department to upgrade our network protocol to HTTP/2.

4.1 Conjectures about the problem

With network factors ruled out, what exactly slows down our page loads? All of our projects are Vue single-page apps, and one of the biggest drawbacks of a single-page app is that the first screen is slow.

  1. If the user enters a page for the first time, slowness is understandable, because the JS, CSS, and HTML all have to be downloaded (this is the usual "slow first screen");
  2. On the Nth visit, the page is slow if the client's cache of static resources such as JS and CSS has expired;
  3. If a page has been modified and released again, it is slow again. (This is similar to case 1: effectively a first-screen load.)

From the three cases above, we put forward a hypothesis:

If the html and static files (JS, CSS) needed to render an H5 page are cached in advance in some way, will first-screen rendering be faster? By how much? And is it worth rolling out widely?

4.2 Data Results

To validate the conjecture in 4.1, we needed data. So we asked our client-side colleagues to run some experiments with us.

When the client was released, the resources needed to render some single-page H5 pages were packaged and shipped with the client in advance, and the loading speed of those pages was then monitored.

Here are some data comparing page rendering time before and after pre-caching in the client:

With these data as support, the conclusion is that caching static resources in the client ahead of time can significantly speed up page rendering.

5. Automated offline caching scheme

In 4.2, we verified by hand the effect of pre-cached static resources on H5 rendering in the client. Each time, we had to coordinate with the client team in advance, manually package the resources to be cached, and send them to our client-side colleagues.

There are many drawbacks to this approach:

  • It is very inefficient: the resources are packaged manually each time and then manually "implanted" into the client;
  • The set of pages to cache is inflexible;
  • If a live page has a problem that is fixed before the client is released, the front-end release invalidates the cache;
  • The advantage of H5 is that it can be released at any time. This approach depends on client releases and cannot be updated dynamically.

With so many drawbacks, this approach is clearly not suitable for rolling out across our line of business. So we designed a scheme that automates caching static resources into the client.

5.1 Client Side

To implement an automated caching scheme, the client and front end must work closely together. Because I am mainly responsible for the front end, I will only introduce the client side briefly. If you want to discuss it in depth, leave me a message.

The basic process can be referred to in the following figure:

  1. When the client starts or the user opens the sidebar, request the server API to get information about the static files that need to be cached.
  2. The client adds an interception layer when loading H5 page resources. It checks whether the resource requested by the current url is cached in the client. If it is, the local cache is used directly and no network request is made. If it is not found locally, the request goes to the network as usual.

This is only the basic process. In practice, the client also needs the mapping between urls and local cache files, cache deletion, and an offline cache module.
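
The interception step can be sketched in JavaScript for illustration, although the real implementation lives in native client code. Everything here is an assumption made for the example: `createResourceInterceptor`, the `cacheIndex` map from url to local file path, and the `readLocalFile` and `fetchFromNetwork` callbacks are hypothetical names, and the file names in the usage part are invented.

```javascript
// Illustrative sketch only; the real logic is native client code.
// cacheIndex: Map of url -> local file path (hypothetical structure).
// readLocalFile / fetchFromNetwork: callbacks supplied by the caller.
function createResourceInterceptor(cacheIndex, readLocalFile, fetchFromNetwork) {
    return function loadResource(url) {
        if (cacheIndex.has(url)) {
            try {
                return { source: 'cache', body: readLocalFile(cacheIndex.get(url)) };
            } catch (err) {
                // The local file is missing or unreadable: fall through to the network.
            }
        }
        return { source: 'network', body: fetchFromNetwork(url) };
    };
}

// Usage with in-memory stand-ins for the file system and the network.
const cssUrl = 'https://static.didiglobal.com/global/passenger-wallet/app.css';
const jsUrl = 'https://static.didiglobal.com/global/passenger-wallet/app.js';
const localFiles = { '/offline/app.css': 'body{}' };
const loadResource = createResourceInterceptor(
    new Map([[cssUrl, '/offline/app.css'], [jsUrl, '/offline/missing.js']]),
    (path) => {
        if (!(path in localFiles)) {
            throw new Error('not found: ' + path);
        }
        return localFiles[path];
    },
    (url) => 'downloaded ' + url
);

console.log(loadResource(cssUrl).source);
console.log(loadResource(jsUrl).source);
```

Falling back to the network when the local read fails keeps a deleted or corrupted cache file from breaking the page, which is one reason cache deletion and the url mapping need to be handled carefully.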

5.2 Front End Side

The process of getting a front-end project online is basically the following.

  1. Build the code on a local or development machine (most companies have dedicated build servers for releases);
  2. Copy the built code directory to the online server and, if there is a CDN, copy the static files to the CDN.

When designing the front-end offline caching scheme, one important consideration was that it must not have much impact on our existing business, build, or release process. In other words, our old projects should be able to adopt this offline caching scheme with only simple modifications.

Finally, our plan is as follows:

The scheme relies on the fact that CDNs are designed to relieve server pressure, distribute traffic, serve file downloads, speed up delivery, and so on. To reduce the number of requests a client makes at startup for the static resources it needs to cache, those resources are packaged into a zip file during the build phase for clients to download.

  1. Each line of business (git repository) builds its own separate offline zip.
  2. The offline zip is released to the CDN along with the static files, taking full advantage of the CDN for downloads (check whether the CDN's nginx configuration supports requests for zip files).
  3. Each build stores file information locally so that later builds can zip only the resources that changed (reducing the zip file size).

6. Develop webpack plug-ins

As the previous section shows, we want the zip cache package to be generated during the build phase. Because all of our projects are built with webpack, I developed a webpack plugin to do this.

When developing it, I referred to ak-webpack-plugin and borrowed most of its configuration and functionality. Since ak-webpack-plugin lacks some of the functionality our business needs, I redeveloped it. The plugin used in our projects is called webpack-static-chache-zip. It is compatible with webpack 2.0+ through webpack 4.0.
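
To show what supporting both webpack 2.0+ and webpack 4.0 can involve, here is a minimal plugin skeleton. It is a sketch and not the source of webpack-static-chache-zip: the class name `OfflineZipSketchPlugin` and the `handleDone` method are hypothetical, and using the `done` hook is an assumption. webpack 4 exposes `compiler.hooks.done.tap`, while webpack 2 and 3 use `compiler.plugin('done', ...)`, so the sketch checks which one is available.

```javascript
// Minimal plugin skeleton (a sketch, not the real plugin's source).
class OfflineZipSketchPlugin {
    constructor(options) {
        // Defaults follow the documented defaults for offlineDir and src.
        this.options = Object.assign({ offlineDir: 'offline', src: 'output' }, options);
        this.lastRun = null;
    }

    apply(compiler) {
        const onDone = (stats) => this.handleDone(stats);
        if (compiler.hooks && compiler.hooks.done) {
            // webpack 4: tapable hooks
            compiler.hooks.done.tap('OfflineZipSketchPlugin', onDone);
        } else {
            // webpack 2 and 3: legacy plugin API
            compiler.plugin('done', onDone);
        }
    }

    handleDone(stats) {
        // A real plugin would copy, diff, and zip files here.
        // The sketch only records which directories it would work on.
        this.lastRun = { src: this.options.src, offlineDir: this.options.offlineDir };
    }
}
```

In a webpack config, such a class would be added to the `plugins` array like any other plugin, for example `plugins: [new OfflineZipSketchPlugin({ src: 'output' })]`.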

The functional points of the plug-in are as follows:

The detailed configuration is as follows, for reference:

const AkWebpackPlugin = require('webpack-static-chache-zip');

    // Initialize the plugin
    new AkWebpackPlugin({
    // Name of the generated offline package, default is `offline`
    'offlineDir': 'offline',

    // Source directory of the built code, default is `output`: the directory where webpack emits the production build
    'src': 'output',
    // Whether to keep the generated offline package folder (the source files of the zip package)
    'keepOffline': true,

    // datatype: [required] 1 (Android driver), 2 (Android driver), 101 (iOS driver), 102 (iOS driver)

    'datatype': '',
    // terminal_id: business name, e.g. passenger wallet; must be unique. For details see the wiki: http://wiki.intra.xiaojukeji.com/pages/viewpage.action?PageId=118882082

    'terminal_id': '',
    // If one data_type corresponds to multiple terminal_ids, list them as an array of objects as follows
    'terminal_list': [
        {
            data_type: 1,
            terminal_id: 2
        },
        {
            data_type: 1,
            terminal_id: 2
        }
        // ...
    ],

    // File paths to include (fuzzy matching); takes precedence over excludeFile
    'includeFile': [
      'balance_topup',
      'static',
      'pay_history'
    ],

    // File paths to exclude (fuzzy matching); lower priority than includeFile
    'excludeFile': [
        'repay_qa',
        'test',
        'pay_account',
        'fill_phonenum',
        'balance_qa',
        'select_operator',
        'payment_status'
    ],
    // Cached file types. The default is html, js, css; to cache other types, list them here, e.g. ['png', 'jpg']
    'cacheFileTypes': [],

    // Product line ID, which must be unique. Before choosing a module name, check http://wiki.intra.xiaojukeji.com/pages/viewpage.action?pageId=272106764 to see whether it is already in use
    // Register your module on that page
    'module': 'passenger-wallet',

    // Page domain name. Multiple domain names can be configured, mainly for cases where one released file is served under several domain names
    // For example, https://aaaa.com/a.html and https://bbbbbb.com/a.html actually serve the same file; different business scenarios just use different domain names
    'pageHost': 'https://page.didiglobal.com',

    // urlpath
    'urlPath': '/global/passenger-wallet/',

    // This field and patchCdnPath below are special cases. Suppose the build output is /xxx/xx/output/aaa/bb/index.html and we release by copying the output directory to the server
    // In principle, the page url on the server should then be https://page.didiglobal.com/aaa/bb/index.html, but some projects shorten the path through nginx configuration
    // so that the page is actually accessed at https://page.didiglobal.com/index.html. In that case, set patchUrlPath: 'aaa/bb'
    'patchUrlPath': '',

    // CDN domain name, i.e. the domain name of static files (js/css/html). Defaults to pageHost if not configured or set to an empty array
    'cdnHost': 'https://static.didiglobal.com',

    // cdnPath defaults to urlPath if not set
    'cdnPath': '',

    // See the patchUrlPath usage above
    'patchCdnPath': '',

    // Domain name of the zip file, defaults to cdnHost if not set
    'zipHost': '',

    // zipPath defaults to cdnPath if not set
    'zipPath': '',


    // An H5 page may run in different clients (for example, our Brazilian and global driver apps are two separate clients), and the page and static domain names differ between them
    // In that case, use otherHost to set the page and static file domain names for the other environment
    // May be left unset or empty
    'otherHost': {
      // Page domain name
      'page': 'page.99taxis.mobi',
      // CDN domain name; defaults to the page domain name if not set
      'cdn': 'static.99taxis.mobi'
    },

    // Compression parameters, see https://archiverjs.com
    'zipConfig': {zlib: {level: 9} },
    // Inside the callbacks below, this.fs (fs-extra), this.success, this.info, this.warn and this.alert are available
    // Called before files are copied to the offline folder
    beforeCopy: function () {},
    // Called after files are copied to the offline folder
    afterCopy: function () {

    },
    // Called before the offline folder is compressed
    beforeZip: function (offlineFiles) {
        // offlineFiles: path information of the files in the offline package folder
    },
    // Called after the offline folder is compressed
    afterZip: function (zipFilePath) {
        // zipFilePath: path of the generated offline zip package
    }
})

6.1 Plug-in Workflow

Here is the workflow for the plug-in

  1. Hook into the point where webpack compilation ends;
  2. Copy files from the build output directory to the offline directory (the directory that will be compressed into the zip file), using include and exclude to decide whether each file needs to be copied;
  3. If information from an earlier released build already exists, compare the files to be zipped this time with those of the previous build.
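
The copy decision in step 2 can be sketched as a small filter function. The rules below are assumptions, not confirmed against the plugin source: "fuzzy matching" is treated as a plain substring test on the file path, a path that matches includeFile is kept even if it also matches excludeFile, and a path that matches neither list is kept. `shouldCopy` is a hypothetical name.

```javascript
// Sketch of the copy filter under the assumptions stated above.
const DEFAULT_TYPES = ['html', 'js', 'css'];

function shouldCopy(filePath, options) {
    const include = options.includeFile || [];
    const exclude = options.excludeFile || [];
    const types = DEFAULT_TYPES.concat(options.cacheFileTypes || []);

    // Only the configured file types are cached at all.
    const ext = filePath.slice(filePath.lastIndexOf('.') + 1).toLowerCase();
    if (!types.includes(ext)) {
        return false;
    }
    // includeFile takes precedence over excludeFile.
    if (include.some((part) => filePath.includes(part))) {
        return true;
    }
    if (exclude.some((part) => filePath.includes(part))) {
        return false;
    }
    return true;
}

const filterOptions = { includeFile: ['static'], excludeFile: ['test'] };
console.log(shouldCopy('static/js/app.js', filterOptions));
console.log(shouldCopy('test/index.html', filterOptions));
console.log(shouldCopy('static/test/a.js', filterOptions));
```

If the real plugin treats an unmatched path differently, only the final `return true` would need to change.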

6.2 Version diff

Here, version diff means that after the first full offline package has been released, the second to Nth releases should ship only a diff zip package, to save user traffic and disk space on the user's phone. The client also needs to be told which old files are obsolete so that it can delete them (saving the user's disk space).

File information and diff information should be stored with a storage service (a database). However, to keep compilation fast, no server API requests are made during compilation and diff. All information is stored locally as a json file, and after the build is released the information is sent to the server for storage in one go.

The basic flow of diff is as follows:

Our requirement is to keep only five versions of the offline package online, so each release is diffed against the earlier versions among those five. By the combination formula from high school, five versions give at most C(5, 2) = 10 diff packages, plus one full package: 11 packages in total.
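
The count can be checked by listing every pair of versions explicitly. This is only an illustration of the arithmetic: `listDiffPairs` and the version labels are hypothetical, and it assumes a diff package is kept for every pair of retained versions.

```javascript
// Lists every (from, to) pair among the retained versions, oldest first.
function listDiffPairs(versions) {
    const pairs = [];
    for (let i = 0; i < versions.length; i++) {
        for (let j = i + 1; j < versions.length; j++) {
            pairs.push([versions[i], versions[j]]);
        }
    }
    return pairs;
}

const retained = ['v1', 'v2', 'v3', 'v4', 'v5']; // hypothetical version labels
const diffPairs = listDiffPairs(retained);
console.log(diffPairs.length);     // 10 diff packages
console.log(diffPairs.length + 1); // 11 packages including the full one
```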

6.3 diff principle

Clients need to check the integrity of offline files after downloading the offline cache resources. So when the zip package is generated, the plugin calculates the MD5 of each file to be cached, and after downloading, the client calculates the MD5 as well and compares it with mine.

Because the MD5 of each file is already calculated during the build phase, I also use that MD5 value as the file's unique identifier when doing the diff.

The version diff can be reduced to comparing the following two arrays:

Here is a sample of the core code. My implementation may not be the best; if you have a better one, please leave me a message (thanks in advance).
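
Computing and checking the MD5 can be sketched with Node's built-in crypto module. The function names `md5Hex`, `buildManifest`, and `verifyFile`, and the `{ path, content }` shape of the file list, are assumptions made for this example rather than the plugin's actual structure.

```javascript
const crypto = require('crypto');

// Returns the hex MD5 digest of a Buffer or string.
function md5Hex(content) {
    return crypto.createHash('md5').update(content).digest('hex');
}

// Hypothetical manifest builder: files is an array of { path, content }.
function buildManifest(files) {
    return files.map((file) => ({ path: file.path, md5: md5Hex(file.content) }));
}

// Integrity check in the style the client would perform after downloading.
function verifyFile(entry, downloadedContent) {
    return md5Hex(downloadedContent) === entry.md5;
}

const manifest = buildManifest([{ path: 'static/app.js', content: 'hello' }]);
console.log(manifest[0].md5); // 5d41402abc4b2a76b9719d911017c592
console.log(verifyFile(manifest[0], 'hello'));
console.log(verifyFile(manifest[0], 'hello!'));
```

Because the digest depends only on the file content, two builds that emit an identical file produce the same identifier, which is what makes it usable as the key for the diff.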

// There are two arrays, oldArr and newArr. When oldArr becomes newArr, which elements are new and which should be deleted?
// We add a type field to each element: 0 means it should be deleted, 1 means it already exists, 2 means it is new

const oldArr = [
    {tag: 1},
    {tag: 2},
    {tag: 3},
    {tag: 4}
];
const newArr = [
    { tag: 3 },
    { tag: 4 },
    { tag: 5 },
    { tag: 6 }
]

const newArrTag = [];
// Need to be deleted
const delArrList = [];

for (let i = 0; i < newArr.length; i++) {
    const newItem = newArr[i];

    // Set each item of newArr as new by default
    newItem.type = 2;

    newArrTag.push(newItem.tag);

    for (let m = 0; m < oldArr.length; m++) {
        const oldItem = oldArr[m];

        if(newItem.tag === oldItem.tag ) {
            newItem.type = 1
        }
    }
}

for (let n = 0; n < oldArr.length; n++) {
    const oldItem = oldArr[n];

    if(!newArrTag.includes(oldItem.tag)) {
        oldItem.type = 0
        delArrList.push(oldItem);
    }

}

const resultArr = newArr.concat(delArrList);


console.log(resultArr)

// The output is below: after oldArr changes to newArr, each element is marked as new, to be deleted, or already existing
[ { tag: 3, type: 1 },
  { tag: 4, type: 1 },
  { tag: 5, type: 2 },
  { tag: 6, type: 2 },
  { tag: 1, type: 0 },
  { tag: 2, type: 0 } ]

The code above is the core of the version diff. Combining it with webpack and the business logic is of course more complex. If you are interested, follow the plug-in link above and read the source code.
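
One possible variation on the comparison is to replace the nested loops with Set lookups, so that each array is scanned only once. This is a sketch rather than the plugin's code: `diffByTag` is a hypothetical name, and unlike the sample above it returns copies instead of adding the type field to the input arrays.

```javascript
// Set-based variant of the diff: 0 = delete, 1 = already exists, 2 = new.
function diffByTag(oldArr, newArr) {
    const oldTags = new Set(oldArr.map((item) => item.tag));
    const newTags = new Set(newArr.map((item) => item.tag));

    // Elements of the new version: 1 if the tag was in the old version, else 2.
    const kept = newArr.map((item) =>
        Object.assign({}, item, { type: oldTags.has(item.tag) ? 1 : 2 }));

    // Elements present only in the old version are marked for deletion.
    const removed = oldArr
        .filter((item) => !newTags.has(item.tag))
        .map((item) => Object.assign({}, item, { type: 0 }));

    return kept.concat(removed);
}

const result = diffByTag(
    [{ tag: 1 }, { tag: 2 }, { tag: 3 }, { tag: 4 }],
    [{ tag: 3 }, { tag: 4 }, { tag: 5 }, { tag: 6 }]
);
console.log(result);
```

For the sample arrays this produces the same six elements in the same order as the output shown earlier. With only a handful of files the nested loop is fine; the Set version mainly helps when a build contains many files.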

Summary:

This article describes our team's research and development work on offline caching of static files to speed up H5 page loading in the client, along with the engineering practices that came out of it. Overall, the improvement in page loading speed is significant. Adoption only requires adding a webpack plug-in, without many changes to existing front-end projects, which also improves the efficiency of front-end engineers.

Space is limited, so some parts may be rather general or vague. If you are interested, or want to try this in your own team and have questions, leave a message and we can discuss.

If you think my writing is good, you can go to GitHub and star my plug-in and blog repositories. Your encouragement is my greatest motivation to write.

Topics: node.js Webpack network Android Mobile