By Ethan Lu
Translator | Sambodhi
Planning | Xin Xiaoliang
GIPHY serves a huge amount of GIF media content: in fact, more than 10 billion pieces of content every day. In addition to serving the media requests for the GIFs themselves, we provide public API and SDK services that developers can use in their products, giving their users access to our huge library.
Like many technology companies with heavy daily traffic, we face the challenge of scalability. Our systems must handle a huge number of requests (on the order of tens of thousands of requests per second) with very low response latency. Nothing is worse than waiting for content to load, especially a GIF!
This is where an edge cloud platform comes in: instead of having our AWS servers handle every request, the edge cloud platform caches media content and JSON payloads such as search results as much as possible. This works well because media content and API responses do not change often. The edge servers also spread the request load across regions. We use a Fastly-powered edge cloud platform to serve billions of pieces of content to our users.
1 Fastly solution
Fastly provides a variety of capabilities that let us deliver content at scale. They fall roughly into the following categories:
- Cache hierarchy
- Cache management
- Running code at the edge
Cache hierarchy
A basic edge cloud platform setup caches content at the edge. These server nodes are distributed around the world and serve cached content to the users making requests in their region. If an edge node does not have the requested content, it sends a request to the origin server to retrieve it.

A single-tier setup like this has a flaw. Each edge node maintains its own cache based on the requests coming from its region, so a new piece of content may not be cached on any edge node yet. When every edge node independently repeats the same content request, traffic to our origin servers can surge. With content regularly going viral, this happens often.
Fastly offers a second caching tier called Origin Shield. With it, edge nodes that do not have the requested content in their cache retrieve it from the Origin Shield tier, and only Origin Shield's own cache misses ever reach our origin servers.
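To make the request-collapsing benefit concrete, here is a toy Python simulation (an illustration only, not GIPHY's actual infrastructure): fifty edge nodes all miss on a brand-new piece of content, but the shield tier lets only one request through to the origin.

```python
class Origin:
    """Stand-in for the AWS origin servers; counts how often it is hit."""
    def __init__(self):
        self.requests = 0

    def fetch(self, key):
        self.requests += 1
        return f"content:{key}"

class CacheNode:
    """A cache tier (edge node or Origin Shield) in front of an upstream."""
    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def fetch(self, key):
        if key not in self.store:  # miss: ask the next tier up
            self.store[key] = self.upstream.fetch(key)
        return self.store[key]

origin = Origin()
shield = CacheNode(origin)                       # the Origin Shield tier
edges = [CacheNode(shield) for _ in range(50)]   # 50 regional edge nodes

# A new viral GIF is requested once in every region:
for edge in edges:
    edge.fetch("viral-gif-123")

print(origin.requests)  # 1: the shield absorbed the other 49 misses
```

In reality Fastly also collapses concurrent misses for the same object; this sketch only shows the sequential case, but the effect on origin load is the same.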

Cache management
Since content is cached at both the edge and Origin Shield, we need to manage its caching policy. Not all content should be cached for the same amount of time, or TTL (Time To Live). For example, the information for an individual GIF rarely changes, so its API response can be cached for a long time. The trending endpoint's API response, on the other hand, returns a continuously updated list of currently trending GIFs; by its nature, it needs a short TTL.
Fastly is powered by Varnish, so all configuration is written as Varnish Configuration Language (VCL) code. Both the edge and Origin Shield run the VCL code, so with a few simple lines of VCL we can set different cache TTLs based on the API endpoint path:
```vcl
# in vcl_fetch
if (req.url ~ "^/v1/gifs/trending") {
  # set 10 minute ttl for trending responses
  set beresp.ttl = 600s;
  return(deliver);
}
```
VCL code is not the only way to set cache TTLs. Responses from the origin can carry cache control directives that override what the VCL would set. At the origin, we hand this decision to Fastly's Origin Shield and edge nodes by setting cache control headers on the API response. In particular, the Surrogate-Control header is consumed only by Fastly nodes. We can therefore update the VCL above so that cache control headers take precedence over the endpoint cache policies:
```vcl
# in vcl_fetch
if (beresp.http.Surrogate-Control ~ "max-age" ||
    beresp.http.Cache-Control ~ "(s-maxage|max-age)") {
  # upstream set some cache control headers, so Fastly will use its cache TTL
  return(deliver);
} else {
  # no cache headers, so use cache policies for endpoints
  if (req.url ~ "^/v1/gifs/trending") {
    # set 10 minute ttl for trending responses
    set beresp.ttl = 600s;
    return(deliver);
  }
}
```
With this setup, cached content expires automatically under dynamic TTL policies that fit our needs. Sometimes, though, we don't want to wait for the cache to expire naturally and need to invalidate it explicitly. Media content can be invalidated simply by its cache key (the URL). That works fine for media, but API responses are a bit more complicated.
For example, our search API endpoint can return the same GIF for different queries, so if we want to invalidate that GIF, we can't know every URL whose response might contain it:
```
# same GIF can appear in the response of all of these API calls
https://api.giphy.com/v1/gifs/search?api_key=__KEY1__&q=haha
https://api.giphy.com/v1/gifs/search?api_key=__KEY1__&q=hehe
https://api.giphy.com/v1/gifs/search?api_key=__KEY2__&q=lol
https://api.giphy.com/v1/gifs/search?api_key=__KEY3__&q=laugh
```
This is where Fastly's Surrogate Keys come in! As the name suggests, a surrogate key identifies cached content much like a cache key does. Unlike cache keys, though, each cached object can have multiple surrogate keys, and we control what they are. Assigning the IDs of the GIFs in each API response as surrogate keys lets us identify every cached response that contains a specific GIF:
```
# same GIF (gif_id_abc) can appear in the response of all of these API calls
https://api.giphy.com/v1/gifs/search?api_key=__KEY1__&q=haha
  Assign Surrogate Key: gif_id_abc
https://api.giphy.com/v1/gifs/search?api_key=__KEY1__&q=hehe
  Assign Surrogate Key: gif_id_abc
https://api.giphy.com/v1/gifs/search?api_key=__KEY2__&q=lol
  Assign Surrogate Key: gif_id_abc
https://api.giphy.com/v1/gifs/search?api_key=__KEY3__&q=laugh
  Assign Surrogate Key: gif_id_abc
```
You can also attach multiple surrogate keys to the same cached content, for example a key for each GIF ID in the response, a key for the API key, and a key for the query:

```
# each cached response can carry several surrogate keys at once
https://api.giphy.com/v1/gifs/search?api_key=__KEY1__&q=haha
  Assign Surrogate Keys: gif_id_abc __KEY1__ haha
https://api.giphy.com/v1/gifs/search?api_key=__KEY1__&q=hehe
  Assign Surrogate Keys: gif_id_abc __KEY1__ hehe
https://api.giphy.com/v1/gifs/search?api_key=__KEY2__&q=lol
  Assign Surrogate Keys: gif_id_abc __KEY2__ lol
https://api.giphy.com/v1/gifs/search?api_key=__KEY3__&q=laugh
  Assign Surrogate Keys: gif_id_abc __KEY3__ laugh
```
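On the origin side, assigning surrogate keys is just a matter of emitting a `Surrogate-Key` response header, which Fastly reads as a space-separated list. A minimal Python sketch of the idea (the `api_key_`/`q_` key formats here are made up for illustration, not GIPHY's actual scheme):

```python
def search_response_headers(gif_ids, api_key, query):
    # Fastly indexes the cached object under every key in the
    # space-separated Surrogate-Key response header.
    keys = list(gif_ids) + [f"api_key_{api_key}", f"q_{query}"]
    return {"Surrogate-Key": " ".join(keys)}

headers = search_response_headers(["gif_id_abc", "gif_id_def"], "KEY1", "haha")
print(headers["Surrogate-Key"])  # gif_id_abc gif_id_def api_key_KEY1 q_haha
```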
Surrogate keys are a powerful feature that lets us select exactly the cached content we want and invalidate it precisely and simply. With this setup, we can:
- Invalidate all cached API responses containing a specific GIF;
- Invalidate all cached API responses for a specific API key;
- Invalidate all cached API responses for queries containing certain words.
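Issuing the purges themselves goes through Fastly's purge API. Here is a Python sketch that only builds the requests rather than sending them (the service ID and token are placeholders; verify the endpoints against Fastly's purge API documentation before relying on this):

```python
def purge_url_request(url):
    """Invalidate a single cached object by its URL (the cache key).
    Fastly accepts an HTTP PURGE request sent to the URL itself."""
    return ("PURGE", url, {})

def purge_surrogate_key_request(service_id, key, token):
    """Invalidate every cached object tagged with a surrogate key."""
    endpoint = f"https://api.fastly.com/service/{service_id}/purge/{key}"
    headers = {"Fastly-Key": token}  # API token with purge permission
    return ("POST", endpoint, headers)

# Drop every cached API response that contains GIF "gif_id_abc":
method, endpoint, headers = purge_surrogate_key_request(
    "SERVICE_ID", "gif_id_abc", "API_TOKEN")
print(method, endpoint)
```

Sending these with any HTTP client is all that's left; the point is that one surrogate-key purge replaces an unbounded set of URL purges.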
Running code at the edge
VCL gives us a great deal of capability when configuring the edge cloud platform. We showed earlier how to configure cache TTL policies for the edge and Origin Shield nodes, but we can also use VCL to modify request information.
For instance, we can rewrite incoming request URLs in code. If we need to change an API endpoint, this lets us do so without forcing our consumers to update their calls:
```vcl
# in vcl_recv
if (req.url ~ "^/some-old-endpoint") {
  # rewrite to the new endpoint
  set req.url = regsub(req.url, "/some-old-endpoint", "/new-and-improved-endpoint");
}
```
We can also select a small percentage of incoming requests to test experimental features. Using Fastly's randomness library, we can add a special header to a fraction of requests, which the origin server uses to enable the new behavior:
```vcl
# in vcl_recv
set req.http.new_feature = "0";
if (randombool(1, 10000)) {
  # .01% of the traffic gets to see the new feature
  set req.http.new_feature = "1";
}
```
Combining this with Fastly's edge dictionaries lets us set up different behavior per API key with minimal code:
```vcl
# API keys that will have a percentage of their requests use the new feature
table new_feature_access {
  "__API_KEY1__": "1",
  "__API_KEY2__": "5",
  "__API_KEY3__": "1000",
}

sub vcl_recv {
  set req.http.new_feature = "0";
  # check if the request's api key is set up to have a percentage of its
  # requests use the new feature
  if (randombool(std.atoi(table.lookup(new_feature_access,
                                       subfield(req.url.qs, "api_key", "&"),
                                       "0")),
                 10000)) {
    set req.http.new_feature = "1";
  }
  return(lookup);
}
```
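For readability, the same rollout logic can be mirrored in Python (placeholder keys and table values carried over from the VCL above; `randombool(n, 10000)` grants the feature with probability n/10000):

```python
import random

# per-API-key numerator out of 10,000 requests, as in the edge dictionary
new_feature_access = {"__API_KEY1__": 1, "__API_KEY2__": 5, "__API_KEY3__": 1000}

def sees_new_feature(api_key):
    numerator = new_feature_access.get(api_key, 0)  # unknown keys get 0
    return random.randrange(10000) < numerator      # randombool(numerator, 10000)

# __API_KEY3__ (numerator 1000) should see the feature roughly 10% of the time:
hits = sum(sees_new_feature("__API_KEY3__") for _ in range(100_000))
print(round(hits / 100_000, 2))  # ≈ 0.1
```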
This only scratches the surface of what VCL makes possible. To see what else you can do, check out Fastly's documentation:
https://developer.fastly.com/
2 Tips
We use many of Fastly's features to deliver GIF content to the world. But with so many features available, configuring an edge cloud platform can get complex. Here are some tips we recommend to help with that.
VCL runs on both the edge and Origin Shield
With a two-tier cache setup, one key thing to remember is that the same VCL code executes at both the edge and at Origin Shield. This can make VCL code produce unexpected results when it changes state on the request or response.
For example, our earlier VCL code sets the cache TTL for both Origin Shield and edge nodes, either from the upstream cache control headers or from the TTL specified in the VCL itself:
```vcl
# in vcl_fetch
if (beresp.http.Surrogate-Control ~ "max-age" ||
    beresp.http.Cache-Control ~ "(s-maxage|max-age)") {
  # upstream set some cache control headers, so Fastly will use its cache TTL
  return(deliver);
} else {
  # no cache headers, so use cache policies for endpoints
  if (req.url ~ "^/v1/gifs/trending") {
    # set 10 minute ttl for trending responses
    set beresp.ttl = 600s;
    return(deliver);
  }
}
```
Suppose that for the trending endpoint we also want to set a cache control header on the response, instructing callers to cache the content for a different period of time. A first attempt might simply add the header:
```vcl
# in vcl_fetch
if (beresp.http.Surrogate-Control ~ "max-age" ||
    beresp.http.Cache-Control ~ "(s-maxage|max-age)") {
  # upstream set some cache control headers, so Fastly will use its cache TTL
  return(deliver);
} else {
  # no cache headers, so use cache policies for endpoints
  if (req.url ~ "^/v1/gifs/trending") {
    # set 10 minute ttl for trending responses
    set beresp.ttl = 600s;
    # set 30 second ttl for callers
    set beresp.http.cache-control = "max-age=30";
    return(deliver);
  }
}
```
Origin Shield executes this VCL, adds the cache control header to the response, and returns it to the edge. The edge then sees that cache control is set on the response and takes the first branch of the if statement. As a result, the edge node uses the 30-second cache TTL instead of the expected 10 minutes!
Fortunately, Fastly provides a way to tell the edge apart from Origin Shield: it sets a header (Fastly-FF) on requests between its own nodes:
```vcl
# in vcl_fetch
if (req.url ~ "^/v1/gifs/trending") {
  # set 10 minute ttl for trending responses
  set beresp.ttl = 600s;
  return(deliver);
}

# in vcl_deliver
if (!req.http.Fastly-FF) {
  # set 30 second ttl for callers
  set resp.http.cache-control = "max-age=30";
}
```
With this addition, the cache control header is set only on the edge nodes, and our cache policy behaves as expected again!
Debugging and testing
Pitfalls like the one above can be hard to find and debug. VCL code just runs on Fastly's servers, and all you get back is the response and its headers. You can tuck debugging information into custom headers and inspect it in the response, but that quickly becomes unwieldy.
Fortunately, Fastly's Fiddle tool gives far better insight into VCL execution. With it, we can mock out the various VCL subroutines and see how Fastly's edge and Origin Shield servers will process the code.
The fiddle below reproduces the example above, showing how the double execution of the VCL affects the cache TTL.
We enter the VCL into the appropriate sections on the left, then execute it to see on the right how Fastly handles the request:

The figure above shows a wealth of useful information about the request's life cycle as it passes through the edge and Origin Shield nodes. Real-world VCL code can get very complex, and that is exactly where this tool shines.