The Pitfalls of Variable-Length Arrays in C

Posted by dhie on Fri, 17 Dec 2021 20:05:41 +0100


This article is a translation, Original link

Compared with fixed-length arrays, variable-length arrays generate more code, and that code is slower and more fragile. ~ Linus Torvalds

A variable-length array (VLA) is an array whose length is determined at run time rather than at compile time. This means a real array occupying contiguous address space, not a data structure that behaves like an array but is built from several separate blocks of memory.

Languages that support VLAs in one form or another include Ada, Algol 68, APL, C, C#, COBOL, Fortran, J, and Object Pascal. As you can see, apart from C and C#, none of these is a mainstream language today.

VLAs were introduced in C99. At first they seem convenient and efficient, but that is an illusion; in practice they are often the root cause of recurring problems. Without this blemish, C99 would have been a good revision of the standard.

As the quote at the beginning of the article suggests, the Linux kernel was once a project that used VLAs extensively. Its developers put a great deal of effort into getting rid of them, and finally succeeded in 2018: since version 4.20 the kernel has been free of VLAs.

Allocation on the stack

VLAs are usually allocated on the stack, and this is the root of most of their problems. Consider a simple example:

#include <stdio.h>

int main(void) {
    int n;
    scanf("%d", &n);
    long double arr[n];
    printf("%Lf", arr[0]);
    return 0;
}

Here the length of the array comes from user input. Try running it and see how large a number it takes before the program crashes with a segmentation fault caused by stack overflow. On my machine the limit is around 500000. And that is for a primitive type; with an array of structures the limit would be lower. If the array were declared in a recursive function instead of main(), the limit would drop sharply.

Worse, there is no good way to recover from a stack overflow, because by then the program has already crashed. So you must either strictly check the size before declaring the array, or hope that users never enter a number that is too large (the outcome of that gamble is obvious).

The programmer must ensure that the size of a variable-length array never exceeds some safe maximum. But if that safe maximum is known, there is no reason not to simply use it as a fixed size every time.
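The point about a known safe maximum can be sketched as follows. This is only an illustration: the limit `MAX_ITEMS` and the function `sum_of_squares_bounded()` are hypothetical names, not part of the original article, and the right limit depends on your own stack budget.

```c
/* Hypothetical limit for this sketch; choose one that fits your stack budget. */
enum { MAX_ITEMS = 1024 };

/*
 * Fills a fixed-size array with the squares 0*0 .. (n-1)*(n-1) and returns
 * their sum. Returns -1 if n is outside the range [0, MAX_ITEMS].
 */
long sum_of_squares_bounded(int n) {
    long values[MAX_ITEMS];          /* size is known at compile time */

    if (n < 0 || n > MAX_ITEMS) {
        return -1;                   /* reject the request instead of overflowing */
    }

    long total = 0;
    for (int i = 0; i < n; ++i) {
        values[i] = (long)i * i;
        total += values[i];
    }
    return total;
}
```

The stack usage of this function is the same for every call, so the worst case is the only case.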

To make matters worse

In fact, when a VLA is mishandled, a segmentation fault is the best possible outcome. In the worst case it is an exploitable vulnerability: an attacker can choose an array size that makes the array overlap other memory and thereby gain control over that memory. This is a security nightmare.

At the cost of some performance, you can use the -fstack-clash-protection option in GCC. It adds extra instructions around variable-length stack allocations that probe each page of memory as it is allocated. This ensures that every stack allocation is valid; if one is not, the program gets a segmentation fault, which turns a possible code-execution attack into a denial of service.

Fixing the earlier example

What if you really do need the user to specify the array size, but do not want to waste memory by reserving a huge array up front? Use malloc():

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n;
    scanf("%d", &n);
    long double* arr = malloc(n * sizeof (*arr));
    printf("%Lf", arr[0]);
    free(arr);
    return 0;
}

In this example I can enter up to about 1.3 billion without a segmentation fault, which is roughly 2500 times more than before. There is still a limit beyond which the program segfaults. The difference is that you can check the return value of malloc() to see whether the allocation succeeded.

    long double* arr = malloc(n * sizeof (*arr));
    if (arr == NULL) {
        perror("malloc()"); // Output: "malloc(): Cannot allocate memory"
    }

A common counter-argument is that C is often used for systems and embedded programming, where malloc() may not be available. I have to repeat myself here because it is really important: on such devices you do not have much stack space either. So instead of allocating dynamically on the stack, you should work out how much space you need and use a fixed-length array. With VLAs on a system with a small stack, it is easy to write something that appears to work but blows the stack once a deep call chain coincides with a large allocation. If you always allocate a fixed amount of stack space, the worst case can be covered by ordinary testing. Don't make things harder for yourself than they need to be.
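For hosted systems where malloc() is available, the return-value check shown earlier can be combined with validation of the requested length. The following is a minimal sketch; the helper name `alloc_long_doubles()` is hypothetical and the policy of rejecting non-positive lengths is an assumption made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocates an uninitialized array of n long doubles.
 * Returns NULL if n is not positive, if the byte count would overflow
 * size_t, or if malloc() itself fails. The caller must free() the result.
 */
long double *alloc_long_doubles(int n) {
    if (n <= 0) {
        return NULL;                       /* e.g. bad or negative user input */
    }

    size_t count = (size_t)n;
    if (count > SIZE_MAX / sizeof(long double)) {
        return NULL;                       /* count * sizeof would wrap around */
    }

    return malloc(count * sizeof(long double));
}
```

A caller would check the result against NULL before touching any element, and report the failure instead of crashing.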

Created by accident

Unlike most other dangerous C features, VLAs are not obscure. Many beginners learn to use them through trial and error without ever learning about the pitfalls. Sometimes even experienced programmers create a VLA without meaning to. The following code quietly creates an unnecessary VLA:

const int n = 10;
int A[n];

Fortunately, compilers usually notice such VLAs and optimize them away, but what if one does not? Or what if it deliberately refrains for some other reason, such as safety? Surely the result cannot be that much worse?
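One way to avoid the accidental VLA is to give the length as a true integer constant expression. In C a `const int` object does not qualify, but an enumeration constant (or a `#define`) does. A small sketch, with the function name `fixed_array_length()` invented for illustration:

```c
#include <stddef.h>

enum { N = 10 };   /* integer constant expression, unlike `const int n = 10;` */

/* Returns the number of elements in a fixed-length array declared with N. */
size_t fixed_array_length(void) {
    int A[N] = { 0 };              /* ordinary array, so an initializer is allowed */
    return sizeof A / sizeof A[0];
}
```

GCC and Clang also accept the -Wvla option, which warns whenever a VLA is used and so catches the accidental ones.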

Slower than fixed length

Without compiler optimizations, the code using the VLA above needs seven times as many assembly instructions as the fixed-length version before execution gets past the array definition. With optimizations enabled, the two compile to the same code. So consider an example where the VLA is not an accident:

#include <stdio.h>
void bar(int*, int);

#if 1 // 1 for VLA, 0 for VLA-free

void foo(int n) {
    int A[n];
    for (int i = n; i--;) {
        scanf("%d", &A[i]);
    }
    bar(A, n);
}

#else

void foo(int n) {
    int A[1000];  // Let's make it bigger than 10! (otherwise there would be nothing to examine)
    for (int i = n; i--;) {
        scanf("%d", &A[i]);
    }
    bar(A, n);
}

#endif

int main(void) {
    foo(10);
    return 0;
}

void bar(int* B, int n) {
    for (int i = n; i--;) {
        printf("%d %d", i, B[i]);
    }
}

For the purpose of illustration, the -O1 optimization level is the most suitable (the assembly is clearer, and -O2 does not help the VLA version much).

Compiling the VLA version gives the following before the instructions that correspond to the for loop:

push    rbp
mov     rbp, rsp
push    r14
push    r13
push    r12
push    rbx
mov     r13d, edi
movsx   r12, edi       ; VLA allocation starts here
sal     r12, 2         ;
lea     rax, [r12+15]  ;
and     rax, -16       ;
sub     rsp, rax       ;
mov     r14, rsp       ; and ends here

The VLA-free version gives:

push    r12
push    rbp
push    rbx
sub     rsp, 4000      ; the array definition
mov     r12d, edi

As you can see, the code for the fixed-length array is shorter. Why does the VLA add so much overhead to the function prologue? We may not think about everything that has to happen, but allocating a VLA is more than a simple stack-pointer bump.

The difference is big enough to matter if you care about this kind of overhead.
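Part of the extra work in the VLA prologue is arithmetic: the sal, lea and and instructions compute the byte size of the array and round it up to a multiple of 16 before rsp is adjusted. The same computation can be written in C as a rough sketch. The function name `vla_stack_bytes()` is made up for this example, and the 16-byte rounding is taken from the x86-64 listing shown in this article, not from any general rule.

```c
#include <stddef.h>

/*
 * Mirrors the size computation from the VLA prologue:
 *   sal r12, 2          -> n * sizeof(int)   (4 bytes per int in the listing)
 *   lea rax, [r12+15]   -> + 15
 *   and rax, -16        -> clear the low four bits
 * Assumes n is not negative.
 */
size_t vla_stack_bytes(int n) {
    size_t bytes = (size_t)n * sizeof(int);
    return (bytes + 15) & ~(size_t)15;     /* round up to a multiple of 16 */
}
```

With 4-byte ints, `vla_stack_bytes(10)` yields 48 rather than 40, and this has to be computed on every call, whereas the fixed-length version subtracts a constant.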

Initialization not allowed

To add to the trouble caused by accidental VLAs, the following is not allowed:

int n = 10;
int A[n] = { 0 };

Even with compiler optimizations, initializing a VLA is not allowed. So although we might hope that the compiler would in effect produce a fixed-length array here, the code is still rejected.
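If the length really is only known at run time, one workaround is to zero the array explicitly after declaring it. The sketch below assumes a compiler that supports VLAs; the function name `sum_zeroed()` and the limit of 1024 are invented for illustration.

```c
#include <string.h>

/*
 * Declares a VLA of n ints, zeroes it with memset(), and returns the sum
 * of its elements (always 0). Returns -1 if n is outside [1, 1024].
 */
int sum_zeroed(int n) {
    if (n <= 0 || n > 1024) {
        return -1;                 /* keep the stack allocation bounded */
    }

    int A[n];                      /* `int A[n] = { 0 };` would not compile */
    memset(A, 0, sizeof A);        /* sizeof A is evaluated at run time */

    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += A[i];
    }
    return sum;
}
```

When the length is a compile-time constant after all, declaring it with an enumeration constant or a macro makes the ordinary `= { 0 }` initializer legal again.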

Trouble for compiler writers

A few months ago I saved a Reddit comment describing the problems VLAs cause from a compiler writer's point of view. I quote it here:

  • A VLA applies to a type, not an actual array. So you can create a typedef of a VLA type, which "freezes" the value of the expression used, even if elements of that expression change at the time the VLA type is applied
  • VLAs can occur inside blocks, and inside loops. This means allocating and deallocating variable-sized data on the stack, and either screwing up all the offsets, or needing to do things indirectly via pointers.
  • You can use goto into and out of blocks with active VLAs, with some things restricted and some not, but the compiler needs to keep track of the mess.
  • VLAs can be used with multi-dimensional arrays.
  • VLAs can be used as pointer targets (so no allocation is done, but it still needs to keep track of the variable size).
  • Some compilers allow VLAs inside structure definitions (I really have no idea how that works, or at what point the VLA size is frozen, so that all instances have the same VLA(s) sizes.)
  • A function can have dozens of VLAs active at any one time, with some being created or destroyed at different times, or conditionally, or in loops.
  • sizeof needs to be specially implemented for VLAs, and all the necessary info (for actual VLAs, VLA-types, and hybrid VLA/fixed-size types and arrays and pointed-to VLAs).
  • 'VLA' is also the term used for multi-dimensional array parameters, where the dimensions are passed by other parameters.
  • On Windows, with some compilers (GCC at least), declaring local arrays which make the stack frame size over 4 KiB, mean calling a special allocator (__chkstk()), as the stack can only grow a page at a time. When a VLA is declared, since the compiler doesn't know the size, it needs to call __chkstk for every such function, even if the size turns out to be small.
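The first point in the list, about a typedef "freezing" the size, can be seen in a few lines. This is a sketch that assumes a compiler with VLA support; `frozen_row_bytes()` is a name made up for the example.

```c
#include <stddef.h>

/*
 * Declares a VLA typedef, then changes n, then asks for the size of the type.
 * The size expression was evaluated when the typedef was reached, so the
 * later change to n has no effect. Assumes n is positive.
 */
size_t frozen_row_bytes(int n) {
    typedef int Row[n];    /* n is evaluated here */
    n += 5;                /* does not alter Row */
    return sizeof(Row);    /* computed at run time from the frozen length */
}
```

Calling `frozen_row_bytes(3)` gives `3 * sizeof(int)`, not `8 * sizeof(int)`, and the compiler has to store that hidden length somewhere for as long as `Row` is in scope.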

If you browse other C forums, you are bound to come across further complaints of this kind.

Reduced support

Because of the problems described above, some compiler vendors decided not to support C99 fully, Microsoft's MSVC being the prime example. The C standards committee took note, and in C11 VLAs were made optional (and some implementations chose to leave them out).

This means that code using VLAs may not compile with a C11 compiler, so you need to check for support through the __STDC_NO_VLA__ macro and write a VLA-free version as a fallback. And since you have to write a version without VLAs anyway, why write the VLA version at all?
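A sketch of what such a check looks like in practice is given below. The function `sum_indices()` and its limit of 1024 are hypothetical; the only standard piece is the `__STDC_NO_VLA__` macro, which a C11 implementation defines when it does not support VLAs (compilers older than C11 may not define it at all).

```c
#include <stdlib.h>

/*
 * Returns 0 + 1 + ... + (n-1), computed through a scratch array.
 * Uses a VLA where available and malloc() otherwise.
 * Returns -1 if n is outside [1, 1024] or if allocation fails.
 */
long sum_indices(int n) {
    if (n <= 0 || n > 1024) {
        return -1;
    }

#ifndef __STDC_NO_VLA__
    int scratch[n];                                   /* VLA version */
#else
    int *scratch = malloc((size_t)n * sizeof *scratch); /* fallback version */
    if (scratch == NULL) {
        return -1;
    }
#endif

    long total = 0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = i;
        total += scratch[i];
    }

#ifdef __STDC_NO_VLA__
    free(scratch);
#endif
    return total;
}
```

Both branches have to be written, tested and maintained, which is exactly the duplication the paragraph above questions.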

It is also worth mentioning that C++ has no VLAs, and nothing suggests that it ever will. This is not a deal-breaker, but it is one more argument against VLAs in C.

Breaking convention (a nitpick)

This may seem like a nitpick, but it is still a reason to dislike VLAs. There is a widely used convention for passing two-dimensional arrays to functions, in which the array comes first and its sizes follow:

void foo(int** arr, int n, int m) { /* arr[i][j] = ... */ }

In C99, array sizes in a function's parameter list are parsed immediately, so they can only refer to parameters that have already been declared. This means that with VLAs the following order cannot be used:

void foo(int arr[n][m], int n, int m) { /* arr[i][j] = ... */ } // INVALID!

You are left with two options:

  • Break with convention and pass the sizes first:

    void foo(int n, int m, int arr[n][m]) { /* arr[i][j] = ... */ }
    
  • Use obsolete syntax (an old-style function definition):

    void foo(int[*][*], int, int);
    void foo(arr, n, m)
        int n;
        int m;
        int arr[n][m];
    {
        // arr[i][j] = ...
    }
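For reference, the first option (sizes before the array) looks like this in a complete, if minimal, example. It assumes a compiler with VLA support, and the names `sum_matrix()` and `demo_sum_matrix()` are invented for illustration.

```c
/* Sums all elements of an n-by-m matrix; n and m must come before arr. */
long sum_matrix(int n, int m, int arr[n][m]) {
    long total = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            total += arr[i][j];
        }
    }
    return total;
}

/* Calls sum_matrix() on a small fixed-size matrix. */
long demo_sum_matrix(void) {
    int a[2][3] = { { 1, 2, 3 }, { 4, 5, 6 } };
    return sum_matrix(2, 3, a);    /* sizes first, array last */
}
```

The call site reads backwards compared with the usual "array, then size" convention, which is the whole complaint.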
    

It's useful in some cases

There is one scenario where VLA types are useful: dynamically allocating a multi-dimensional array whose inner dimensions are not known until run time. There is not even a safety concern here, because nothing of arbitrary size is allocated on the stack.

int (* A)[m] = malloc(n * sizeof (*A)); // m and n are array dimensions
if (A) {
    // A[i][j] = ...;
    free(A);
}
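A complete version of this idiom might look like the sketch below. It assumes a compiler with VLA support; `table_corner()` is a hypothetical name, and the size is written as `sizeof(int[m])` so that the expression does not mention the pointer that is still being initialized.

```c
#include <stdlib.h>

/*
 * Builds an n-by-m multiplication table on the heap, using a pointer to a
 * row of m ints, and returns the entry at [n-1][m-1].
 * Returns -1 if a dimension is not positive or if allocation fails.
 * Intended for small dimensions; n * m * sizeof(int) is not overflow-checked.
 */
long table_corner(int n, int m) {
    if (n <= 0 || m <= 0) {
        return -1;
    }

    int (*A)[m] = malloc((size_t)n * sizeof(int[m]));  /* heap, not stack */
    if (A == NULL) {
        return -1;
    }

    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            A[i][j] = i * j;       /* natural two-dimensional indexing */
        }
    }

    long corner = A[n - 1][m - 1];
    free(A);                       /* one allocation, one free */
    return corner;
}
```

Only the type is variably sized here; the memory itself comes from malloc(), so a failed allocation can be detected and handled.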

Without VLAs, the alternatives are:

  • Allocate row by row with malloc():

    int** A = malloc(n * sizeof (*A));
    if (A) {
        for (int i = 0; i < n; ++i) {
            A[i] = malloc(m * sizeof (*A[i]));
        }
        // A[i][j] = ...
        for (int i = 0; i < n; ++i) {
            free(A[i]);
        }
        free(A);
    }
    
  • Use a one-dimensional array with computed offsets:

    int* A = malloc(n * m * sizeof (*A));
    if (A) {
        // A[i*m + j] = ...
        free(A);
    }
    
  • Use a large fixed-length array:

    int A[SAFE_SIZE][SAFE_SIZE]; // SAFE_SIZE*SAFE_SIZE elements must fit safely
    // A[i][j] = ...;
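For the one-dimensional alternative, the offset of element (i, j) in an array with n rows and m columns is i*m + j, that is, the row index multiplied by the row length. A tiny helper makes this harder to get wrong; `flat_index()` is a hypothetical name used only in this sketch.

```c
#include <stddef.h>

/*
 * Maps a two-dimensional index (i, j) to an offset in a flat, row-major
 * array whose rows each hold m elements.
 */
size_t flat_index(size_t i, size_t j, size_t m) {
    return i * m + j;      /* skip i full rows, then move j columns */
}
```

With this helper, `A[flat_index(i, j, m)]` plays the role of `A[i][j]`, and the last element of an n-by-m array sits at offset n*m - 1.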
    

Summary

In short, avoid VLAs. They are dangerous and offer very little in return. If you really must use them, keep their limitations in mind.

It is worth mentioning that VLAs were themselves meant as a solution to the even more problematic (and non-standard) alloca().
