Do you really know array_diff_uassoc as well as you think?

Posted by IceRegent on Thu, 10 Oct 2019 08:34:22 +0200

If you were asked to describe the PHP function array_diff_uassoc in one sentence, you would probably say that it compares two or more arrays, checking both keys and values, and returns the entries of the first array that do not appear in any of the other arrays.
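As a baseline, here is a minimal sketch of the function behaving the way that one-sentence description suggests. The arrays and the helper name compareKeys are made up for illustration, and the callback is an ordinary three-way string comparison.

```php
<?php
// Hypothetical helper: a conventional three-way comparison of two keys.
function compareKeys($a, $b)
{
    return strcmp((string) $a, (string) $b);
}

$first  = ['a' => 1, 'b' => 2, 'c' => 3];
$second = ['a' => 1, 'b' => 9, 'x' => 3];

// 'a' => 1 exists in both arrays, so it is dropped.
// 'b' has the same key but a different value, so it is kept.
// 'c' has no matching key in $second, so it is kept.
$diff = array_diff_uassoc($first, $second, 'compareKeys');
var_dump($diff); // $diff is ['b' => 2, 'c' => 3]
```

With a callback like this, the function does what the description says. The surprises discussed in this article only show up when the callback stops behaving like a real ordering.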

Recently I came across an interesting question about how array_diff_uassoc executes, and only then did I realize how deeply I had misunderstood this function.

Here is a simplified version of the problem:


function comparekey($a,$b){
    return 0;
}

$array1 = ['a'=>1,'b'=>2,'c'=>3,'d'=>4];
$array2 = ['a'=>2,'d'=>4,'e'=>6];
$res = array_diff_uassoc($array1,$array2,'comparekey');
var_dump($res);

Why is the result the following?

['a'=>1,'c'=>3,'d'=>4];

By the usual reasoning, array_diff_uassoc returns the entries of the first array whose key and value are not both matched in the other arrays, and when the custom comparison function returns 0 the keys are considered the same. So one might expect the result to be

['a'=>1,'b'=>2,'c'=>3]

But is that reasoning really right?

1. Does the custom function compare the keys of two arrays?

To be honest, I thought so at first. It wasn't until I printed $a and $b inside the custom function and saw the surprising output that I realized the comparison function was not that simple.

To make the output easier to follow, replace the arrays in the problem with the following:

function comparekey($a,$b){
    echo $a.'-'.$b.' '; // trailing space separates the pairs in the output
    return 0;
}

$array1 = ['a'=>1,'b'=>2,'c'=>3,'d'=>4];
$array2 = ['e'=>'2','f'=>5,'g'=>6];
$res = array_diff_uassoc($array1,$array2,'comparekey');

The output of the function is

a-b b-c c-d e-f f-g a-e b-e c-e d-e

So the keys passed in for comparison do not necessarily come from different arrays. They may also be two keys of the same array.
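To check this without reading raw output, the following sketch records every pair the callback receives and sorts the pairs into "same array" and "cross array" buckets. The variable names are made up for illustration, and the callback here is a normal strcmp-style comparison rather than the always-0 function above. The exact pairs and their order depend on the PHP version, so none are listed as expected output.

```php
<?php
$array1 = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4];
$array2 = ['e' => '2', 'f' => 5, 'g' => 6];

$sameArrayPairs  = [];
$crossArrayPairs = [];

// Hypothetical recording callback: logs each pair, then compares the keys normally.
$recordingCompare = function ($a, $b) use ($array1, $array2, &$sameArrayPairs, &$crossArrayPairs) {
    $bothInFirst  = array_key_exists($a, $array1) && array_key_exists($b, $array1);
    $bothInSecond = array_key_exists($a, $array2) && array_key_exists($b, $array2);
    if ($bothInFirst || $bothInSecond) {
        $sameArrayPairs[] = $a . '-' . $b;
    } else {
        $crossArrayPairs[] = $a . '-' . $b;
    }
    return strcmp((string) $a, (string) $b);
};

$result = array_diff_uassoc($array1, $array2, $recordingCompare);

echo 'same array:  ', implode(' ', $sameArrayPairs), PHP_EOL;
echo 'cross array: ', implode(' ', $crossArrayPairs), PHP_EOL;
```

The two arrays share no keys, so each pair can be classified unambiguously.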

2. Does the custom function just check whether the keys are equal?

Of course not. The comparison function is an ordering comparison (less than, equal to, greater than), not the simple equality check we tend to imagine. PHP moves its internal pointers according to the value the callback returns, which is why the later comparisons are a-e b-e c-e d-e.
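For contrast, here is a small sketch of a callback that behaves like a real ordering: a case-insensitive key comparison built on strcasecmp. The arrays and the variable name are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical callback: three-way, case-insensitive comparison of keys.
$compareIgnoringCase = function ($a, $b) {
    return strcasecmp((string) $a, (string) $b);
};

$first  = ['A' => 1, 'b' => 2];
$second = ['a' => 1, 'B' => 3];

// 'A' matches 'a' and the values are equal, so 'A' is dropped.
// 'b' matches 'B' but the values differ, so 'b' is kept.
$diff = array_diff_uassoc($first, $second, $compareIgnoringCase);
var_dump($diff); // $diff is ['b' => 2]
```

Because the callback returns a negative number, zero, or a positive number consistently, both the sorting step and the matching step see the same ordering.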

3. When values are compared, are they really the values of elements with the same key name?

This is not true either. The return value of the comparison function determines how PHP advances its array pointers. Depending on how the pointers move, the values that finally get compared are not necessarily the values stored under the same key name, as we assumed.

Looking at the PHP source code, array_diff_uassoc is ultimately implemented through the php_array_diff function.

static void php_array_diff(INTERNAL_FUNCTION_PARAMETERS, int behavior, int data_compare_type, int key_compare_type)
{
    ...

    if (hash->nNumOfElements > 1) {
        if (behavior == DIFF_NORMAL) {
            zend_sort((void *) lists[i], hash->nNumOfElements,
                    sizeof(Bucket), diff_data_compare_func, (swap_func_t)zend_hash_bucket_swap);
        } else if (behavior & DIFF_ASSOC) { /* triggered also when DIFF_KEY */
            zend_sort((void *) lists[i], hash->nNumOfElements,
                    sizeof(Bucket), diff_key_compare_func, (swap_func_t)zend_hash_bucket_swap);
        }
    }
    ...
}

You can see that diff_key_compare_func is passed to the sort function. Therefore the return value of the custom function affects the ordering of the temporary variable lists.

PHP sorts every input array first. That is why the first part of the output from the custom function shows keys of the same array being compared with each other in turn.

What really happens

Once the keys of all input arrays have been sorted, the keys of the first array are compared with those of the other arrays.

1) Compare the key name of the current element of the first array with the key name of each element of the other array, until the first match is found or the other array is exhausted.

RETVAL_ARR(zend_array_dup(Z_ARRVAL(args[0])));
while (Z_TYPE(ptrs[0]->val) != IS_UNDEF) {
  for (i = 1; i < arr_argc; i++) {
    Bucket *ptr = ptrs[i];
    if (behavior == DIFF_NORMAL) {
      while (Z_TYPE(ptrs[i]->val) != IS_UNDEF && (0 < (c = diff_data_compare_func(ptrs[0], ptrs[i])))) {
        ptrs[i]++;
      }
    } else if (behavior & DIFF_ASSOC) { /* triggered also when DIFF_KEY */
      while (Z_TYPE(ptr->val) != IS_UNDEF && (0 != (c = diff_key_compare_func(ptrs[0], ptr)))) {
        ptr++;
      }
    }
    ...
  }
  ...
}

2) If the key names are the same (the key comparison function returns 0), the values are then compared. If the values differ, c is set to -1 and the comparison continues with the next array.

RETVAL_ARR(zend_array_dup(Z_ARRVAL(args[0])));
while (Z_TYPE(ptrs[0]->val) != IS_UNDEF) {
    ...
    for (i = 1; i < arr_argc; i++) {
        ...
        if (!c) {
            ...
            if (diff_data_compare_func(ptrs[0], ptr) != 0) {
                c = -1;
                if (key_compare_type == DIFF_COMP_KEY_USER) {
                    BG(user_compare_fci) = *fci_key;
                    BG(user_compare_fci_cache) = *fci_key_cache;
                }
            }
            ...
        }
        ...
    }
    ...
}

3) If the comparison result is "not equal", move on to the next element of the first array and compare it against the elements of the other arrays.

If the comparison result is "equal" (c == 0), the corresponding key is deleted from the return array (which was copied from the first array).

RETVAL_ARR(zend_array_dup(Z_ARRVAL(args[0])));
while (Z_TYPE(ptrs[0]->val) != IS_UNDEF) {
    ...
    if (!c) {
        for (;;) {
            p = ptrs[0];
            if (p->key == NULL) {
                zend_hash_index_del(Z_ARRVAL_P(return_value), p->h);
            } else {
                zend_hash_del(Z_ARRVAL_P(return_value), p->key);
            }
            if (Z_TYPE((++ptrs[0])->val) == IS_UNDEF) {
                goto out;
            }
            ...
        }
    }
    else {
        for (;;) {
            if (Z_TYPE((++ptrs[0])->val) == IS_UNDEF) {
                goto out;
            }
            ...
        }
        ...
    }
...
}

The following arrays and custom function are used as an example to illustrate the comparison process.

function comparekey($a,$b){
    return 0;
}

$array1 = ['a'=>1,'b'=>2,'c'=>3,'d'=>4];
$array2 = ['a'=>2,'d'=>4,'e'=>6];

The return array starts out as a copy of $array1.

Key "a" of $array1 and key "a" of $array2 are reported as equal, so the values are compared: $array1['a'] != $array2['a'], and 'a' stays in the return array.

Key "b" and key "a" are reported as equal, so the values are compared: $array1['b'] == $array2['a'], and key 'b' is deleted from the return array.

Key "c" and key "a" are reported as equal, so the values are compared: $array1['c'] != $array2['a'], and 'c' stays.

Key "d" and key "a" are reported as equal, so the values are compared: $array1['d'] != $array2['a'], and 'd' stays.

So the final return array is

$res = ['a'=>1,'c'=>3,'d'=>4]
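The walk-through above can be modelled in a few lines of PHP. This is only a simplified sketch of the three steps, not PHP's actual implementation: the function name simulateDiffUassoc is hypothetical, it skips the initial sorting step (assuming, as in this example, that sorting leaves the arrays in their original order), and it compares values as strings.

```php
<?php
// Simplified model of the DIFF_ASSOC loop; not the real PHP internals.
function simulateDiffUassoc(array $first, array $others, callable $keyCompare): array
{
    $result = $first; // the return array starts as a copy of the first array
    foreach ($first as $key => $value) {
        foreach ($others as $other) {
            $matched = false;
            // Step 1: scan the other array until the callback reports an equal key.
            foreach ($other as $otherKey => $otherValue) {
                if ($keyCompare($key, $otherKey) == 0) {
                    // Step 2: the keys "match", so compare the values.
                    $matched = ((string) $value === (string) $otherValue);
                    break;
                }
            }
            if ($matched) {
                // Step 3: key and value both matched, so delete the entry.
                unset($result[$key]);
                break;
            }
        }
    }
    return $result;
}

$array1 = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4];
$array2 = ['a' => 2, 'd' => 4, 'e' => 6];

$alwaysEqual = function ($a, $b) { return 0; };
$byName      = function ($a, $b) { return strcmp((string) $a, (string) $b); };

var_dump(simulateDiffUassoc($array1, [$array2], $alwaysEqual)); // ['a' => 1, 'c' => 3, 'd' => 4]
var_dump(simulateDiffUassoc($array1, [$array2], $byName));      // ['a' => 1, 'b' => 2, 'c' => 3]
```

With the always-0 callback every key of $array1 "matches" the first element of $array2, so only values equal to 2 are removed. With a real key comparison the model gives the result most people expect.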

Summary

So the custom function does not give us complete freedom: whatever it returns changes the output. Many PHP array functions accept a custom callback. But if the callback's return value "goes against common sense", as in this problem, where it always reports the keys as equal even though two keys within one PHP array can never be equal, then the comparison result is itself "problematic". On that premise, PHP may well return unexpected results.
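One way to catch such a "problematic" callback early is a quick sanity check before passing it to array_diff_uassoc. The sketch below is an assumption of what such a check could look like, not something PHP provides: the helper name looksConsistent is hypothetical, and it only tests the keys of one array.

```php
<?php
// Hypothetical helper: returns false if $compare treats two distinct keys of
// the same array as equal, or orders a pair differently in each direction.
function looksConsistent(array $array, callable $compare): bool
{
    $keys = array_keys($array);
    foreach ($keys as $i => $left) {
        foreach ($keys as $j => $right) {
            $forward  = $compare($left, $right) <=> 0; // sign of compare(left, right)
            $backward = $compare($right, $left) <=> 0; // sign of compare(right, left)
            if ($i === $j) {
                if ($forward !== 0) {
                    return false; // a key must compare equal to itself
                }
            } elseif ($forward === 0 || $forward !== -$backward) {
                return false; // distinct keys reported equal, or order not mirrored
            }
        }
    }
    return true;
}

$array1 = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4];

$alwaysEqual = function ($a, $b) { return 0; };
$byName      = function ($a, $b) { return strcmp((string) $a, (string) $b); };

var_dump(looksConsistent($array1, $alwaysEqual)); // bool(false)
var_dump(looksConsistent($array1, $byName));      // bool(true)
```

The check is quadratic in the number of keys, so it is better suited to tests or debugging than to production code.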

Next time you use array_diff_uassoc, keep in mind that the custom function does not just compare the key names of two arrays. It also drives the internal sorting that PHP performs on the input arrays before comparing them. Its return value directly affects the order in which PHP advances its array pointers, and therefore the result of the comparison.


Topics: PHP