TensorFlow Similarity Learning Notes 7

Posted by abid786 on Thu, 30 Dec 2021 20:44:41 +0100


Code location: similarity/distances.py at master · tensorflow/similarity · GitHub 

class CosineDistance(Distance):
    """Compute pairwise cosine distances between embeddings.

    The [Cosine Distance](https://en.wikipedia.org/wiki/Cosine_similarity) is
    an angular distance that varies from 0 (similar) to 1 (dissimilar).
    """

    def __init__(self):
        "Init Cosine distance"
        super().__init__('cosine')

    def call(self, embeddings: FloatTensor) -> FloatTensor:
        """Compute pairwise distances for a given batch of embeddings.

        Args:
            embeddings: Embeddings to compute the pairwise distances for.

        Returns:
            FloatTensor: Pairwise distance tensor.
        """
        # E @ E^T gives the pairwise cosine similarities, so 1 - E @ E^T
        # gives the pairwise cosine distances; clip at 0 for safety.
        distances = 1 - tf.linalg.matmul(
                embeddings, embeddings, transpose_b=True)
        min_clip_distances: FloatTensor = tf.math.maximum(distances, 0.0)
        return min_clip_distances
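One detail worth calling out: the matmul form above computes the true cosine distance only when every embedding is L2-normalized (which TensorFlow Similarity models typically guarantee at the output). A minimal standalone sketch with the normalization made explicit (the tensors here are my own illustration, not from the library):

import tensorflow as tf

# Standalone sketch, not library code: the matmul form of cosine distance
# is only valid when every embedding has unit L2 norm.
embeddings = tf.random.normal((4, 8))                    # batch of 4 raw embeddings
embeddings = tf.math.l2_normalize(embeddings, axis=-1)   # required normalization

# 1 - E @ E^T, clipped at 0 exactly as in the class above.
# The diagonal is ~0 (each vector compared with itself).
distances = tf.math.maximum(
    1.0 - tf.linalg.matmul(embeddings, embeddings, transpose_b=True), 0.0)
print(distances.shape)  # (4, 4)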

class EuclideanDistance(Distance):
    """Compute pairwise euclidean distances between embeddings.

    The [Euclidean Distance](https://en.wikipedia.org/wiki/Euclidean_distance)
    is the standard distance used to measure the line segment between two
    embeddings in Cartesian space. The larger the distance, the more
    dissimilar the embeddings are.

    **Alias**: L2 Norm, Pythagorean
    """

    def __init__(self):
        "Init Euclidean distance"
        super().__init__('euclidean', ['l2', 'pythagorean'])

    def call(self, embeddings: FloatTensor) -> FloatTensor:
        """Compute pairwise distances for a given batch of embeddings.

        Args:
            embeddings: Embeddings to compute the pairwise distances for.

        Returns:
            FloatTensor: Pairwise distance tensor.
        """
        # ||a||^2 for every embedding, kept as a column so it broadcasts.
        squared_norm = tf.math.square(embeddings)
        squared_norm = tf.math.reduce_sum(squared_norm, axis=1, keepdims=True)

        # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, computed for all pairs.
        distances: FloatTensor = 2.0 * tf.linalg.matmul(
            embeddings, embeddings, transpose_b=True)
        distances = squared_norm - distances + tf.transpose(squared_norm)

        # Avoid NaN and inf gradients when back propagating through the sqrt.
        # Values smaller than 1e-18 produce inf for the gradient, and 0.0
        # produces NaN. All values smaller than 1e-13 should produce a gradient
        # of 1.0.
        dist_mask = tf.math.greater_equal(distances, 1e-18)
        distances = tf.math.maximum(distances, 1e-18)
        distances = tf.math.sqrt(distances) * tf.cast(dist_mask, tf.float32)

        return distances
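Note that call never forms the pairwise differences explicitly; it relies on the expansion ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2. A quick sanity check of that identity against a direct broadcast computation (my own test code, not part of the library):

import tensorflow as tf

# Verify the expansion used above against a direct pairwise computation.
e = tf.random.normal((5, 16))

sq = tf.math.reduce_sum(tf.math.square(e), axis=1, keepdims=True)  # (5, 1)
expanded = sq - 2.0 * tf.linalg.matmul(e, e, transpose_b=True) + tf.transpose(sq)
expanded = tf.math.sqrt(tf.math.maximum(expanded, 1e-18))

# Direct computation via broadcasting: (5, 1, 16) - (1, 5, 16) -> (5, 5, 16)
direct = tf.norm(e[:, None, :] - e[None, :, :], axis=-1)

print(tf.reduce_max(tf.abs(expanded - direct)).numpy())  # ~1e-6, float32 noise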

Loss/metric functions with extra parameters
You may have noticed that a Keras loss function must accept exactly two arguments: y_true and y_pred, the target tensor and the model's output tensor, respectively. But what if we want our loss/metric to depend on tensors other than these two?
To do this, we can use a function closure: an outer function that takes any parameters we like and returns an inner function of y_true and y_pred.
For example, suppose (somewhat artificially) we want a loss that adds the mean square of all activations in the first layer to the MSE:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

# Build a model
inputs = Input(shape=(128,))
layer1 = Dense(64, activation='relu')(inputs)
layer2 = Dense(64, activation='relu')(layer1)
predictions = Dense(10, activation='softmax')(layer2)
model = Model(inputs=inputs, outputs=predictions)

# Define custom loss
def custom_loss(layer):

    # Create a loss function that adds the mean of all squared activations
    # of a specific layer to the MSE loss. Taking the mean of each term
    # separately keeps the shapes compatible: y_pred has 10 units while
    # the captured layer has 64.
    def loss(y_true, y_pred):
        mse = K.mean(K.square(y_pred - y_true), axis=-1)
        activation_penalty = K.mean(K.square(layer), axis=-1)
        return mse + activation_penalty

    # Return a function
    return loss

# Compile the model
model.compile(optimizer='adam',
              loss=custom_loss(layer1),  # Call the loss function with the selected layer
              metrics=['accuracy'])

# Train
model.fit(data, labels)
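The final model.fit call references data and labels, which the post never defines. A sketch of placeholder arrays (an assumption, purely for illustration) with the shapes the model above expects:

import numpy as np

# Dummy data, an assumption not from the original post, just so the
# snippet above can run end to end: 128 input features, 10 classes.
data = np.random.random((1000, 128))
labels = np.eye(10)[np.random.randint(0, 10, size=1000)]  # one-hot targets

model.fit(data, labels, epochs=2, batch_size=32)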

Note that we have created a function that returns a valid loss function (while itself accepting any parameters we like), because the inner function can access the variables of its enclosing scope. One caveat: reading a layer's symbolic output inside a loss like this relies on graph-mode Keras behavior; in TF2's eager execution, model.add_loss is the usual way to attach a loss term that depends on intermediate tensors.
A more concrete example:
The previous example is a toy with little practical use, so when would we actually want such a loss function?
Suppose you are designing a variational autoencoder. You want your model to be able to reconstruct its inputs from the latent-space encodings. However, you also want the encodings in the latent space to be (approximately) normally distributed.

While the former goal can be expressed as a reconstruction loss that depends only on your inputs and desired outputs, y_true and y_pred, the latter requires you to design a loss term (for example, the Kullback-Leibler divergence) that operates on the latent tensors. To give your loss function access to these intermediate tensors, the closure technique we just learned comes in handy.
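Before the real-world example, here is a minimal hypothetical sketch of the idea (z_mean and z_log_var are assumed encoder outputs, not names from the post): the closure captures the latent tensors so that the final loss, which Keras calls with only y_true and y_pred, can still add a KL term.

from tensorflow.keras import backend as K

# Hypothetical sketch: z_mean and z_log_var are assumed to be the encoder's
# latent outputs. The closure captures them so loss(y_true, y_pred) can add
# a KL term it could not otherwise see.
def vae_loss(z_mean, z_log_var, kl_weight=1.0):
    def loss(y_true, y_pred):
        # Reconstruction term: depends only on y_true / y_pred, as usual
        reconstruction = K.mean(K.square(y_pred - y_true), axis=-1)
        # KL divergence between N(z_mean, exp(z_log_var)) and N(0, 1)
        kl = -0.5 * K.sum(
            1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return reconstruction + kl_weight * kl
    return loss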
Example use, from a real project:

def model_loss(self):
    """Wrapper function which calculates auxiliary values for the complete loss function.
    Returns a *function* which calculates the complete loss given only the input and target output"""
    # KL loss
    kl_loss = self.calculate_kl_loss
    # Reconstruction loss
    md_loss_func = self.calculate_md_loss

    # KL weight (to be used by total loss and by annealing scheduler)
    self.kl_weight = K.variable(self.hps['kl_weight_start'], name='kl_weight')
    kl_weight = self.kl_weight

    def seq2seq_loss(y_true, y_pred):
        """Final loss calculation function to be passed to optimizer"""
        # Reconstruction loss
        md_loss = md_loss_func(y_true, y_pred)
        # Full loss
        model_loss = kl_weight * kl_loss() + md_loss
        return model_loss

    return seq2seq_loss

This example is part of a sequence-to-sequence variational autoencoder. For more context and the complete code, see the repo, a Keras implementation of the Sketch-RNN algorithm.
As mentioned earlier, although the example shows a loss function, creating a custom metric function works the same way.
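For instance, a parameterized metric built with the same closure pattern might look like this (a minimal sketch of my own, not from the post):

from tensorflow.keras import backend as K

# The outer function takes the extra parameter; the inner function has the
# (y_true, y_pred) signature Keras expects from a metric.
def thresholded_accuracy(threshold):
    def metric(y_true, y_pred):
        # Fraction of predictions whose absolute error is below `threshold`
        return K.mean(K.cast(K.abs(y_pred - y_true) < threshold, 'float32'))
    metric.__name__ = f'thresholded_accuracy_{threshold}'  # shows up in logs
    return metric

# model.compile(optimizer='adam', loss='mse',
#               metrics=[thresholded_accuracy(0.1)])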

Topics: AI TensorFlow keras