2021SC@SDUSC
Code location: similarity/distances.py in the tensorflow/similarity repository on GitHub
@tf.keras.utils.register_keras_serializable(package="Similarity")
class CosineDistance(Distance):
    """Compute pairwise cosine distances between embeddings.

    The [Cosine Distance](https://en.wikipedia.org/wiki/Cosine_similarity) is
    an angular distance that varies from 0 (similar) to 1 (dissimilar).
    """

    def __init__(self):
        "Init Cosine distance"
        super().__init__('cosine')

    @tf.function
    def call(self, embeddings: FloatTensor) -> FloatTensor:
        """Compute pairwise distances for a given batch of embeddings.

        Args:
            embeddings: Embeddings to compute the pairwise one.

        Returns:
            FloatTensor: Pairwise distance tensor.
        """
        distances = 1 - tf.linalg.matmul(
            embeddings, embeddings, transpose_b=True)
        min_clip_distances: FloatTensor = tf.math.maximum(distances, 0.0)
        return min_clip_distances


@tf.keras.utils.register_keras_serializable(package="Similarity")
class EuclideanDistance(Distance):
    """Compute pairwise euclidean distances between embeddings.

    The [Euclidean Distance](https://en.wikipedia.org/wiki/Euclidean_distance)
    is the standard distance to measure the line segment between two
    embeddings in the Cartesian point. The larger the distance the more
    dissimilar the embeddings are.

    **Alias**: L2 Norm, Pythagorean
    """

    def __init__(self):
        "Init Euclidean distance"
        super().__init__('euclidean', ['l2', 'pythagorean'])

    @tf.function
    def call(self, embeddings: FloatTensor) -> FloatTensor:
        """Compute pairwise distances for a given batch of embeddings.

        Args:
            embeddings: Embeddings to compute the pairwise one.

        Returns:
            FloatTensor: Pairwise distance tensor.
        """
        squared_norm = tf.math.square(embeddings)
        squared_norm = tf.math.reduce_sum(squared_norm, axis=1, keepdims=True)

        distances: FloatTensor = 2.0 * tf.linalg.matmul(
            embeddings, embeddings, transpose_b=True)
        distances = squared_norm - distances + tf.transpose(squared_norm)

        # Avoid NaN and inf gradients when back propagating through the sqrt.
        # values smaller than 1e-18 produce inf for the gradient, and 0.0
        # produces NaN. All values smaller than 1e-13 should produce a gradient
        # of 1.0.
        dist_mask = tf.math.greater_equal(distances, 1e-18)
        distances = tf.math.maximum(distances, 1e-18)
        distances = tf.math.sqrt(distances) * tf.cast(dist_mask, tf.float32)

        return distances
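To see how these two classes behave, here is a minimal usage sketch (an assumption on my part, not code from the repository: it presumes the package is importable as tensorflow_similarity and that the embeddings are L2-normalized, since the dot-product formulation of CosineDistance is only a true cosine distance for unit-length vectors):

import tensorflow as tf
from tensorflow_similarity.distances import CosineDistance, EuclideanDistance

# A small batch of L2-normalized embeddings (4 vectors of dimension 8).
embeddings = tf.math.l2_normalize(tf.random.normal((4, 8)), axis=1)

cosine = CosineDistance()
euclidean = EuclideanDistance()

# Both calls return a 4x4 pairwise distance matrix with ~0 on the diagonal.
print(cosine.call(embeddings))
print(euclidean.call(embeddings))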
Loss/metric functions with multiple parameters
You may have noticed that a Keras loss function must accept only two arguments: y_true and y_pred, which are the target tensor and the model's output tensor, respectively. But what if we want our loss/metric to depend on tensors other than these two?
To do this, we can use a function closure: we write a factory function (taking any arguments we like) that returns a loss function of y_true and y_pred.
For example, suppose we want (for some reason) to create a loss function that adds the mean square of all activations in the first layer to the MSE:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

# Build a model
inputs = Input(shape=(128,))
layer1 = Dense(64, activation='relu')(inputs)
layer2 = Dense(64, activation='relu')(layer1)
predictions = Dense(10, activation='softmax')(layer2)

model = Model(inputs=inputs, outputs=predictions)

# Define custom loss
def custom_loss(layer):
    # Create a loss function that adds the MSE loss to the mean of all
    # squared activations of a specific layer
    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true) + K.square(layer), axis=-1)
    # Return a function
    return loss

# Compile the model
model.compile(optimizer='adam',
              loss=custom_loss(layer1),  # Call the loss factory with the selected layer
              metrics=['accuracy'])

# Placeholder training data (replace with your own dataset)
data = np.random.random((1000, 128))
labels = np.random.random((1000, 10))

# Train
model.fit(data, labels)
Note that we have created a factory function (which itself may take any number of arguments) that returns a valid loss function, and that the returned loss has access to the arguments of its enclosing function.
A more specific example:
The example above is a toy with a not particularly useful use case. So when would we actually want a loss function like this?
Suppose you are designing a variational autoencoder. You want your model to be able to reconstruct its inputs from the latent-space encodings. However, you also want the encodings in the latent space to be (approximately) normally distributed.
While the former objective can be achieved with a reconstruction loss that depends only on your inputs and desired outputs, y_true and y_pred, the latter requires a loss term (for example a Kullback-Leibler loss) that operates on the latent tensors. For your loss function to access these intermediate tensors, the technique we just learned comes in handy.
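For instance, such a KL term can be written as a closure over the latent tensors. The sketch below is my own illustration rather than code from any repository: z_mean and z_log_var stand for the intermediate tensors produced by the encoder, and kl_weight is a hypothetical weighting factor.

from tensorflow.keras import backend as K

def vae_loss(z_mean, z_log_var, kl_weight=1.0):
    # z_mean, z_log_var: intermediate (latent) tensors from the encoder
    # (hypothetical names; adapt them to your own model).
    def loss(y_true, y_pred):
        # Reconstruction term: depends only on y_true / y_pred.
        reconstruction = K.mean(K.square(y_pred - y_true), axis=-1)
        # KL term: depends on the intermediate latent tensors.
        kl = -0.5 * K.sum(
            1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return reconstruction + kl_weight * kl
    return loss

# model.compile(optimizer='adam', loss=vae_loss(z_mean, z_log_var, 0.5))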
Example use:
def model_loss(self):
    """ Wrapper function which calculates auxiliary values for the complete loss function.
        Returns a *function* which calculates the complete loss given only the
        input and target output
    """
    # KL loss
    kl_loss = self.calculate_kl_loss
    # Reconstruction loss
    md_loss_func = self.calculate_md_loss

    # KL weight (to be used by total loss and by annealing scheduler)
    self.kl_weight = K.variable(self.hps['kl_weight_start'], name='kl_weight')
    kl_weight = self.kl_weight

    def seq2seq_loss(y_true, y_pred):
        """ Final loss calculation function to be passed to optimizer"""
        # Reconstruction loss
        md_loss = md_loss_func(y_true, y_pred)
        # Full loss
        model_loss = kl_weight*kl_loss() + md_loss
        return model_loss

    return seq2seq_loss
This example is part of a Sequence-to-Sequence Variational Autoencoder model. For more context and the complete code, see the linked repo, a Keras implementation of the Sketch-RNN algorithm.
As mentioned earlier, although the examples here are for loss functions, creating a custom metric function works in exactly the same way.
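As a hedged sketch (reusing the hypothetical z_mean / z_log_var tensors and the vae_loss factory from the earlier example, with model standing for your compiled VAE), a closed-over metric could, for instance, report just the KL term during training:

def kl_metric(z_mean, z_log_var):
    # Same closure pattern as the loss factories above, used as a metric.
    def kl(y_true, y_pred):
        # y_true / y_pred are ignored; the metric reads the latent tensors.
        return -0.5 * K.sum(
            1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return kl

model.compile(optimizer='adam',
              loss=vae_loss(z_mean, z_log_var),
              metrics=[kl_metric(z_mean, z_log_var)])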