easy_vision.python.core.transformer¶
easy_vision.python.core.transformer.attention_layer¶
Implementation of multiheaded attention and self-attention layers.
class easy_vision.python.core.transformer.attention_layer.Attention(hidden_size, num_heads, attention_dropout, train)[source]¶
Bases: tensorflow.python.layers.base.Layer
Multi-headed attention layer.
call(x, y, bias, cache=None)[source]¶
Apply attention mechanism to x and y (a sketch of the computation follows this entry).
Parameters: - x – a tensor with shape [batch_size, length_x, hidden_size]
- y – a tensor with shape [batch_size, length_y, hidden_size]
- bias – attention bias that will be added to the result of the dot product.
- cache – (Used during prediction) dictionary with tensors containing results of previous attentions. The dictionary must have the items {"k": tensor with shape [batch_size, i, key_channels], "v": tensor with shape [batch_size, i, value_channels]}, where i is the current decoded length.
Returns: Attention layer output with shape [batch_size, length_x, hidden_size]
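Once the inputs have been projected to queries, keys and values and split into heads (see split_heads and combine_heads below), the core of call is scaled dot-product attention with an additive bias. The sketch below is illustrative only, not the library source; during prediction the cached "k"/"v" tensors would typically be concatenated onto the keys and values along the length axis before this step.

    import tensorflow as tf

    def scaled_dot_product_attention_sketch(q, k, v, bias):
        # q: [batch, num_heads, length_q, depth]; k, v: [batch, num_heads, length_k, depth]
        # bias broadcasts against [batch, num_heads, length_q, length_k]
        depth = tf.cast(tf.shape(q)[-1], tf.float32)
        logits = tf.matmul(q, k, transpose_b=True) / tf.sqrt(depth)  # query-key scores
        logits += bias                      # -1e9 at masked positions, 0 elsewhere
        weights = tf.nn.softmax(logits)     # attention distribution over keys
        return tf.matmul(weights, v)        # [batch, num_heads, length_q, depth]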
combine_heads(x)[source]¶
Combine tensor that has been split.
Parameters: x – A tensor with shape [batch_size, num_heads, length, hidden_size/num_heads]
Returns: A tensor with shape [batch_size, length, hidden_size]
split_heads(x)[source]¶
Split x into different heads, and transpose the resulting value.
The tensor is transposed to ensure the inner dimensions hold the correct values during the matrix multiplication.
Parameters: x – A tensor with shape [batch_size, length, hidden_size]
Returns: A tensor with shape [batch_size, num_heads, length, hidden_size/num_heads]
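Both helpers amount to a reshape plus a transpose. The sketch below assumes the hidden dimension is statically known and divisible by num_heads; it is illustrative, not the module source.

    import tensorflow as tf

    def split_heads_sketch(x, num_heads):
        # [batch, length, hidden] -> [batch, num_heads, length, hidden // num_heads]
        batch, length = tf.shape(x)[0], tf.shape(x)[1]
        hidden = x.shape.as_list()[-1]
        depth = hidden // num_heads
        x = tf.reshape(x, [batch, length, num_heads, depth])
        return tf.transpose(x, [0, 2, 1, 3])

    def combine_heads_sketch(x):
        # Inverse of split_heads_sketch:
        # [batch, num_heads, length, depth] -> [batch, length, num_heads * depth]
        x = tf.transpose(x, [0, 2, 1, 3])            # [batch, length, heads, depth]
        batch, length = tf.shape(x)[0], tf.shape(x)[1]
        num_heads, depth = x.shape.as_list()[2], x.shape.as_list()[3]
        return tf.reshape(x, [batch, length, num_heads * depth])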
class easy_vision.python.core.transformer.attention_layer.SelfAttention(hidden_size, num_heads, attention_dropout, train)[source]¶
Bases: easy_vision.python.core.transformer.attention_layer.Attention
Multi-headed self-attention layer.
call(x, bias, cache=None)[source]¶
Apply self-attention mechanism to x (x attends to itself).
Parameters: - x – a tensor with shape [batch_size, length_x, hidden_size]
- bias – attention bias that will be added to the result of the dot product.
- cache – (Used during prediction) dictionary with tensors containing results of previous attentions. The dictionary must have the items {"k": tensor with shape [batch_size, i, key_channels], "v": tensor with shape [batch_size, i, value_channels]}, where i is the current decoded length.
Returns: Attention layer output with shape [batch_size, length_x, hidden_size]
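For incremental decoding, the cache must follow the documented {"k", "v"} layout. One way it could be initialized before the first step (i = 0) is sketched below; the sizes, and the assumption that key_channels/value_channels equal hidden_size, are illustrative rather than taken from the library.

    import tensorflow as tf

    batch_size, hidden_size = 4, 512   # illustrative values

    # "k"/"v" hold the keys and values of the i tokens decoded so far;
    # before decoding starts they can be empty along the length axis.
    cache = {
        "k": tf.zeros([batch_size, 0, hidden_size]),
        "v": tf.zeros([batch_size, 0, hidden_size]),
    }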
easy_vision.python.core.transformer.beam_search¶
Beam search to find the translated sequence with the highest probability.
Source implementation from Tensor2Tensor: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/beam_search.py
class easy_vision.python.core.transformer.beam_search.SequenceBeamSearch(symbols_to_logits_fn, vocab_size, batch_size, beam_size, alpha, max_decode_length, eos_id)[source]¶
Bases: object
Implementation of beam search loop.
__init__(symbols_to_logits_fn, vocab_size, batch_size, beam_size, alpha, max_decode_length, eos_id)[source]¶
Parameters: - symbols_to_logits_fn – a decoding function that calculates logits of the next tokens
- vocab_size – size of vocabulary dict
- batch_size – batch size
- beam_size – beam search width
- alpha – length penalty for beam search
- max_decode_length – max decode steps
- eos_id – end of sequence id
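The alpha argument controls length normalization of candidate scores. In the Tensor2Tensor implementation this module references, the penalty typically takes the GNMT form sketched below, so larger alpha favors longer sequences; this is illustrative context rather than a guarantee about this module's exact formula.

    def length_normalization_sketch(alpha, length):
        # GNMT-style length penalty used by Tensor2Tensor-derived beam search;
        # finished-sequence log-probabilities are divided by this factor.
        return ((5.0 + float(length)) / 6.0) ** alpha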
easy_vision.python.core.transformer.beam_search.sequence_beam_search(symbols_to_logits_fn, initial_ids, initial_cache, vocab_size, beam_size, alpha, max_decode_length, eos_id)[source]¶
Search for sequence of subtoken ids with the largest probability (a usage sketch follows this entry).
Parameters: - symbols_to_logits_fn – A function that takes in ids, index, and cache as arguments. The passed-in arguments will have shape:
ids -> [batch_size * beam_size, index]
index -> [] (scalar)
cache -> nested dictionary of tensors [batch_size * beam_size, …]
The function must return logits and new cache:
logits -> [batch_size * beam_size, vocab_size]
new cache -> same shape/structure as the input cache
- initial_ids – Starting ids for each batch item. int32 tensor with shape [batch_size]
- initial_cache – dict containing starting decoder variables information
- vocab_size – int, size of the vocabulary
- beam_size – int number of beams
- alpha – float defining the strength of length normalization
- max_decode_length – maximum length of the decoded sequence
- eos_id – int id of eos token, used to determine when a sequence has finished
Returns: Top decoded sequences [batch_size, beam_size, max_decode_length], sequence scores [batch_size, beam_size]
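A hedged usage sketch: the call below follows the documented signature, while the stub symbols_to_logits_fn, the empty initial_cache, and the numeric values (vocabulary size, beam width, alpha, eos_id, decode length) are placeholders for illustration, not defaults of this module.

    import tensorflow as tf
    from easy_vision.python.core.transformer.beam_search import sequence_beam_search

    batch_size, vocab_size = 4, 32000   # illustrative values

    def symbols_to_logits_fn(ids, index, cache):
        # ids:   int32 [batch_size * beam_size, index] tokens decoded so far
        # index: [] scalar decode step
        # cache: nested dict of [batch_size * beam_size, ...] tensors
        # A real decoder would compute next-token logits here; this stub
        # returns uniform logits so the sketch stays self-contained.
        logits = tf.zeros([tf.shape(ids)[0], vocab_size])
        return logits, cache

    decoded_ids, scores = sequence_beam_search(
        symbols_to_logits_fn=symbols_to_logits_fn,
        initial_ids=tf.zeros([batch_size], dtype=tf.int32),
        initial_cache={},        # would normally hold the decoder state tensors
        vocab_size=vocab_size,
        beam_size=4,             # illustrative beam width
        alpha=0.6,               # illustrative length-penalty strength
        max_decode_length=50,
        eos_id=1)                # illustrative end-of-sequence id
    # decoded_ids: [batch_size, beam_size, max_decode_length]
    # scores:      [batch_size, beam_size]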
easy_vision.python.core.transformer.common¶
easy_vision.python.core.transformer.ffn_layer¶
Implementation of the fully connected feed-forward network (a generic sketch follows).
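No API details are listed for this module, but the standard Transformer position-wise feed-forward block is two dense layers with a ReLU in between; the sketch below is a generic illustration under that assumption, not this module's actual class or signature.

    import tensorflow as tf

    def feed_forward_sketch(x, filter_size, hidden_size, relu_dropout, train):
        # Position-wise feed-forward block: dense -> ReLU -> (dropout) -> dense.
        # Argument names are illustrative, not this module's API.
        h = tf.layers.dense(x, filter_size, activation=tf.nn.relu, name="filter_layer")
        if train:
            h = tf.nn.dropout(h, keep_prob=1.0 - relu_dropout)
        return tf.layers.dense(h, hidden_size, name="output_layer")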
easy_vision.python.core.transformer.transformer_utils¶
Transformer model helper methods.
easy_vision.python.core.transformer.transformer_utils.get_decoder_self_attention_bias(length)[source]¶
Calculate bias for decoder that maintains model’s autoregressive property.
Creates a tensor that masks out locations that correspond to illegal connections, so prediction at position i cannot draw information from future positions.
Parameters: length – int length of sequences in batch.
Returns: float tensor of shape [1, 1, length, length]
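A minimal sketch of how such a bias can be built (illustrative, not the module source): positions j > i receive a large negative value so the softmax assigns them effectively zero weight.

    import tensorflow as tf

    def decoder_self_attention_bias_sketch(length, neg_inf=-1e9):
        # Lower-triangular "valid" matrix: position i may attend to j <= i only.
        valid = tf.linalg.band_part(tf.ones([length, length]), -1, 0)
        bias = neg_inf * (1.0 - valid)          # 0 where allowed, -1e9 where masked
        return tf.reshape(bias, [1, 1, length, length])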
easy_vision.python.core.transformer.transformer_utils.get_padding(sequence_length, dtype=tf.float32)[source]¶
Return a float tensor indicating which positions are padding, given the sequence lengths.
Parameters: - sequence_length – input sequence length with shape [batch_size]
- dtype – type of the output
Returns: float tensor containing values 0 or 1 for each sequence position (0 -> non-padding, 1 -> padding)
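An equivalent computation can be sketched with tf.sequence_mask; the explicit max_length argument is an assumption of this sketch, since the documented function takes only sequence_length.

    import tensorflow as tf

    def get_padding_sketch(sequence_length, max_length, dtype=tf.float32):
        # 1 marks padded positions, 0 marks real tokens (assumes a float dtype).
        non_padding = tf.sequence_mask(sequence_length, max_length, dtype=dtype)
        return 1.0 - non_padding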
easy_vision.python.core.transformer.transformer_utils.get_padding_bias(sequence_length, res_rank=4)[source]¶
Calculate an attention bias tensor from the padding positions implied by sequence_length.
The bias tensor is added to the pre-softmax multi-headed attention logits, which have shape [batch_size, num_heads, length, length]. The bias is zero at non-padding locations and -1e9 (effectively negative infinity) at padding locations.
Parameters: - sequence_length – input sequence length with shape [batch_size]
- res_rank – int indicating the rank of the returned attention bias.
Returns: Attention bias tensor of shape [batch_size, 1, 1, length] if res_rank = 4 (for Transformer), or [batch_size, 1, length] if res_rank = 3 (for ConvS2S)
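A sketch of the shape handling (illustrative only; as with get_padding, the explicit max_length argument is an assumption of the sketch):

    import tensorflow as tf

    def get_padding_bias_sketch(sequence_length, max_length, res_rank=4):
        # Padding positions get -1e9 so they vanish after the softmax.
        padding = 1.0 - tf.sequence_mask(sequence_length, max_length, dtype=tf.float32)
        bias = padding * -1e9                                   # [batch_size, length]
        if res_rank == 4:
            return tf.expand_dims(tf.expand_dims(bias, 1), 1)   # [batch_size, 1, 1, length]
        return tf.expand_dims(bias, 1)                          # [batch_size, 1, length]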
easy_vision.python.core.transformer.transformer_utils.get_position_encoding(length, hidden_size, min_timescale=1.0, max_timescale=10000.0)[source]¶
Return positional encoding.
Calculates the position encoding as a mix of sine and cosine functions with geometrically increasing wavelengths. Defined and formulated in Attention Is All You Need, section 3.5.
Parameters: - length – Sequence length.
- hidden_size – Size of the hidden (encoding) dimension.
- min_timescale – Minimum scale that will be applied at each position
- max_timescale – Maximum scale that will be applied at each position
Returns: Tensor with shape [length, hidden_size]
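A sketch of how this encoding is commonly computed, following the formula in section 3.5 of Attention Is All You Need; it is illustrative rather than this module's exact source, and assumes an even hidden_size.

    import numpy as np
    import tensorflow as tf

    def position_encoding_sketch(length, hidden_size,
                                 min_timescale=1.0, max_timescale=1.0e4):
        # Geometrically spaced timescales, half used for sin and half for cos.
        position = tf.cast(tf.range(length), tf.float32)
        num_timescales = hidden_size // 2
        log_increment = (np.log(max_timescale / min_timescale) /
                         max(num_timescales - 1, 1))
        inv_timescales = min_timescale * tf.exp(
            tf.cast(tf.range(num_timescales), tf.float32) * -log_increment)
        scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(inv_timescales, 0)
        # Result shape: [length, hidden_size].
        return tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)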