Supporting Modules#

class recwizard.modules.kbrd.tokenizer_nltk.NLTKTokenizer(language='english')[source]#
recwizard.modules.kbrd.tokenizer_nltk.get_tokenizer(name='kbrd')[source]#

Return a tokenizer from the cache.

recwizard.modules.kbrd.tokenizer_nltk.KBRDWordTokenizer(vocab, name='kbrd')[source]#

Return a tokenizer for language models from the given vocabulary.

Parameters:
  • vocab (List[str]) – list of words

  • name (str) – name of the tokenizer, used to cache the tokenizer

Returns:

PreTrainedTokenizerFast
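
The helpers above can be combined to build and reuse a word-level tokenizer. The snippet below is a minimal sketch: the toy vocabulary and the special-token strings in it are placeholders rather than the real KBRD vocabulary, which normally comes from a trained checkpoint.

```python
from recwizard.modules.kbrd.tokenizer_nltk import KBRDWordTokenizer, get_tokenizer

# Toy vocabulary for illustration only; a real KBRD vocabulary is loaded from
# a trained checkpoint, and the special-token strings here are assumptions.
vocab = ["__null__", "__start__", "__end__", "__unk__", "i", "really", "like", "movies"]

tokenizer = KBRDWordTokenizer(vocab, name="kbrd")   # builds a PreTrainedTokenizerFast
print(tokenizer("i really like movies")["input_ids"])

# The tokenizer is cached under its name, so later code can fetch it again.
same_tokenizer = get_tokenizer(name="kbrd")
```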

class recwizard.modules.kbrd.transformer_encoder_decoder.TorchGeneratorModel(padding_idx=0, start_idx=1, end_idx=2, unknown_idx=3, input_dropout=0, longest_label=1)[source]#

This interface expects you to implement a model with the following requirements (a schematic subclass sketch follows this list):

`START`

LongTensor representing the start-of-sentence token

`END`

LongTensor representing the end-of-sentence token

`NULL_IDX`

index of null token

`model.encoder`

takes input and returns a tuple (enc_out, enc_hidden, attn_mask)

`model.decoder`

takes decoder parameters and returns decoder outputs after attention

`model.output`

takes decoder outputs and returns a distribution over the dictionary
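
The sketch below shows how a subclass might wire these pieces up. It is purely schematic: `ToySeq2Seq`, its encoder/decoder arguments, and the projection layer are hypothetical stand-ins, not the KBRD model.

```python
import torch.nn as nn
from recwizard.modules.kbrd.transformer_encoder_decoder import TorchGeneratorModel

class ToySeq2Seq(TorchGeneratorModel):
    """Hypothetical subclass; `encoder` and `decoder` are placeholder modules."""

    def __init__(self, encoder, decoder, embedding_size, vocab_size):
        # START, END and NULL_IDX are derived from these indices by the base class.
        super().__init__(padding_idx=0, start_idx=1, end_idx=2)
        self.encoder = encoder            # produces encoder states from the input
        self.decoder = decoder            # consumes decoder inputs plus the encoder states
        self.projection = nn.Linear(embedding_size, vocab_size)

    def output(self, decoder_output):
        # map decoder outputs to a distribution over the dictionary
        return self.projection(decoder_output)
```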

__init__(padding_idx=0, start_idx=1, end_idx=2, unknown_idx=3, input_dropout=0, longest_label=1)[source]#

Initializes internal Module state, shared by both nn.Module and ScriptModule.

_starts(bsz)[source]#

Return bsz start tokens.

decode_greedy(encoder_states, bsz, maxlen)[source]#

Greedily decode a sequence of up to maxlen tokens from the encoder states.

Parameters:
  • encoder_states – output of the encoder model

  • bsz – batch size

  • maxlen – max number of tokens to decode

Returns:

pair (logits, choices) of the greedy decode, with shapes (batch_size, max_len, #vocab) and (batch_size, max_len)
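
A hypothetical inference call might look like the following; `model` stands for any TorchGeneratorModel subclass and `tokens` for a batch of input ids, neither of which is defined by this module.

```python
# `model` is a hypothetical TorchGeneratorModel subclass and `tokens` a
# LongTensor of input ids with shape (batch_size, seq_len).
encoder_states = model.encoder(tokens)
logits, choices = model.decode_greedy(encoder_states, bsz=tokens.size(0), maxlen=20)
# logits: (batch_size, max_len, #vocab); choices: (batch_size, max_len) token ids
```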

decode_forced(encoder_states, ys)[source]#

Decode with a fixed, true sequence, computing loss. Useful for training, or ranking fixed candidates.

Parameters:
  • encoder_states – output of the encoder model, shape: (batch_size, seq_len, hidden_size)

  • ys – target tokens, shape: (batch_size, tgt_len)

Returns:

pair (logits, choices) containing the logits and MLE predictions, with shapes (batch_size, tgt_len, #vocab) and (batch_size, tgt_len)
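
For teacher-forced training, the logits from decode_forced can feed a standard cross-entropy loss. This is a minimal sketch reusing the hypothetical `model`, `tokens`, and gold targets `ys` from above, and assuming padding index 0 (the constructor default).

```python
import torch.nn.functional as F

# `model`, `tokens` and the gold targets `ys` are hypothetical, as above.
encoder_states = model.encoder(tokens)
logits, preds = model.decode_forced(encoder_states, ys)   # ys: (batch_size, tgt_len)
loss = F.cross_entropy(
    logits.view(-1, logits.size(-1)),   # flatten to (batch_size * tgt_len, #vocab)
    ys.view(-1),
    ignore_index=0,                     # assumes padding_idx=0, the constructor default
)
```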

reorder_encoder_states(encoder_states, indices)[source]#

Reorder encoder states according to a new set of indices.

This is an abstract method, and must be implemented by the user.

Its purpose is to provide a model-agnostic interface for beam search: for example, this method is used to sort hypotheses, expand beams, etc.

For example, assume that encoder_states is a bsz x 1 tensor of values:

```python
indices = [0, 2, 2]
encoder_states = [[0.1],
                  [0.2],
                  [0.3]]
```

then the output will be

```python
output = [[0.1],
          [0.3],
          [0.3]]
```

Parameters:
  • encoder_states – output from the encoder. The type is model specific.

  • indices (List[int]) – the indices to select over. The user must support non-tensor inputs.

Returns:

The re-ordered encoder states. It should be of the same type as encoder states, and it must be a valid input to the decoder.
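
A minimal sketch of a typical implementation is shown below, assuming the encoder states are an (output, mask) pair of tensors batched along dimension 0; actual models may store their states differently.

```python
import torch

# Assumes the encoder states are an (output, mask) pair batched along dim 0.
def reorder_encoder_states(self, encoder_states, indices):
    enc_output, mask = encoder_states
    if not torch.is_tensor(indices):
        # the contract requires accepting plain Python lists as well
        indices = torch.tensor(indices, dtype=torch.long, device=enc_output.device)
    return enc_output.index_select(0, indices), mask.index_select(0, indices)
```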

reorder_decoder_incremental_state(incremental_state, inds)[source]#

Reorder incremental state for the decoder.

Used to expand selected beams in beam_search. Unlike reorder_encoder_states, implementing this method is optional. However, without incremental decoding, decoding a single beam becomes O(n^2) instead of O(n), which can make beam search impractically slow.

In order to fall back to non-incremental decoding, just return None from this method.

Parameters:
  • incremental_state – second output of model.decoder

  • inds (torch.LongTensor) – indices to select and reorder over.

Returns:

The re-ordered decoder incremental states. It should be the same type as incremental_state, and usable as an input to the decoder. This method should return None if the model does not support incremental decoding.
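
As a sketch, assuming the incremental state is a flat dict of tensors batched along dimension 0 (for example, cached keys and values per decoder layer), a reordering could look like this; returning None instead simply disables incremental decoding.

```python
# Assumes the incremental state is a flat dict of tensors batched along dim 0,
# e.g. cached keys and values per decoder layer.
def reorder_decoder_incremental_state(self, incremental_state, inds):
    if incremental_state is None:
        return None  # fall back to non-incremental decoding
    return {name: tensor.index_select(0, inds)
            for name, tensor in incremental_state.items()}
```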

forward(*xs, ys=None, cand_params=None, prev_enc=None, maxlen=None, bsz=None)[source]#

Get output predictions from the model.

Parameters:
  • xs (torch.LongTensor) – input to the encoder, shapes: (batch_size, seq_len)

  • ys (torch.LongTensor) – expected output from the decoder, shapes: (batch_size, seq_len)

  • cand_params (torch.FloatTensor) – parameters for candidate generation, shape: (batch_size, num_cands, num_params)

  • prev_enc (torch.FloatTensor) – if you know you’ll pass in the same xs multiple times, you can pass in the encoder output from the last forward pass to skip recalculating the same encoder output.

  • maxlen (int) – max number of tokens to decode. If not set, the length of the longest label this model has seen is used. Ignored when ys is not None.

  • bsz (int) – if ys is not provided, you must specify bsz for greedy decoding.

Returns:

(scores, candidate_scores, encoder_states) tuple
  • scores – the model’s predicted token scores (FloatTensor[batch_size, seq_len, num_features])

  • candidate_scores – the scores the model assigned to each candidate (FloatTensor[batch_size, num_cands])

  • encoder_states – the output of model.encoder; model-specific type. Feed this back in to skip encoding on the next call.
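
The same hypothetical `model` and `tokens` illustrate the two calling modes and the encoder-state reuse described above; `targets` is an assumed LongTensor of gold labels.

```python
# Teacher-forced pass for training (ys given).
scores, cand_scores, enc_states = model(tokens, ys=targets)

# Greedy decoding at inference time (ys omitted, so bsz is required).
scores, cand_scores, enc_states = model(tokens, bsz=tokens.size(0), maxlen=30)

# Reuse the cached encoder output when passing the same tokens again.
scores, cand_scores, enc_states = model(tokens, ys=targets, prev_enc=enc_states)
```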

class recwizard.modules.kbrd.transformer_encoder_decoder.TransformerEncoder(n_heads, n_layers, embedding_size, ffn_size, vocabulary_size, embedding=None, dropout=0.0, attention_dropout=0.0, relu_dropout=0.0, padding_idx=0, learn_positional_embeddings=False, embeddings_scale=False, reduction=True, n_positions=1024)[source]#
__init__(n_heads, n_layers, embedding_size, ffn_size, vocabulary_size, embedding=None, dropout=0.0, attention_dropout=0.0, relu_dropout=0.0, padding_idx=0, learn_positional_embeddings=False, embeddings_scale=False, reduction=True, n_positions=1024)[source]#

Transformer encoder module.

Parameters:
  • n_heads (int) – the number of multihead attention heads.

  • n_layers (int) – number of transformer layers.

  • embedding_size (int) – the embedding size. Must be a multiple of n_heads.

  • ffn_size (int) – the size of the hidden layer in the FFN

  • vocabulary_size (int) – size of the vocabulary

  • embedding (nn.Embedding) – an embedding matrix for the bottom layer of the transformer. If none, one is created for this encoder.

  • dropout (float) – Dropout used around embeddings and before layer normalizations. This is used in Vaswani 2017 and works well on large datasets.

  • attention_dropout (float) – Dropout performed after the multihead attention softmax. This is not used in Vaswani 2017.

  • relu_dropout (float) – Dropout used after the ReLU in the FFN. Not used in Vaswani 2017, but used in Tensor2Tensor.

  • embeddings_scale (bool) – Scale embeddings relative to their dimensionality. Found useful in fairseq.

  • learn_positional_embeddings (bool) – If off, sinusoidal embeddings are used. If on, position embeddings are learned from scratch.

  • padding_idx (int) – Reserved padding index in the embeddings matrix.

  • n_positions (int) – Size of the position embeddings matrix.

forward(input)[source]#

The input data is a FloatTensor of shape [batch, seq_len, dim]. The mask is a ByteTensor of shape [batch, seq_len], filled with 1 inside the sequence and 0 outside.
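
A construction sketch with illustrative hyperparameters follows. One assumption to note: although the forward docstring above mentions a FloatTensor input, the vocabulary_size and padding_idx arguments suggest the encoder embeds token ids itself, so the example feeds a LongTensor of ids.

```python
import torch
import torch.nn as nn
from recwizard.modules.kbrd.transformer_encoder_decoder import TransformerEncoder

embedding = nn.Embedding(30000, 300, padding_idx=0)   # shared embedding matrix
encoder = TransformerEncoder(
    n_heads=2,
    n_layers=2,
    embedding_size=300,      # must be a multiple of n_heads
    ffn_size=300,
    vocabulary_size=30000,
    embedding=embedding,
    dropout=0.1,
    padding_idx=0,
    reduction=False,         # assumed to keep per-token outputs rather than a pooled summary
    n_positions=1024,
)

tokens = torch.randint(1, 30000, (4, 16))   # assumed token-id input, shape (batch, seq_len)
encoder_state = encoder(tokens)
```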

class recwizard.modules.kbrd.transformer_encoder_decoder.TransformerEncoderLayer(n_heads, embedding_size, ffn_size, attention_dropout=0.0, relu_dropout=0.0, dropout=0.0)[source]#
__init__(n_heads, embedding_size, ffn_size, attention_dropout=0.0, relu_dropout=0.0, dropout=0.0)[source]#

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(tensor, mask)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class recwizard.modules.kbrd.transformer_encoder_decoder.TransformerDecoder(n_heads, n_layers, embedding_size, ffn_size, vocabulary_size, embedding=None, dropout=0.0, attention_dropout=0.0, relu_dropout=0.0, embeddings_scale=True, learn_positional_embeddings=False, padding_idx=None, n_positions=1024)[source]#
__init__(n_heads, n_layers, embedding_size, ffn_size, vocabulary_size, embedding=None, dropout=0.0, attention_dropout=0.0, relu_dropout=0.0, embeddings_scale=True, learn_positional_embeddings=False, padding_idx=None, n_positions=1024)[source]#

Transformer decoder module.

Parameters:
  • n_heads (int) – the number of multihead attention heads.

  • n_layers (int) – number of transformer layers.

  • embedding_size (int) – the embedding size. Must be a multiple of n_heads.

  • ffn_size (int) – the size of the hidden layer in the FFN

  • vocabulary_size (int) – size of the vocabulary

  • embedding (nn.Embedding) – an embedding matrix for the bottom layer of the transformer. If none, one is created for this decoder.

  • dropout (float) – Dropout used around embeddings and before layer normalizations. This is used in Vaswani 2017 and works well on large datasets.

  • attention_dropout (float) – Dropout performed after the multihead attention softmax. This is not used in Vaswani 2017.

  • relu_dropout (float) – Dropout used after the ReLU in the FFN. Not used in Vaswani 2017, but used in Tensor2Tensor.

  • embeddings_scale (bool) – Scale embeddings relative to their dimensionality. Found useful in fairseq.

  • learn_positional_embeddings (bool) – If off, sinusoidal embeddings are used. If on, position embeddings are learned from scratch.

  • padding_idx (int) – Reserved padding index in the embeddings matrix.

  • n_positions (int) – Size of the position embeddings matrix.

forward(input, encoder_state, incr_state=None)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
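
A construction sketch mirrors the encoder. The exact structure of encoder_state and of the return value is model specific, so the pairing with a reduction=False encoder and the unpacking into decoder outputs plus an incremental state (inferred from the incr_state parameter) are assumptions.

```python
import torch
import torch.nn as nn
from recwizard.modules.kbrd.transformer_encoder_decoder import (
    TransformerDecoder,
    TransformerEncoder,
)

embedding = nn.Embedding(30000, 300, padding_idx=0)   # shared with the encoder
encoder = TransformerEncoder(n_heads=2, n_layers=2, embedding_size=300, ffn_size=300,
                             vocabulary_size=30000, embedding=embedding,
                             padding_idx=0, reduction=False)
decoder = TransformerDecoder(n_heads=2, n_layers=2, embedding_size=300, ffn_size=300,
                             vocabulary_size=30000, embedding=embedding, padding_idx=0)

src_tokens = torch.randint(1, 30000, (4, 16))    # assumed source token ids
prev_tokens = torch.randint(1, 30000, (4, 8))    # decoder input token ids

encoder_state = encoder(src_tokens)              # model-specific encoder output
decoder_out, incr_state = decoder(prev_tokens, encoder_state)
```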

class recwizard.modules.kbrd.transformer_encoder_decoder.TransformerDecoderLayer(n_heads, embedding_size, ffn_size, attention_dropout=0.0, relu_dropout=0.0, dropout=0.0)[source]#
__init__(n_heads, embedding_size, ffn_size, attention_dropout=0.0, relu_dropout=0.0, dropout=0.0)[source]#

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, encoder_output, encoder_mask)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class recwizard.modules.kbrd.transformer_encoder_decoder.TransformerGeneratorModel(config, kbrd_rec)[source]#
__init__(config, kbrd_rec)[source]#

Initializes internal Module state, shared by both nn.Module and ScriptModule.

reorder_encoder_states(encoder_states, indices)[source]#

Reorder encoder states according to a new set of indices.

This is an abstract method, and must be implemented by the user.

Its purpose is to provide a model-agnostic interface for beam search: for example, this method is used to sort hypotheses, expand beams, etc.

For example, assume that encoder_states is a bsz x 1 tensor of values:

```python
indices = [0, 2, 2]
encoder_states = [[0.1],
                  [0.2],
                  [0.3]]
```

then the output will be

```python
output = [[0.1],
          [0.3],
          [0.3]]
```

Parameters:
  • encoder_states – output from the encoder. The type is model specific.

  • indices (List[int]) – the indices to select over. The user must support non-tensor inputs.

Returns:

The re-ordered encoder states. It should be of the same type as encoder states, and it must be a valid input to the decoder.

class recwizard.modules.kbrd.transformer_encoder_decoder.BasicAttention(dim=1, attn='cosine')[source]#
__init__(dim=1, attn='cosine')[source]#

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(xs, ys)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class recwizard.modules.kbrd.transformer_encoder_decoder.MultiHeadAttention(n_heads, dim, dropout=0)[source]#
__init__(n_heads, dim, dropout=0)[source]#

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(query, key=None, value=None, mask=None)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
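
A minimal self-attention sketch with illustrative shapes follows; the (batch, seq_len, dim) layout and the boolean mask convention are assumptions about this module's conventions.

```python
import torch
from recwizard.modules.kbrd.transformer_encoder_decoder import MultiHeadAttention

attn = MultiHeadAttention(n_heads=2, dim=64, dropout=0.1)

x = torch.randn(4, 10, 64)        # assumed (batch, seq_len, dim) layout
mask = torch.ones(4, 10).bool()   # assumed 1 inside the sequence, 0 at padding
out = attn(x, mask=mask)          # key and value default to the query (self-attention)
```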

class recwizard.modules.kbrd.transformer_encoder_decoder.TransformerFFN(dim, dim_hidden, relu_dropout=0)[source]#
__init__(dim, dim_hidden, relu_dropout=0)[source]#

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
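
A shape-preserving usage sketch; the (batch, seq_len, dim) input layout is an assumption.

```python
import torch
from recwizard.modules.kbrd.transformer_encoder_decoder import TransformerFFN

ffn = TransformerFFN(dim=64, dim_hidden=256, relu_dropout=0.1)
x = torch.randn(4, 10, 64)   # assumed (batch, seq_len, dim) layout
y = ffn(x)                   # position-wise feed-forward; output shape matches the input
```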