Generator Module#
- class recwizard.modules.redial.configuration_redial_gen.RedialGenConfig(hrnn_params=None, decoder_params=None, vocab_size=15005, n_movies=6924, **kwargs)[source]#
- __init__(hrnn_params=None, decoder_params=None, vocab_size=15005, n_movies=6924, **kwargs)[source]#
- Parameters:
WEIGHT_DIMENSIONS (dict, optional) – The dimensions and dtypes of the module parameters. Used to initialize the parameters when they are not explicitly specified at module initialization. Defaults to None. See also recwizard.module_utils.BaseModule.prepare_weight().
**kwargs – Additional parameters. Will be passed to PretrainedConfig.__init__.
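Example (a minimal construction sketch; the values shown are the documented defaults, and the contents of hrnn_params / decoder_params are checkpoint-specific and therefore omitted here):

```python
from recwizard.modules.redial.configuration_redial_gen import RedialGenConfig

# A minimal sketch using the documented defaults. The hrnn_params /
# decoder_params dictionaries are checkpoint-specific and omitted here.
config = RedialGenConfig(vocab_size=15005, n_movies=6924)
```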
- class recwizard.modules.redial.tokenizer_redial_gen.RedialGenTokenizer(vocab: List[str], id2entity: Dict[int, str] | None = None, sen_encoder='princeton-nlp/unsup-simcse-roberta-base', initiator='User:', respondent='System:', **kwargs)[source]#
- __init__(vocab: List[str], id2entity: Dict[int, str] | None = None, sen_encoder='princeton-nlp/unsup-simcse-roberta-base', initiator='User:', respondent='System:', **kwargs)[source]#
- Parameters:
entity2id (Dict[str, int]) – a dict mapping entity name to entity id. If not provided, it will be generated from id2entity.
id2entity (Dict[int, str]) – a dict mapping entity id to entity name. If not provided, it will be generated from entity2id.
pad_entity_id (int) – the id for padding entity. If not provided, it will be the maximum entity id + 1.
tokenizers (List[PreTrainedTokenizerBase]) – a list of tokenizers to be used.
**kwargs – other arguments for PreTrainedTokenizer
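Example (a direct-construction sketch with a toy vocabulary and entity mapping, both assumptions; in practice the tokenizer is usually created with load_from_dataset(), shown below):

```python
from recwizard.modules.redial.tokenizer_redial_gen import RedialGenTokenizer

# Toy vocabulary and entity mapping, for illustration only (assumptions).
vocab = ["<pad>", "<unk>", "hello", "movie", "great"]
id2entity = {0: "Titanic (1997)", 1: "The Matrix (1999)"}

tokenizer = RedialGenTokenizer(
    vocab=vocab,
    id2entity=id2entity,
    initiator="User:",      # documented default
    respondent="System:",   # documented default
)
```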
- get_init_kwargs()[source]#
The kwargs for initialization. Override this function to declare the necessary initialization kwargs (they will be saved when the tokenizer is saved or pushed to the Hugging Face model hub).
See also:
save_vocabulary()
- classmethod load_from_dataset(dataset='redial', **kwargs)[source]#
Initialize the tokenizer from a dataset. By default, it will load the entity2id mapping from the dataset.
- Parameters:
dataset – the dataset name
**kwargs – the other arguments for initialization
- Returns:
the initialized tokenizer
- Return type:
RedialGenTokenizer
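Example (a usage sketch; 'redial' is the documented default dataset name):

```python
from recwizard.modules.redial.tokenizer_redial_gen import RedialGenTokenizer

# Build the tokenizer and its entity mapping from the ReDial dataset.
tokenizer = RedialGenTokenizer.load_from_dataset("redial")
```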
- preprocess(text: str) → Tuple[str, int] [source]#
Extract and remove the sender from the text.
- Parameters:
text – an utterance
Returns: text, sender
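Example (a sketch; the signature only guarantees a (str, int) tuple, so the concrete integer encoding of the sender is an assumption here):

```python
# Strip the speaker prefix from a raw utterance.
text, sender = tokenizer.preprocess("User: Hi! Can you recommend a movie?")
# text   -> the utterance without the "User:" prefix
# sender -> an integer flag identifying the speaker (initiator vs. respondent)
```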
- encode_plus(text: str, *args, **kwargs) → BatchEncoding [source]#
This function encodes one dialog consisting of multiple utterances.
- batch_encode_plus(batch_text_or_text_pairs: List[str], *args, **kwargs) → BatchEncoding [source]#
- Parameters:
batch_text_or_text_pairs –
*args –
**kwargs –
Returns: BatchEncoding
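Example (a hedged sketch; the exact dialog string format, i.e. how utterances and speaker prefixes are combined into one string, follows the tokenizer's convention and is an assumption here):

```python
dialogs = [
    "User: Hi! System: Hello, what kind of movies do you like?",
    "User: Something scary, please.",
]
# Each string is treated as one dialog of multiple utterances (format assumed).
batch = tokenizer.batch_encode_plus(dialogs)
```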
- process_entities(text: str) → Tuple[str, List[int], List[str]] [source]#
Process the entities in the text. It extracts the entity ids from the text and removes the entity tags.
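Example (a shape-level sketch based on the documented return type; the entity tag syntax inside the utterance is dataset-specific and not shown):

```python
utterance = "I really liked that movie!"  # entity tags omitted (format is dataset-specific)
clean_text, entity_ids, entity_names = tokenizer.process_entities(utterance)
# clean_text   -> str, the text with entity tags removed
# entity_ids   -> List[int], ids of the mentioned entities
# entity_names -> List[str], names of the mentioned entities
```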
- encodes(encode_funcs: List[Callable], texts: List[str | List[str]], *args, **kwargs) → List[BatchEncoding] [source]#
This function is called to apply encoding functions from different tokenizers. It will be used by both encode_plus and batch_encode_plus.
If you want to call different tokenizers with different arguments, override this method.
- Parameters:
encode_funcs – the encoding functions from self.tokenizers.
texts – the processed text for each encoding function
**kwargs –
- Returns:
a list of BatchEncoding; the length of the list is the same as the number of tokenizers
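A hypothetical override sketch following the documented signature (the per-tokenizer arguments chosen here are assumptions, for illustration only):

```python
from typing import Callable, List

class MyRedialGenTokenizer(RedialGenTokenizer):
    def encodes(self, encode_funcs: List[Callable], texts, *args, **kwargs):
        # Pass extra arguments only to the first underlying tokenizer
        # (e.g. the sentence encoder); a hypothetical choice for illustration.
        results = []
        for i, (func, text) in enumerate(zip(encode_funcs, texts)):
            call_kwargs = {**kwargs, "truncation": True} if i == 0 else dict(kwargs)
            results.append(func(text, *args, **call_kwargs))
        return results
```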
- tokenize(text, **kwargs) → List[str] [source]#
Converts a string into a sequence of tokens, using the tokenizer.
Splits into words for word-based vocabularies or sub-words for sub-word-based vocabularies (BPE/SentencePiece/WordPiece). Takes care of added tokens.
- Parameters:
text (str) – The sequence to be encoded.
**kwargs (additional keyword arguments) – Passed along to the model-specific prepare_for_tokenization preprocessing method.
- Returns:
The list of tokens.
- Return type:
List[str]
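Example (a sketch; the actual tokens depend on the loaded vocabulary):

```python
tokens = tokenizer.tokenize("I loved that movie")
# -> a List[str] of word or sub-word tokens, depending on the underlying vocabulary
```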
- property vocab_size: int#
Size of the base vocabulary (without the added tokens).
- Type:
int
- class recwizard.modules.redial.modeling_redial_gen.RedialGen(config: RedialGenConfig, word_embedding=None, **kwargs)[source]#
Conditioned GRU. The context vector is used as an initial hidden state at each layer of the GRU.
- __init__(config: RedialGenConfig, word_embedding=None, **kwargs)[source]#
- Parameters:
config – config for PreTrainedModel
- forward(dialogue, lengths, recs, context=None, state=None, encode_only=False, **hrnn_input)[source]#
- Parameters:
dialogue – (batch_size, seq_len)
lengths – (batch_size)
recs – (batch_size, n_movies)
context –
state –
encode_only –
**hrnn_input –
Returns:
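A shape-level sketch of the documented tensor arguments (dummy tensors; building the model itself and any additional **hrnn_input tensors are left out here, as they depend on the HRNN configuration):

```python
import torch

# Dummy tensors matching the documented shapes, using the default config sizes.
batch_size, seq_len, n_movies = 2, 12, 6924
dialogue = torch.randint(0, 15005, (batch_size, seq_len))  # token ids, (batch_size, seq_len)
lengths = torch.tensor([12, 8])                            # (batch_size,)
recs = torch.rand(batch_size, n_movies)                    # (batch_size, n_movies)

# Hypothetical call; a real model may also need HRNN inputs via **hrnn_input.
# outputs = model(dialogue, lengths, recs)
```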
- response(**kwargs)#
The main function for the module to generate a response given an input.
Note
Please refer to our tutorial for implementation guidance: Overview
- Parameters:
raw_input (str) – the text input
tokenizer (PreTrainedTokenizer) – the tokenizer used to tokenize the input
return_dict (bool) – if set to True, will return a dict of outputs instead of a single output
**kwargs – the keyword arguments that will be passed to forward()
- Returns:
By default, a single output will be returned. If return_dict is set to True, a dict of outputs will be returned.
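A usage sketch based on the documented parameters (loading the RedialGen model itself is out of scope here; the raw_input string uses the tokenizer's default 'User:' prefix):

```python
from recwizard.modules.redial.tokenizer_redial_gen import RedialGenTokenizer

tokenizer = RedialGenTokenizer.load_from_dataset("redial")
# `model` is assumed to be an initialized RedialGen instance.
output = model.response(
    raw_input="User: Can you recommend a good sci-fi movie?",
    tokenizer=tokenizer,
    return_dict=True,   # return a dict of outputs instead of a single output
)
```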