Recommender Module

class recwizard.modules.redial.configuration_redial_rec.RedialRecConfig(sa_params=None, autorec_params=None, n_movies=6924, **kwargs)
__init__(sa_params=None, autorec_params=None, n_movies=6924, **kwargs)
Parameters:
  • WEIGHT_DIMENSIONS (dict, optional) – The dimensions and dtypes of the module parameters. Used to initialize parameters that are not explicitly specified at module initialization. Defaults to None. See also recwizard.module_utils.BaseModule.prepare_weight().

  • **kwargs – Additional parameters, passed to PretrainedConfig.__init__.

class recwizard.modules.redial.tokenizer_redial_rec.RedialRecTokenizer(id2entity: Dict[int, str] | None = None, sen_encoder='princeton-nlp/unsup-simcse-roberta-base', initiator='User:', respondent='System:', **kwargs)
__init__(id2entity: Dict[int, str] | None = None, sen_encoder='princeton-nlp/unsup-simcse-roberta-base', initiator='User:', respondent='System:', **kwargs)
Parameters:
  • entity2id (Dict[str, int]) – a dict mapping entity name to entity id. If not provided, it will be generated from id2entity.

  • id2entity (Dict[int, str]) – a dict mapping entity id to entity name. If not provided, it will be generated from entity2id.

  • pad_entity_id (int) – the id for padding entity. If not provided, it will be the maximum entity id + 1.

  • tokenizers (List[PreTrainedTokenizerBase]) – a list of tokenizers to be used.

  • **kwargs – other arguments for PreTrainedTokenizer
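The relationship between entity2id, id2entity, and pad_entity_id described above can be sketched in plain Python. This is a minimal illustration, not the library's code: `resolve_entity_maps` is a hypothetical helper, and the movie names are placeholders.

```python
from typing import Dict, Optional, Tuple


def resolve_entity_maps(
    entity2id: Optional[Dict[str, int]] = None,
    id2entity: Optional[Dict[int, str]] = None,
    pad_entity_id: Optional[int] = None,
) -> Tuple[Dict[str, int], Dict[int, str], int]:
    """Hypothetical helper: derive whichever mapping is missing."""
    if entity2id is None and id2entity is None:
        raise ValueError("provide entity2id or id2entity")
    if entity2id is None:
        # Invert id -> name into name -> id.
        entity2id = {name: idx for idx, name in id2entity.items()}
    if id2entity is None:
        # Invert name -> id into id -> name.
        id2entity = {idx: name for name, idx in entity2id.items()}
    if pad_entity_id is None:
        # Documented default: maximum entity id + 1.
        pad_entity_id = max(id2entity, default=-1) + 1
    return entity2id, id2entity, pad_entity_id


entity2id, id2entity, pad_id = resolve_entity_maps(
    id2entity={0: "Titanic", 1: "Inception"}
)
print(entity2id, pad_id)
```

Placing the padding id just past the largest real id keeps it from colliding with any entity.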

get_init_kwargs()

The kwargs for initialization. Override this function to declare the necessary initialization kwargs (they will be saved when the tokenizer is saved or pushed to the Hugging Face model hub).

See also: save_vocabulary()
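The idea behind declaring initialization kwargs can be sketched as a save-and-reload round trip. The class below is a stand-in written for illustration only; `SketchTokenizer`, `to_json`, and `from_json` are hypothetical names and do not reflect how RecWizard serializes tokenizers.

```python
import json


class SketchTokenizer:
    """Stand-in class; not the RecWizard tokenizer."""

    def __init__(self, initiator: str = "User:", respondent: str = "System:"):
        self.initiator = initiator
        self.respondent = respondent

    def get_init_kwargs(self) -> dict:
        # Declare every argument needed to rebuild this object.
        return {"initiator": self.initiator, "respondent": self.respondent}

    def to_json(self) -> str:
        # Hypothetical save step: persist only the declared kwargs.
        return json.dumps(self.get_init_kwargs())

    @classmethod
    def from_json(cls, payload: str) -> "SketchTokenizer":
        # Hypothetical load step: rebuild from the saved kwargs.
        return cls(**json.loads(payload))


original = SketchTokenizer(initiator="Seeker:")
restored = SketchTokenizer.from_json(original.to_json())
print(restored.get_init_kwargs())
```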

classmethod load_from_dataset(dataset='redial', **kwargs)

Initialize the tokenizer from the dataset. By default, it will load the entity2id from the dataset.
Parameters:
  • dataset – the dataset name

  • **kwargs – the other arguments for initialization

Returns:

the initialized tokenizer

Return type:

(BaseTokenizer)

preprocess(text: str) → Tuple[str, int]

Extract and remove the sender from the text.
Parameters:
  • text – an utterance

Returns: text, sender
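A minimal sketch of sender extraction, using the default initiator and respondent prefixes from the class signature. The function name `split_sender` and the integer codes (1, -1, 0) are assumptions made for illustration; the source does not state which integers the tokenizer returns.

```python
from typing import Tuple

INITIATOR = "User:"     # default `initiator` in the class signature
RESPONDENT = "System:"  # default `respondent` in the class signature


def split_sender(text: str) -> Tuple[str, int]:
    """Strip a leading sender prefix and return (text, sender code)."""
    text = text.strip()
    if text.startswith(INITIATOR):
        return text[len(INITIATOR):].strip(), 1    # assumed code for the initiator
    if text.startswith(RESPONDENT):
        return text[len(RESPONDENT):].strip(), -1  # assumed code for the respondent
    return text, 0                                 # assumed code when no prefix is found


print(split_sender("User: I loved Titanic."))
```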

encode_plus(text: str, *args, **kwargs) → BatchEncoding

Overrides the encode_plus function from PreTrainedTokenizer to support entity processing.

batch_encode_plus(batch_text_or_text_pairs: List[str], *args, **kwargs) → BatchEncoding
Parameters:
  • batch_text_or_text_pairs

  • *args

  • **kwargs

Returns: BatchEncoding

process_entities(text: str) → Tuple[str, List[int], List[str]]

Process the entities in the text. It extracts the entity ids from the text and removes the entity tags.
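The extraction step can be sketched with a regular expression. This sketch assumes entities are marked as `<entity>Name</entity>` and that the third returned element holds the matched entity names; neither detail is stated in this section, and `strip_entity_tags` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Assumed tag format; the section above does not specify it.
ENTITY_TAG = re.compile(r"<entity>(.*?)</entity>")


def strip_entity_tags(
    text: str, entity2id: Dict[str, int]
) -> Tuple[str, List[int], List[str]]:
    """Return (text without tags, ids of known entities, their names)."""
    names = ENTITY_TAG.findall(text)
    # Keep only entities present in the vocabulary.
    known = [name for name in names if name in entity2id]
    ids = [entity2id[name] for name in known]
    # Drop the tags but keep the entity surface form in the text.
    clean = ENTITY_TAG.sub(r"\1", text)
    return clean, ids, known


clean, ids, names = strip_entity_tags(
    "I liked <entity>Titanic</entity> and <entity>Up</entity>.",
    {"Titanic": 0},
)
print(clean, ids, names)
```

Unknown entities are skipped here rather than raising an error; that choice is part of the sketch, not documented behavior.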

encodes(encode_funcs: List[Callable], texts: List[str | List[str]], *args, **kwargs) → List[BatchEncoding]

This function is called to apply encoding functions from different tokenizers. It will be used by both encode_plus and batch_encode_plus.

If you want to call different tokenizers with different arguments, override this method.

Parameters:
  • encode_funcs – the encoding functions from self.tokenizers.

  • texts – the processed text for each encoding function

  • **kwargs

Returns:

A list of BatchEncoding objects, one per tokenizer.
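The dispatch described above, one encoding function per processed text, can be sketched as follows. The two encoder functions are toy stand-ins for tokenizer methods, and `apply_encoders` is a hypothetical name; plain dicts take the place of BatchEncoding.

```python
from typing import Any, Callable, List


def apply_encoders(
    encode_funcs: List[Callable[..., Any]], texts: List[Any], *args, **kwargs
) -> List[Any]:
    """Call the i-th encoding function on the i-th processed text."""
    if len(encode_funcs) != len(texts):
        raise ValueError("need exactly one text per encoding function")
    return [func(text, *args, **kwargs) for func, text in zip(encode_funcs, texts)]


# Toy stand-ins for two tokenizers' encode functions.
def word_encoder(text: str, **kwargs) -> dict:
    return {"input_ids": text.split()}


def char_encoder(text: str, **kwargs) -> dict:
    return {"input_ids": list(text)}


outputs = apply_encoders([word_encoder, char_encoder], ["hi there", "hi"])
print(len(outputs))
```

An override that needs different arguments per tokenizer would replace the single list comprehension with per-function calls.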

decode(raw_input, *args, **kwargs)

Overrides the decode function from PreTrainedTokenizer. By default, calls the decode function of the first tokenizer.

class recwizard.modules.redial.modeling_redial_rec.RedialRec(config: RedialRecConfig, recommend_new_movies=True, **kwargs)
__init__(config: RedialRecConfig, recommend_new_movies=True, **kwargs)
Parameters:

config – config for PreTrainedModel

forward(input_ids, attention_mask, senders, movieIds, conversation_lengths, movie_occurrences, **kwargs)
Parameters:
  • input_ids – (batch, max_conv_length, max_utt_length)

  • attention_mask – (batch, max_conv_length, max_utt_length)

  • senders – (batch, max_conv_length)

  • movieIds – (batch, max_conv_length, max_n_movies)

  • conversation_lengths – (batch)

  • movie_occurrences – (batch, max_conv_length, max_utterance_length)

  • **kwargs

Returns:

(batch_size, max_conv_length, n_movies_total) movie preferences
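The documented shapes can be collected in one place to check a batch before calling forward(). This is a bookkeeping sketch only: `expected_shapes` is a hypothetical helper, it assumes max_utterance_length and max_utt_length name the same dimension, and using 6924 (the config's n_movies default) for n_movies_total is likewise an assumption.

```python
from typing import Dict, Tuple


def expected_shapes(
    batch: int,
    max_conv_length: int,
    max_utt_length: int,
    max_n_movies: int,
    n_movies_total: int,
) -> Dict[str, Tuple[int, ...]]:
    """Map each forward() argument (and the output) to its documented shape."""
    return {
        "input_ids": (batch, max_conv_length, max_utt_length),
        "attention_mask": (batch, max_conv_length, max_utt_length),
        "senders": (batch, max_conv_length),
        "movieIds": (batch, max_conv_length, max_n_movies),
        "conversation_lengths": (batch,),
        # Assumes max_utterance_length == max_utt_length.
        "movie_occurrences": (batch, max_conv_length, max_utt_length),
        "output": (batch, max_conv_length, n_movies_total),
    }


shapes = expected_shapes(
    batch=2, max_conv_length=5, max_utt_length=20, max_n_movies=3, n_movies_total=6924
)
print(shapes["output"])
```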

response(raw_input, *args, **kwargs)

The main function for the module to generate a response given an input.

Note

Please refer to our tutorial for implementation guidance: Overview

Parameters:
  • raw_input (str) – the text input

  • tokenizer (PreTrainedTokenizer) – the tokenizer used to tokenize the input

  • return_dict (bool) – if set to True, will return a dict of outputs instead of a single output

  • **kwargs – the keyword arguments that will be passed to forward()

Returns:

By default, a single output will be returned. If return_dict is set to True, a dict of outputs will be returned.
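The return_dict switch can be illustrated with a stub. Nothing here calls the real module: `sketch_response` is hypothetical, the scores are hard-coded placeholders standing in for tokenization plus forward(), and the dict keys are assumptions.

```python
from typing import Any, Dict, List, Union


def sketch_response(
    raw_input: str, return_dict: bool = False
) -> Union[int, Dict[str, Any]]:
    """Stub showing a single output versus a dict of outputs."""
    # Placeholder scores; a real module would compute these from raw_input.
    scores: List[float] = [0.1, 0.7, 0.2]
    best = max(range(len(scores)), key=scores.__getitem__)
    if return_dict:
        # Assumed keys, chosen for illustration.
        return {"output": best, "scores": scores}
    return best


print(sketch_response("User: Any good thrillers?"))
print(sketch_response("User: Any good thrillers?", return_dict=True))
```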