Recommender Module#

class recwizard.modules.kgsf.configuration_kgsf_rec.KGSFRecConfig(dictionary=None, subkg=None, edge_set=None, embedding_data=None, batch_size: int = 32, max_r_length: int = 30, embedding_size: int = 300, n_concept: int = 29308, dim: int = 128, n_entity: int = 64368, num_bases: int = 8, n_positions: int | None = None, truncate: int = 0, text_truncate: int = 0, label_truncate: int = 0, padding_idx: int = 0, start_idx: int = 1, end_idx: int = 2, longest_label: int = 1, pretrain: bool = False, **kwargs)[source]#
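
A minimal sketch of constructing the config directly; in practice the config is usually restored together with a pretrained checkpoint, and the values below simply override a few of the defaults shown in the signature.

    from recwizard.modules.kgsf.configuration_kgsf_rec import KGSFRecConfig

    # Fields that are not passed explicitly keep the defaults from the signature above.
    config = KGSFRecConfig(
        batch_size=32,
        embedding_size=300,
        n_entity=64368,
        n_concept=29308,
    )
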
class recwizard.modules.kgsf.tokenizer_kgsf_rec.KGSFRecTokenizer(max_count: int = 5, max_c_length: int = 256, max_r_length: int = 30, n_entity: int = 64368, batch_size: int = 1, padding_idx: int = 0, entity2entityId: Dict[str, int] | None = None, word2index: Dict[str, int] | None = None, key2index: Dict[str, int] | None = None, entity_ids: List | None = None, id2name: Dict[str, str] | None = None, id2entity: Dict[int, str] | None = None, entity2id: Dict[str, int] | None = None, **kwargs)[source]#
__init__(max_count: int = 5, max_c_length: int = 256, max_r_length: int = 30, n_entity: int = 64368, batch_size: int = 1, padding_idx: int = 0, entity2entityId: Dict[str, int] | None = None, word2index: Dict[str, int] | None = None, key2index: Dict[str, int] | None = None, entity_ids: List | None = None, id2name: Dict[str, str] | None = None, id2entity: Dict[int, str] | None = None, entity2id: Dict[str, int] | None = None, **kwargs)[source]#
Parameters:
  • entity2id (Dict[str, int]) – a dict mapping entity name to entity id. If not provided, it will be generated from id2entity.

  • id2entity (Dict[int, str]) – a dict mapping entity id to entity name. If not provided, it will be generated from entity2id.

  • pad_entity_id (int) – the id for padding entity. If not provided, it will be the maximum entity id + 1.

  • tokenizers (List[PreTrainedTokenizerBase]) – a list of tokenizers to be used.

  • **kwargs – other arguments for PreTrainedTokenizer

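A hedged loading sketch: since the tokenizer relies on the entity and word vocabularies listed above, it is normally restored from a saved checkpoint rather than built by hand. The repository id below is a placeholder, not a confirmed hub name.

    from recwizard.modules.kgsf.tokenizer_kgsf_rec import KGSFRecTokenizer

    # Placeholder checkpoint id; substitute the published KGSF recommender tokenizer.
    tokenizer = KGSFRecTokenizer.from_pretrained("recwizard/tokenizer-kgsf-rec")
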
get_init_kwargs()[source]#

The kwargs used for initialization. They will be saved when you save the tokenizer or push it to the Hugging Face model hub.
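
Because these kwargs are persisted with the tokenizer, the usual Hugging Face save/load round trip applies; a brief sketch, continuing from the loading example above:

    # save_pretrained() stores the init kwargs together with the vocabulary files,
    # so the tokenizer can be reloaded (or pushed to the hub) later.
    tokenizer.save_pretrained("./kgsf-rec-tokenizer")
    reloaded = KGSFRecTokenizer.from_pretrained("./kgsf-rec-tokenizer")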

padding_w2v(sentence, max_length, pad=0, end=2, unk=3)[source]#

sentence: a tokenized utterance, e.g. ['Okay', ',', 'have', 'you', 'seen', '@136983', '?']; max_length: the length to pad or truncate to, e.g. 30 (max_r_length) or 256 (max_c_length)
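
For illustration, a sketch of padding a single tokenized utterance; the exact structure of the return value (the padded id sequence plus any auxiliary masks) is left as an assumption, so only the call itself is shown.

    # Pad or truncate a tokenized utterance to max_r_length (30); pad=0, end=2 and unk=3
    # follow the defaults in the signature.
    tokens = ["Okay", ",", "have", "you", "seen", "@136983", "?"]
    padded = tokenizer.padding_w2v(tokens, max_length=30)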

padding_context(contexts, pad=0)[source]#

contexts: the tokenized dialogue history, e.g. [['Hello'], ['hi', 'how', 'are', 'u'], ['Great', '.', 'How', 'are', 'you', 'this', 'morning', '?'], ['would', 'u', 'have', 'any', 'recommendations', 'for', 'me', 'im', 'good', 'thanks', 'fo', 'asking'], ['What', 'type', 'of', 'movie', 'are', 'you', 'looking', 'for', '?'], ['comedies', 'i', 'like', 'kristin', 'wigg'], ['Okay', ',', 'have', 'you', 'seen', '@136983', '?'], ['something', 'like', 'yes', 'have', 'watched', '@140066', '?']]
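
A matching sketch for a multi-turn history; padding_context pads the whole dialogue context (to max_c_length, 256 by default) rather than a single utterance.

    contexts = [
        ["Hello"],
        ["hi", "how", "are", "u"],
        ["What", "type", "of", "movie", "are", "you", "looking", "for", "?"],
    ]
    padded_context = tokenizer.padding_context(contexts)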

encode(user_input, user_context=None, entity=None, system_response=None, movie=0)[source]#

user_input: the raw input text, e.g. "Hi, can you recommend a movie for me?"
user_context: the tokenized dialogue history, e.g. [['Hello'], ['hi', 'how', 'are', 'u']]. TODO: should a separator token (_split_) be inserted between turns?
entity: movie entities mentioned in user_context; default [].
system_response: the tokenized system response, e.g. ['Great', '.', 'How', 'are', 'you', 'this', 'morning', '?']
movie: the movie mentioned in system_response, given as an ID; default None. TODO: if the response mentions several movies, the same case is duplicated; how should the tokenizer handle that?
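
A usage sketch for encode; the context and entity list are toy values, and the structure of the returned batch is left to the implementation.

    inputs = tokenizer.encode(
        "Hi, can you recommend a movie for me?",
        user_context=[["Hello"], ["hi", "how", "are", "u"]],
        entity=[],  # movie entities already mentioned in the context
    )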

decode(outputs, top_k=3, labels=None)[source]#

Overrides the decode function from PreTrainedTokenizer. By default, calls the decode function of the first tokenizer.
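
A decoding sketch; here the scores tensor merely stands in for the recommender's output distribution over entities, and its shape and type are assumptions made for illustration.

    import torch

    # Dummy scores over the entity vocabulary (n_entity = 64368); shape is an assumption.
    scores = torch.rand(1, 64368)
    # Map the recommender's outputs to the top-3 recommended movies.
    top_movies = tokenizer.decode(scores, top_k=3)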

class recwizard.modules.kgsf.modeling_kgsf_rec.KGSFRec(config, **kwargs)[source]#
__init__(config, **kwargs)[source]#
Parameters:
  • config – the config for the PreTrainedModel

forward(response, concept_mask, seed_sets, labels, db_vec, rec, test=True, cand_params=None, prev_enc=None, maxlen=None, bsz=None)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
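
As the note above says, call the module instance rather than forward() so that registered hooks run; a schematic sketch, under the assumption that the batch produced by KGSFRecTokenizer.encode matches forward()'s keyword arguments:

    # 'model' and 'tokenizer' as loaded in the surrounding sketches.
    batch = tokenizer.encode("Hi, can you recommend a movie for me?")
    outputs = model(**batch)            # preferred: runs registered hooks
    # outputs = model.forward(**batch)  # bypasses hooks; avoid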

response(raw_input, *args, **kwargs)#

The main function for the module to generate a response given an input.

Note

Please refer to our tutorial for implementation guidance: Overview

Parameters:
  • raw_input (str) – the text input

  • tokenizer (PreTrainedTokenizer) – the tokenizer used to tokenize the input

  • return_dict (bool) – if set to True, will return a dict of outputs instead of a single output

  • **kwargs – the keyword arguments that will be passed to forward()

Returns:

By default, a single output will be returned. If return_dict is set to True, a dict of outputs will be returned.
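
Putting the pieces together, a hedged end-to-end sketch; the checkpoint ids are placeholders, and the exact content of the returned dict depends on the implementation.

    from recwizard.modules.kgsf.modeling_kgsf_rec import KGSFRec
    from recwizard.modules.kgsf.tokenizer_kgsf_rec import KGSFRecTokenizer

    # Placeholder repository ids; substitute the published KGSF recommender checkpoints.
    tokenizer = KGSFRecTokenizer.from_pretrained("recwizard/tokenizer-kgsf-rec")
    model = KGSFRec.from_pretrained("recwizard/kgsf-rec")

    result = model.response(
        "Hi, can you recommend a movie for me?",
        tokenizer=tokenizer,
        return_dict=True,  # return a dict of outputs instead of a single output
    )
    print(result)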