dlk.utils package

Submodules

dlk.utils.config module

Provides BaseConfig, which supplies the basic methods for configs, and ConfigTool, a general-purpose tool for processing config dicts.

class dlk.utils.config.BaseConfig(config: Dict)[source]

Bases: object

BaseConfig provides the basic functionality for all configs.

post_check(config, used=None)[source]

check that all the parameters in config are used

Parameters
  • config – the parameters

  • used – the parameters that were used

Returns

None

Raises

logger.warning("Unused")
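A minimal sketch of the kind of check described above, assuming `used` is a collection of key names that were consumed. The helper name `warn_unused` is hypothetical, only top-level keys are inspected, and the standard `logging` module stands in for dlk's own logger.

```python
import logging
from typing import Dict, Iterable, List, Optional

logger = logging.getLogger("post_check_sketch")


def warn_unused(config: Dict, used: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical helper: warn about top-level keys of `config` not in `used`."""
    used_keys = set(used or [])
    unused = [key for key in config if key not in used_keys]
    for key in unused:
        # Only a warning is logged; nothing is raised.
        logger.warning("Unused parameter: %s", key)
    return unused
```

Returning the unused keys is a choice made for this sketch so the result can be inspected; the documented method returns None.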

class dlk.utils.config.ConfigTool[source]

Bases: object

This class is not used as much as originally designed.

static do_update_config(config: dict, update_config: Optional[dict] = None) Dict[source]

recursively update the config dict with the update_config dict

see ConfigTool._inplace_update_dict

Parameters
  • config – the dict to be updated

  • update_config – the dict whose values override those in config (the _new config updates the _base config)

Returns

updated_config
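The recursive update can be pictured with the sketch below. It is an illustration only: `recursive_update` is a hypothetical name, and it returns a merged copy, whereas the reference to `_inplace_update_dict` suggests the real helper may modify its argument in place.

```python
import copy
from typing import Dict, Optional


def recursive_update(base: Dict, update: Optional[Dict] = None) -> Dict:
    """Hypothetical helper: return a copy of `base` with `update` merged in."""
    result = copy.deepcopy(base)
    for key, value in (update or {}).items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dicts: merge instead of replacing wholesale.
            result[key] = recursive_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

With this sketch, updating `{"a": {"x": 1, "y": 2}}` with `{"a": {"y": 20}}` keeps `x` and only overrides `y`.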

static get_config_by_stage(stage: str, config: Dict) Dict[source]

get the stage_config for the given stage from the provided config

If config[stage] is a string, the config of this stage is the same as that of the stage it names, so config[config[stage]] is returned.

Config Example:
>>> config = {
>>>     "train": {  # train, predict, online stage config; use '&' to separate multiple stages
>>>         "data_pair": {
>>>             "label": "label_id"
>>>         },
>>>         "data_set": {  # for different stages, this processor will process different parts of the data
>>>             "train": ['train', 'dev'],
>>>             "predict": ['predict'],
>>>             "online": ['online']
>>>         },
>>>         "vocab": "label_vocab",  # usually provided by the "token_gather" module
>>>     },
>>>     "predict": "train",
>>>     "online": ["train",
>>>     {"vocab": "new_label_vocab"}
>>>     ]
>>> }
>>> ConfigTool.get_config_by_stage("predict", config) == config["train"]

Parameters
  • stage – the stage, like ‘train’, ‘predict’, etc.

  • config – the base config which has different stage config

Returns

stage_config
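The lookup rule can be sketched as follows. This is a sketch under assumptions read off the example config: a string value is treated as an alias for another stage, and a list is treated as `[base_stage, overrides]`. The name `resolve_stage` is hypothetical, the override is shallow, and the '&' key form is not modelled.

```python
from typing import Dict


def resolve_stage(stage: str, config: Dict) -> Dict:
    """Hypothetical helper: resolve the config for `stage`."""
    stage_config = config[stage]
    if isinstance(stage_config, str):
        # Alias: this stage reuses another stage's config.
        return resolve_stage(stage_config, config)
    if isinstance(stage_config, list):
        # Assumed form: [base_stage_name, overrides_dict].
        base_name, overrides = stage_config
        merged = dict(resolve_stage(base_name, config))
        merged.update(overrides)  # shallow override, for illustration only
        return merged
    return stage_config


example_config = {
    "train": {"data_pair": {"label": "label_id"}, "vocab": "label_vocab"},
    "predict": "train",
    "online": ["train", {"vocab": "new_label_vocab"}],
}
```

Here `resolve_stage("predict", example_config)` gives the "train" config, and the "online" stage gets the same config with `vocab` replaced.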

static get_leaf_module(module_register: dlk.utils.register.Register, module_config_register: dlk.utils.register.Register, module_name: str, config: Dict) Tuple[Any, object][source]

get the module named module_name from module_register and its config from module_config_register

Parameters
  • module_register – register for module which has ‘module_name’

  • module_config_register – config register for config which has ‘module_name’

  • module_name – the module name which we want to get from register

Returns

the module and the module config, both registered under module_name

dlk.utils.get_root module

Get the dlk package root path

dlk.utils.get_root.get_root()[source]

get the dlk root

Returns

abspath of this package

dlk.utils.logger module

class dlk.utils.logger.Logger(log_file: str = '', base_dir: str = 'logs', log_level: str = 'debug', log_name='dlk')[source]

Bases: object

docstring for logger

static get_logger() loguru._logger.Logger[source]

return the ‘dlk’ logger if it is initialized; otherwise initialize it and return it

Returns

Logger.global_logger

global_log_file: set[str] = {}
global_logger: loguru._logger.Logger = <loguru.logger handlers=[(id=1, level=10, sink=<stdout>)]>
static init_file_logger(log_file, base_dir='logs', log_level: str = 'debug')[source]

initialize the log file if there is none, or change it if one already exists

Parameters
  • log_file – log file path

  • base_dir – real log path is ‘$base_dir/$log_file’

  • log_level – ‘debug’, ‘info’, etc.

Returns

None

static init_global_logger(log_level: str = 'debug', log_name: Optional[str] = None, reinit: bool = False)[source]

initialize the global_logger

Parameters
  • log_level – change this to log at a different level

  • log_name – changing this is not recommended

  • reinit – if true, force reinitialization

Returns

None

level_map = {'debug': 'DEBUG', 'error': 'ERROR', 'info': 'INFO', 'warning': 'WARNING'}
log_name: str = 'dlk'
warning_file = False

dlk.utils.parser module

class dlk.utils.parser.BaseConfigParser(config_file: Union[str, Dict, List], config_base_dir: str = '', register: Optional[dlk.utils.register.Register] = None)[source]

Bases: object

The config parsing order is: inherit -> search -> link

If a config value is marked “@”, the parameter has no default value and you must override it (e.g. ‘label_nums’).

static check_config(configs: Union[Dict, List[Dict]]) None[source]

check that all configs are valid

Check that every “@” has been replaced with a correct value.

Parameters

configs – TODO

Returns

None

Raises

ValueError
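A minimal sketch of such a check, assuming an unfilled parameter is stored as the literal string "@". The helper names `find_unfilled` and `check_filled` are hypothetical and not part of dlk.

```python
from typing import Any, List


def find_unfilled(config: Any, path: str = "") -> List[str]:
    """Hypothetical helper: dotted paths of values still equal to "@"."""
    missing: List[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            child_path = f"{path}.{key}" if path else str(key)
            missing += find_unfilled(value, child_path)
    elif isinstance(config, list):
        for index, value in enumerate(config):
            missing += find_unfilled(value, f"{path}[{index}]")
    elif config == "@":
        missing.append(path)
    return missing


def check_filled(config: Any) -> None:
    """Raise ValueError if any "@" placeholder remains (assumed behaviour)."""
    missing = find_unfilled(config)
    if missing:
        raise ValueError(f"Unfilled parameters: {missing}")
```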

collect all links in config and move them to the top level

This is only done at the top level of config: the links from all levels are collected and returned together with their level.

Parameters
  • config

    >>> {
    >>>     "arg1": {
    >>>         "arg11": 2,
    >>>         "arg12": 3,
    >>>         "_link": {"arg11": "arg12"}
    >>>     }
    >>> }

  • all_level_links – TODO

  • level – TODO

Returns

>>> {
>>>     "arg1": {
>>>         "arg11": 2,
>>>         "arg12": 3
>>>     },
>>>     "_link": {"arg1.arg11": "arg1.arg12"}
>>> }

link the config in place: config[to] = config[source]

Parameters
  • link – {link-from:link-to-1, link-from:[link-to-2, link-to-3]}

  • config – the base config to be linked

Returns

None
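The two steps described above (hoisting nested "_link" entries to the top with dotted paths, then copying values from link-from to link-to) can be sketched like this. All function names are hypothetical, and the sketch leaves out the per-level bookkeeping that the documentation mentions.

```python
from typing import Any, Dict, List, Tuple, Union


def hoist_links(config: Dict, prefix: str = "") -> Tuple[Dict, Dict]:
    """Hypothetical: return (config without "_link" keys, dotted-path links)."""
    cleaned: Dict = {}
    links: Dict = {}
    for key, value in config.items():
        if key == "_link":
            for source, target in value.items():
                if isinstance(target, str):
                    links[prefix + source] = prefix + target
                else:
                    links[prefix + source] = [prefix + item for item in target]
        elif isinstance(value, dict):
            child, child_links = hoist_links(value, f"{prefix}{key}.")
            cleaned[key] = child
            links.update(child_links)
        else:
            cleaned[key] = value
    return cleaned, links


def get_path(config: Dict, path: str) -> Any:
    """Read the value stored at a dotted path."""
    for part in path.split("."):
        config = config[part]
    return config


def set_path(config: Dict, path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate dicts."""
    *parents, last = path.split(".")
    for part in parents:
        config = config.setdefault(part, {})
    config[last] = value


def apply_links(links: Dict[str, Union[str, List[str]]], config: Dict) -> None:
    """In place: config[link-to] = config[link-from] for every link."""
    for source, targets in links.items():
        value = get_path(config, source)
        for target in [targets] if isinstance(targets, str) else targets:
            set_path(config, target, value)
```

Applied to the example config above, hoisting yields `{"arg1.arg11": "arg1.arg12"}`, and applying that link sets `arg12` to the value of `arg11`.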

flatten all the _search parameters into a list

Recursive parsing of _search is supported: you can add _search/_link/_base parameters inside _search parameters, but you should only search parameters at the current level.

Parameters
  • search – search parameters, e.g. {“para1”: [1,2,3], ‘para2’: ‘list(range(10))’}

  • config – base config

Returns

list of possible configs
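As a rough illustration of expanding a search space into concrete configs, the sketch below builds one config per combination of search values. `expand_search` is a hypothetical name; the sketch assumes every search value is already a list and ignores nested _search/_link/_base handling.

```python
import copy
import itertools
from typing import Dict, List


def expand_search(search: Dict[str, List], config: Dict) -> List[Dict]:
    """Hypothetical helper: one copy of `config` per combination in `search`."""
    names = list(search)
    results = []
    for values in itertools.product(*(search[name] for name in names)):
        candidate = copy.deepcopy(config)
        candidate.update(dict(zip(names, values)))  # override with this combination
        results.append(candidate)
    return results
```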

classmethod get_base_config(config_name: str) Dict[source]

get the base config by config_name

Parameters

config_name – the config name

Returns

the config named config_name

get_cartesian_prod(list_of_list_of_dict: List[List[Dict]]) List[List[Dict]][source]

get the Cartesian product of the given lists

Parameters

list_of_list_of_dict – [[config_a1, config_a2], [config_b1, config_b2]]

Returns

[[config_a1, config_b1], [config_a1, config_b2], [config_a2, config_b1], [config_a2, config_b2]]
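The input/output shape shown above matches what `itertools.product` produces. A minimal sketch, with `cartesian_prod` as a hypothetical stand-in for the documented method:

```python
import itertools
from typing import Dict, List


def cartesian_prod(list_of_lists: List[List[Dict]]) -> List[List[Dict]]:
    """Hypothetical helper: every way of picking one dict from each inner list."""
    return [list(combo) for combo in itertools.product(*list_of_lists)]
```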

get_kind_module_base_config(abstract_config: Union[dict, str], kind_module: str = '') List[dict][source]

get the whole config of ‘kind_module’ from the given abstract_config

Parameters
  • abstract_config – the config to be expanded

  • kind_module – the module kind, like ‘embedding’ or ‘subprocessor’, which is registered in config_parser_register

Returns

the parsed (whole) config of abstract_config

static get_named_list_cartesian_prod(dict_of_list: Optional[Dict[str, List]] = None) List[Dict][source]

get the Cartesian product of named lists

Parameters

dict_of_list – {‘name1’: [1,2,3], ‘name2’: “list(range(1, 4))”}

Returns

[{‘name1’: 1, ‘name2’: 1}, {‘name1’: 1, ‘name2’: 2}, {‘name1’: 1, ‘name2’: 3}, …]
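A sketch of the named variant, assuming that a string value such as "list(range(1, 4))" is a Python expression that evaluates to a list. The function name, the restricted `eval` namespace, and the empty-input result are all choices made for this illustration, not dlk behaviour.

```python
import itertools
from typing import Dict, List, Optional

# Only these names are visible to evaluated expressions (an assumption of this sketch).
_ALLOWED_NAMES = {"__builtins__": {}, "list": list, "range": range}


def named_cartesian_prod(dict_of_list: Optional[Dict[str, object]] = None) -> List[Dict]:
    """Hypothetical helper: one dict per combination of the named values."""
    if not dict_of_list:
        return []  # choice made for this sketch
    names = list(dict_of_list)
    columns = []
    for name in names:
        values = dict_of_list[name]
        if isinstance(values, str):
            # Assumption: string values are expressions yielding a list.
            values = eval(values, _ALLOWED_NAMES)
        columns.append(list(values))
    return [dict(zip(names, combo)) for combo in itertools.product(*columns)]
```

Evaluating strings with `eval` is only reasonable for trusted config files, which is why the sketch limits the visible names.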

is_rep_config(list_of_dict: List[dict]) bool[source]

check whether there is a repeated config in the list

Parameters

list_of_dict – a list of dict

Returns

whether the list contains a repeat
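One way to detect a repeated config is to compare canonical serialisations. The sketch below assumes the configs are JSON-serialisable; `has_repeat` is a hypothetical name and this is not necessarily how dlk compares configs.

```python
import json
from typing import Dict, List


def has_repeat(list_of_dict: List[Dict]) -> bool:
    """Hypothetical helper: True if two dicts in the list are equal."""
    seen = set()
    for item in list_of_dict:
        # sort_keys makes the fingerprint independent of key order.
        fingerprint = json.dumps(item, sort_keys=True)
        if fingerprint in seen:
            return True
        seen.add(fingerprint)
    return False
```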

load_hjson_file(file_path: str) Dict[source]

load the hjson file at file_path and return a Dict

Parameters

file_path – the file path

Returns

the loaded dict

map_to_submodule(config: dict, map_fun: Callable) Dict[source]

map the map_fun to all submodules in config

use the map_fun to process all the modules

Parameters
  • config – a dict of submodules; the key is the module kind, which is registered in config_parser_register

  • map_fun – the function used to process each submodule

Returns

TODO

parser(parser_link=True) List[source]

parse the config

Parameters

parser_link – whether to parse the links

Returns

all valid configs

parser_with_check(parser_link=True) List[Dict][source]

parse the config and check that the config is valid

Parameters

parser_link – whether to parse the links

Returns

all valid configs

class dlk.utils.parser.CallbackConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for CallbackConfigParser

class dlk.utils.parser.ConfigConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

parser(parser_link=True)[source]

parse the config

the config supports _search and _link

Parameters

parser_link – whether to parse the links

Returns

all valid configs

class dlk.utils.parser.DatamoduleConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for DatamoduleConfigParser

class dlk.utils.parser.DecoderConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for DecoderConfigParser

class dlk.utils.parser.EmbeddingConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for EmbeddingConfigParser

class dlk.utils.parser.EncoderConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for EncoderConfigParser

class dlk.utils.parser.IModelConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for IModelConfigParser

class dlk.utils.parser.InitMethodConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for InitMethodConfigParser

class dlk.utils.parser.LinkConfigParser(config_file)[source]

Bases: object

parser(parser_link=False)[source]

parse the config

the config supports _search and _link

Parameters

parser_link – must be false

Returns

all valid configs

class dlk.utils.parser.LinkUnionTool[source]

Bases: object

Helper tool for parsing the “_link” entries of a config. In all functions, the top level has higher priority than the low level.

This class mostly serves to resolve conflicts between links registered at the low and high levels.

find(key: str)[source]

find the root of the key

Parameters

key – a token

Returns

the root of the key
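"Finding the root of a key" is the lookup operation of a union-find (disjoint-set) structure. Whether LinkUnionTool is implemented this way is an assumption; the sketch below only illustrates the idea, with a hypothetical `find_root` function operating on a plain parent map.

```python
from typing import Dict


def find_root(parent: Dict[str, str], key: str) -> str:
    """Hypothetical helper: follow parent pointers from `key` to its root."""
    parent.setdefault(key, key)  # an unseen key is its own root
    root = key
    while parent[root] != root:
        root = parent[root]
    while parent[key] != root:
        # Path compression: point each visited key directly at the root.
        parent[key], key = root, parent[key]
    return root
```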

get the registered links

Returns

all registered and valid links

low_level_union(link_from: str, link_to: str)[source]

union the low level link_from->link_to pair

This function registers a low-level link on top of the already registered high-level links:

  • If neither link-from nor link-to has appeared before, the pair is registered directly.

  • If only one of link-from and link-to has appeared before, the values of link-from and link-to are overwritten by the corresponding value from the upper level.

  • If both link-from and link-to have appeared before and they link to the same value, nothing is done; otherwise an error is raised.

Parameters
  • link_from – the link-from key

  • link_to – the link-to key

Returns

None

register the low level links; low level means the base (parent) level config

Parameters

links – {“link-from”: [“list of link-to”], “link-from2”: “link-to2”}

Returns

self

register the top level links, top level means the link_to level config

Parameters

links – {“from”: [“tolist”], “from2”: “to2”}

Returns

self

top_level_union(link_from: str, link_to: str)[source]

union the top level link_from->link_to pair

Links (link-from -> link-to) registered in the same (top) level config should be merged using top_level_union. Parameters may not be assigned repeatedly: the same parameter cannot appear more than once in the link-to position, otherwise the link would be ambiguous.

Parameters
  • link_from – the link-from key

  • link_to – the link-to key

Returns

None
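The "no repeated assignment" rule can be sketched as below. `TopLevelLinks` is a hypothetical class, not dlk's LinkUnionTool, and it assumes a strict reading of the rule in which any second appearance of a key in the link-to position is rejected with a ValueError.

```python
from typing import Dict


class TopLevelLinks:
    """Hypothetical holder for same-level links (illustration only)."""

    def __init__(self) -> None:
        self.source_of: Dict[str, str] = {}  # link-to -> link-from

    def union(self, link_from: str, link_to: str) -> None:
        if link_to in self.source_of:
            # A link-to key may be assigned only once, otherwise it is ambiguous.
            raise ValueError(
                f"{link_to!r} is already linked from {self.source_of[link_to]!r}"
            )
        self.source_of[link_to] = link_from
```

One link-from key may still feed several link-to keys; only the link-to side is restricted.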

class dlk.utils.parser.LossConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for LossConfigParser

class dlk.utils.parser.ManagerConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for ManagerConfigParser

class dlk.utils.parser.ModelConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for ModelConfigParser

class dlk.utils.parser.ModuleConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for ModuleConfigParser

class dlk.utils.parser.OptimizerConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for OptimizerConfigParser

class dlk.utils.parser.PostProcessorConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for PostProcessorConfigParser

class dlk.utils.parser.ProcessorConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for ProcessorConfigParser

class dlk.utils.parser.RootConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for RootConfigParser

class dlk.utils.parser.ScheduleConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for ScheduleConfigParser

class dlk.utils.parser.SubProcessorConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for SubProcessorConfigParser

class dlk.utils.parser.TaskConfigParser(config_file)[source]

Bases: dlk.utils.parser.BaseConfigParser

docstring for TaskConfigParser

dlk.utils.register module

class dlk.utils.register.Register(register_name: str)[source]

Bases: object

get(name: str = '') Any[source]

get the module by name

Parameters

name – the name should be the real name or name+@+sub_name, and the

Returns

the registered module

register(name: str = '') Callable[source]

register the name: module pair in self.registry

Parameters

name – the name under which the module is registered

Returns

the module
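Since `register` returns a Callable, it is presumably used as a decorator. The sketch below shows that general pattern with a hypothetical `MiniRegister` class; it does not model the name+@+sub_name lookup mentioned for `get`, and the example names are invented.

```python
from typing import Any, Callable, Dict


class MiniRegister:
    """Hypothetical registry illustrating the decorator pattern."""

    def __init__(self, register_name: str) -> None:
        self.register_name = register_name
        self.registry: Dict[str, Any] = {}

    def register(self, name: str = "") -> Callable:
        def decorator(module: Any) -> Any:
            self.registry[name] = module  # store the name: module pair
            return module  # hand the module back unchanged

        return decorator

    def get(self, name: str = "") -> Any:
        return self.registry[name]


encoder_register = MiniRegister("encoder")  # hypothetical register


@encoder_register.register("lstm")
class LstmEncoder:  # hypothetical module
    pass
```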

dlk.utils.tokenizer_util module

class dlk.utils.tokenizer_util.PreTokenizerFactory(tokenizer: tokenizers.Tokenizer)[source]

Bases: object

property bert

bert pre_tokenizer

Returns

BertPreTokenizer

property bytelevel

byte level pre_tokenizer

Returns

ByteLevel

get(name)[source]

get pretokenizer by name

Returns

pre_tokenizer

property whitespace

whitespace pre_tokenizer

Returns

Whitespace

property whitespacesplit

whitespacesplit pre_tokenizer

Returns

WhitespaceSplit

class dlk.utils.tokenizer_util.TokenizerNormalizerFactory(tokenizer: tokenizers.Tokenizer)[source]

Bases: object

get(name)[source]

get normalizers by name

Returns

Normalizer

property lowercase

do lowercase normalizers

Returns

Lowercase

property nfc

do nfc normalizers

Returns

NFC

property nfd

do nfd normalizers

Returns

NFD

property strip

do strip normalizers

Returns

StripAccents

property strip_accents

do strip normalizers

Returns

StripAccents

class dlk.utils.tokenizer_util.TokenizerPostprocessorFactory(tokenizer: tokenizers.Tokenizer)[source]

Bases: object

docstring for TokenizerPostprocessorFactory

property bert

bert postprocess

Returns

bert postprocess

get(name)[source]

get postprocess by name

Returns

postprocess

dlk.utils.vocab module

class dlk.utils.vocab.Vocabulary(do_strip: bool = False, unknown: str = '', ignore: str = '', pad: str = '')[source]

Bases: object

generate a vocab from tokens (a single token or an iterable of tokens); the object can be dumped to a dict and loaded from a dict

add(word)[source]

add one word to vocab

Parameters

word – single word

Returns

self

add_from_iter(iterator)[source]

add the tokens in iterator to vocab

Parameters

iterator – List[str] | Set[str] | List[List[str]]

Returns

self

auto_get_index(data: Union[str, List])[source]

get the index of every word in data from this vocab

Parameters

data – a word or a list of words; the type is detected automatically

Returns

the index or indexes, in the same type as data
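The "same type as data" behaviour can be pictured as a recursive lookup that preserves nesting. This is a sketch only: `lookup` is a hypothetical function working on a plain dict, and mapping unseen words to `unknown_index` is an assumption.

```python
from typing import Dict, List, Union


def lookup(data: Union[str, List], word2idx: Dict[str, int], unknown_index: int = 0):
    """Hypothetical helper: map a word, or a nested list of words, to indexes."""
    if isinstance(data, str):
        return word2idx.get(data, unknown_index)
    # Lists are processed element by element so the nesting is preserved.
    return [lookup(item, word2idx, unknown_index) for item in data]
```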

auto_update(data: Union[str, Iterable])[source]

automatically detect the data type and update the vocab

Parameters

data – str | List[str] | Set[str] | List[List[str]]

Returns

self

dumps() Dict[source]

dump the object to a dict

Returns

self.__dict__

filter_rare(min_freq=1, most_common=-1)[source]

filter out the words whose count is too small

min_freq and most_common cannot both be set

Parameters
  • min_freq – minimum frequency

  • most_common – most common number, -1 means all

Returns

None
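A sketch of the two filtering modes using `collections.Counter`. It is illustrative only: the function works on a bare Counter rather than a Vocabulary, treats `min_freq` as inclusive, and raises when both options are set; all three are assumptions.

```python
from collections import Counter
from typing import List


def filter_rare(counts: Counter, min_freq: int = 1, most_common: int = -1) -> List[str]:
    """Hypothetical helper: words kept after frequency filtering."""
    if min_freq > 1 and most_common != -1:
        raise ValueError("set only one of min_freq and most_common")
    if most_common != -1:
        # Keep the `most_common` most frequent words.
        return [word for word, _ in counts.most_common(most_common)]
    # Keep every word seen at least `min_freq` times.
    return [word for word, count in counts.items() if count >= min_freq]
```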

get_index(word: str) int[source]

get the index of word from this vocab

Parameters

word – a single token

Returns

index

get_word(index: int) str[source]

get the word by index

Parameters

index – word index

Returns

word

classmethod load(attr: Dict)[source]

load the object from dict

Parameters

attr – self.__dict__

Returns

initialized Vocabulary
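The dumps/load pair amounts to a round trip through a plain dict. The sketch below shows the idea with a hypothetical `TinyVocab` class; its attribute names are invented and it omits the unknown, ignore and pad handling of the real Vocabulary.

```python
import copy
from typing import Dict


class TinyVocab:
    """Hypothetical vocabulary used only to illustrate dumps/load."""

    def __init__(self) -> None:
        self.word2idx: Dict[str, int] = {}
        self.idx2word: Dict[int, str] = {}

    def add(self, word: str) -> "TinyVocab":
        if word not in self.word2idx:
            index = len(self.word2idx)
            self.word2idx[word] = index
            self.idx2word[index] = word
        return self  # returning self allows chained calls

    def dumps(self) -> Dict:
        # Copy so later changes to the vocab do not leak into the dump.
        return copy.deepcopy(self.__dict__)

    @classmethod
    def load(cls, attr: Dict) -> "TinyVocab":
        vocab = cls()
        vocab.__dict__.update(copy.deepcopy(attr))
        return vocab
```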

Module contents