dlk.utils package
Submodules
dlk.utils.config module
Provides BaseConfig, which supplies the basic methods for configs, and ConfigTool, a general tool for processing config dicts.
- class dlk.utils.config.BaseConfig(config: Dict)[source]
Bases:
object
BaseConfig provides the basic functions for all configs.
- class dlk.utils.config.ConfigTool[source]
Bases:
object
This class is not used as much as it was designed to be.
- static do_update_config(config: dict, update_config: Optional[dict] = None) Dict [source]
Recursively update the config dict with the update_config dict.
See ConfigTool._inplace_update_dict.
- Parameters
config – the dict to be updated
update_config – the dict whose values override those in config (the new config updates the base config)
- Returns
updated_config
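The recursive update described above can be sketched roughly as follows. This is a minimal illustration, not the dlk implementation: the name `recursive_update` is hypothetical, and the sketch returns a merged copy, whereas the reference to `_inplace_update_dict` suggests the library may update in place.

```python
from copy import deepcopy
from typing import Dict, Optional


def recursive_update(config: Dict, update_config: Optional[Dict] = None) -> Dict:
    """Return a copy of `config` with `update_config` merged in recursively."""
    result = deepcopy(config)
    for key, value in (update_config or {}).items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dicts: merge one level deeper.
            result[key] = recursive_update(result[key], value)
        else:
            # Otherwise the update value replaces the base value.
            result[key] = deepcopy(value)
    return result


base = {"optimizer": {"name": "adam", "lr": 1e-3}, "epochs": 10}
new = {"optimizer": {"lr": 5e-4}}
merged = recursive_update(base, new)
```

Only `lr` changes in `merged`; the sibling key `name` and the top-level key `epochs` are carried over from `base`.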
- static get_config_by_stage(stage: str, config: Dict) Dict [source]
Get the stage_config for the given stage from the provided config.
If config[stage] is itself the name of another stage, the config of that stage is returned, i.e. config[config[stage]].
- Config Example:
>>> config = {
>>>     "train": { // train, predict, online stage config, using '&' to split all stages
>>>         "data_pair": {
>>>             "label": "label_id"
>>>         },
>>>         "data_set": { // for different stages, this processor will process different parts of the data
>>>             "train": ['train', 'dev'],
>>>             "predict": ['predict'],
>>>             "online": ['online']
>>>         },
>>>         "vocab": "label_vocab", // usually provided by the "token_gather" module
>>>     },
>>>     "predict": "train",
>>>     "online": ["train",
>>>         {"vocab": "new_label_vocab"}
>>>     ]
>>> }
>>> ConfigTool.get_config_by_stage("predict", config) == config["train"]
- Parameters
stage – the stage, like ‘train’, ‘predict’, etc.
config – the base config, which holds the configs of the different stages
- Returns
stage_config
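A rough sketch of the stage lookup, under stated assumptions. The string case follows the description above. The list case (`["train", {...}]`) is only inferred from the config example and is assumed here to mean "start from the named stage, then apply the overrides". The name `resolve_stage` is hypothetical, and the overrides are applied with a shallow `dict.update`.

```python
from copy import deepcopy
from typing import Any, Dict


def resolve_stage(stage: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve the config of `stage`, following references to other stages."""
    stage_config = config[stage]
    if isinstance(stage_config, str):
        # "predict": "train" -> reuse the config of the referenced stage.
        return resolve_stage(stage_config, config)
    if isinstance(stage_config, list):
        # ["train", {...}] -> assumed: take "train", then apply the overrides.
        base_stage, overrides = stage_config
        resolved = resolve_stage(base_stage, config)
        resolved.update(deepcopy(overrides))
        return resolved
    return deepcopy(stage_config)


config = {
    "train": {"data_pair": {"label": "label_id"}, "vocab": "label_vocab"},
    "predict": "train",
    "online": ["train", {"vocab": "new_label_vocab"}],
}
predict_config = resolve_stage("predict", config)
online_config = resolve_stage("online", config)
```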
- static get_leaf_module(module_register: dlk.utils.register.Register, module_config_register: dlk.utils.register.Register, module_name: str, config: Dict) Tuple[Any, object] [source]
Get the module from module_register and the module config from module_config_register, both registered under module_name.
- Parameters
module_register – the register holding the module named module_name
module_config_register – the register holding the config named module_name
module_name – the name of the module to get from the registers
- Returns
the module and the module config registered under module_name
dlk.utils.get_root module
Get the dlk package root path
dlk.utils.logger module
- class dlk.utils.logger.Logger(log_file: str = '', base_dir: str = 'logs', log_level: str = 'debug', log_name='dlk')[source]
Bases:
object
docstring for logger
- static get_logger() loguru._logger.Logger [source]
Return the ‘dlk’ logger if it is initialized; otherwise initialize it and return it.
- Returns
Logger.global_logger
- global_log_file: set[str] = {}
- global_logger: loguru._logger.Logger = <loguru.logger handlers=[(id=1, level=10, sink=<stdout>)]>
- static init_file_logger(log_file, base_dir='logs', log_level: str = 'debug')[source]
Initialize the log file if there is none, or change it if one already exists.
- Parameters
log_file – log file path
base_dir – real log path is ‘$base_dir/$log_file’
log_level – ‘debug’, ‘info’, etc.
- Returns
None
- static init_global_logger(log_level: str = 'debug', log_name: Optional[str] = None, reinit: bool = False)[source]
Initialize the global_logger.
- Parameters
log_level – the logging level; change it to log at a different level
log_name – the logger name; changing it is not recommended
reinit – if True, force reinitialization
- Returns
None
- level_map = {'debug': 'DEBUG', 'error': 'ERROR', 'info': 'INFO', 'warning': 'WARNING'}
- log_name: str = 'dlk'
- warning_file = False
dlk.utils.parser module
- class dlk.utils.parser.BaseConfigParser(config_file: Union[str, Dict, List], config_base_dir: str = '', register: Optional[dlk.utils.register.Register] = None)[source]
Bases:
object
The config parser order is: inherit -> search -> link
If a config value is marked as “@”, the parameter has no default value and you must override it (like ‘label_nums’, etc.).
- static check_config(configs: Union[Dict, List[Dict]]) None [source]
Check that all configs are valid.
Check that every “@” has been replaced with a correct value.
- Parameters
configs – TODO
- Returns
None
- Raises
ValueError –
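The check can be pictured as a recursive walk over the parsed config. The sketch below is an assumption-laden illustration: it treats the placeholder as the literal string "@", and the names `find_unfilled` and `check_filled` are hypothetical, not part of dlk.

```python
from typing import Any, List


def find_unfilled(config: Any, path: str = "") -> List[str]:
    """Return the paths of all values that are still the "@" placeholder."""
    if config == "@":
        return [path]
    found: List[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            found += find_unfilled(value, f"{path}.{key}" if path else str(key))
    elif isinstance(config, list):
        for index, value in enumerate(config):
            found += find_unfilled(value, f"{path}[{index}]")
    return found


def check_filled(config: Any) -> None:
    """Raise ValueError if any "@" placeholder is left in `config`."""
    missing = find_unfilled(config)
    if missing:
        raise ValueError(f"parameters without a value: {missing}")


cfg = {"model": {"label_nums": "@", "dropout": 0.1}, "layers": [{"dim": "@"}]}
unfilled = find_unfilled(cfg)
```

Reporting the path of each unfilled value makes the error message actionable when the config is deeply nested.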
- classmethod collect_link(config, trace: Optional[List] = None, all_level_links: Optional[Dict] = None, level=0)[source]
Collect all links in config and move them to the top level.
Only call this on the top level of config; it collects the links of all levels and returns them together with their level.
- Parameters
config –
>>> {
>>>     "arg1": {
>>>         "arg11": 2,
>>>         "arg12": 3,
>>>         "_link": {"arg11": "arg12"}
>>>     }
>>> }
all_level_links – TODO
level – TODO
- Returns
>>> {
>>>     "arg1": {
>>>         "arg11": 2,
>>>         "arg12": 3
>>>     },
>>>     "_link": {"arg1.arg11": "arg1.arg12"}
>>> }
- static config_link_para(link: Optional[Dict[str, Union[str, List[str]]]] = None, config: Optional[Dict] = None)[source]
Link in place: set config[to] = config[source].
- Parameters
link – {link-from: link-to-1, link-from: [link-to-2, link-to-3]}
config – the base config on which the links are applied
- Returns
None
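Taken together, collect_link and config_link_para can be sketched as two passes: first lift every nested "_link" to the top with dotted paths, then copy each link-from value to its link-to path(s). This is a simplified illustration only. The helper names are hypothetical, the dotted-path convention is taken from the collect_link example, and the level bookkeeping of the real method is left out.

```python
from typing import Any, Dict


def collect_links(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Remove every nested "_link" and return all links with dotted paths."""
    links: Dict[str, Any] = {}
    for source, targets in config.pop("_link", {}).items():
        if isinstance(targets, str):
            links[prefix + source] = prefix + targets
        else:
            links[prefix + source] = [prefix + target for target in targets]
    for key, value in config.items():
        if isinstance(value, dict):
            links.update(collect_links(value, f"{prefix}{key}."))
    return links


def get_path(config: Dict[str, Any], path: str) -> Any:
    """Read the value stored at a dotted path such as "arg1.arg11"."""
    node: Any = config
    for part in path.split("."):
        node = node[part]
    return node


def set_path(config: Dict[str, Any], path: str, value: Any) -> None:
    """Write `value` at a dotted path; the parent dicts must already exist."""
    *parents, last = path.split(".")
    node: Any = config
    for part in parents:
        node = node[part]
    node[last] = value


def apply_links(links: Dict[str, Any], config: Dict[str, Any]) -> None:
    """In place: copy the value at each link-from path to its link-to path(s)."""
    for source, targets in links.items():
        value = get_path(config, source)
        for target in ([targets] if isinstance(targets, str) else targets):
            set_path(config, target, value)


config = {"arg1": {"arg11": 2, "arg12": 3, "_link": {"arg11": "arg12"}}}
links = collect_links(config)
apply_links(links, config)
```

After both passes, `arg12` holds the value of `arg11`, and the nested "_link" entry is gone from `config`.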
- classmethod flat_search(search, config: dict) List[dict] [source]
Flatten all the _search parameters into a list of configs.
_search is now parsed recursively, which means you can add _search/_link/_base parameters inside _search parameters, but you should only search parameters of the current level.
- Parameters
search – the search parameters, e.g. {“para1”: [1,2,3], ‘para2’: ‘list(range(10))’}
config – the base config
- Returns
a list of the possible configs
- classmethod get_base_config(config_name: str) Dict [source]
Get the base config by config_name.
- Parameters
config_name – the config name
- Returns
the config named config_name
- get_cartesian_prod(list_of_list_of_dict: List[List[Dict]]) List[List[Dict]] [source]
Get the Cartesian product of a list of lists of configs.
- Parameters
list_of_list_of_dict – [[config_a1, config_a2], [config_b1, config_b2]]
- Returns
[[config_a1, config_b1], [config_a1, config_b2], [config_a2, config_b1], [config_a2, config_b2]]
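The same result shape can be reproduced with `itertools.product` from the standard library. This is a sketch of the idea, not the dlk code, and `cartesian_prod` is a hypothetical name.

```python
from itertools import product
from typing import Dict, List


def cartesian_prod(list_of_list_of_dict: List[List[Dict]]) -> List[List[Dict]]:
    """Pick one config from every inner list, in all possible ways."""
    return [list(combo) for combo in product(*list_of_list_of_dict)]


a1, a2, b1, b2 = {"a": 1}, {"a": 2}, {"b": 1}, {"b": 2}
combos = cartesian_prod([[a1, a2], [b1, b2]])
```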
- get_kind_module_base_config(abstract_config: Union[dict, str], kind_module: str = '') List[dict] [source]
Get the whole config of ‘kind_module’ from the given abstract_config.
- Parameters
abstract_config – the config to be expanded
kind_module – the module kind, like ‘embedding’ or ‘subprocessor’, which is registered in config_parser_register
- Returns
the parsed (whole) config of abstract_config
- static get_named_list_cartesian_prod(dict_of_list: Optional[Dict[str, List]] = None) List[Dict] [source]
Get the Cartesian product of named lists.
- Parameters
dict_of_list – {‘name1’: [1,2,3], ‘name2’: “list(range(1, 4))”}
- Returns
[{‘name1’: 1, ‘name2’: 1}, {‘name1’: 1, ‘name2’: 2}, {‘name1’: 1, ‘name2’: 3}, …]
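A minimal sketch of the named variant, assuming every value is already a real list. The parameter example also shows a string form ("list(range(1, 4))"), which the library presumably evaluates into a list first; that step is not reproduced here. `named_cartesian_prod` is a hypothetical name.

```python
from itertools import product
from typing import Any, Dict, List


def named_cartesian_prod(dict_of_list: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Build one dict per combination, choosing one value for every name."""
    names = list(dict_of_list)
    return [dict(zip(names, values)) for values in product(*dict_of_list.values())]


grid = named_cartesian_prod({"name1": [1, 2, 3], "name2": list(range(1, 4))})
```

With three values per name, `grid` holds nine dicts, and the last name varies fastest.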
- is_rep_config(list_of_dict: List[dict]) bool [source]
Check whether the list contains a repeated config.
- Parameters
list_of_dict – a list of dicts
- Returns
whether there is a repeat
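One way to detect repeats is to serialize each dict canonically and look for a collision. The sketch assumes the configs are JSON-serializable, and `has_repeat` is a hypothetical name; how dlk compares configs is not documented here.

```python
import json
from typing import Dict, List


def has_repeat(list_of_dict: List[Dict]) -> bool:
    """Return True if two dicts in the list have identical content."""
    seen = set()
    for config in list_of_dict:
        # sort_keys makes the serialized form independent of key order.
        key = json.dumps(config, sort_keys=True)
        if key in seen:
            return True
        seen.add(key)
    return False


same_content = has_repeat([{"a": 1, "b": 2}, {"b": 2, "a": 1}])
all_distinct = has_repeat([{"a": 1}, {"a": 2}])
```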
- load_hjson_file(file_path: str) Dict [source]
Load the hjson file at file_path and return it as a Dict.
- Parameters
file_path – the file path
- Returns
the loaded dict
- map_to_submodule(config: dict, map_fun: Callable) Dict [source]
Apply map_fun to all submodules in config.
- Parameters
config – a dict of submodules; each key is a module kind which is registered in config_parser_register
map_fun – the function used to process each submodule
- Returns
TODO
- class dlk.utils.parser.CallbackConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for CallbackConfigParser
- class dlk.utils.parser.DatamoduleConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for DatamoduleConfigParser
- class dlk.utils.parser.DecoderConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for DecoderConfigParser
- class dlk.utils.parser.EmbeddingConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for EmbeddingConfigParser
- class dlk.utils.parser.EncoderConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for EncoderConfigParser
- class dlk.utils.parser.IModelConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for IModelConfigParser
- class dlk.utils.parser.InitMethodConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for InitMethodConfigParser
- class dlk.utils.parser.LinkUnionTool[source]
Bases:
object
Helper tool for parsing the “_link” of a config. In every function, the top level has higher priority than the low level.
This class is mostly for resolving conflicts between the low-level and high-level registered links.
- find(key: str)[source]
find the root of the key
- Parameters
key – a token
- Returns
the root of the key
- low_level_union(link_from: str, link_to: str)[source]
union the low level link_from->link_to pair
On the basis of the high-level links, this function registers a low-level link.
If neither link-from nor link-to has appeared before, they are registered directly.
If only one of link-from and link-to has appeared before, the value of both is overwritten by the corresponding value of the upper level.
If both link-from and link-to have appeared before and they link to the same value, nothing is done; otherwise an error is raised.
- Parameters
link_from – the link-from key
link_to – the link-to key
- Returns
None
- register_low_links(links: Dict)[source]
Register the low-level links; low level means the base (parent) level config.
- Parameters
links – {“link-from”: [“list of link-to”], “link-from2”: “link-to2”}
- Returns
self
- register_top_links(links: Dict)[source]
register the top level links, top level means the link_to level config
- Parameters
links – {“from”: [“tolist”], “from2”: “to2”}
- Returns
self
- top_level_union(link_from: str, link_to: str)[source]
union the top level link_from->link_to pair
Links (link-from -> link-to) registered in the same (top) level config should be merged using top_level_union.
A parameter must not be assigned repeatedly: the same parameter cannot appear more than once in the link-to position, otherwise the link would be ambiguous.
- Parameters
link_from – the link-from key
link_to – the link-to key
- Returns
None
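The rules of find, top_level_union and low_level_union map naturally onto a union-find (disjoint-set) structure. The sketch below is one possible reading of those rules, not the dlk implementation. The class name `LinkUnion` is hypothetical, and the choice of ValueError as the error type is an assumption.

```python
from typing import Dict, Set


class LinkUnion:
    """Union-find over config keys; the root of a key is its first link-from."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.link_to_keys: Set[str] = set()

    def find(self, key: str) -> str:
        """Return the root of `key`, compressing the path on the way."""
        self.parent.setdefault(key, key)
        if self.parent[key] != key:
            self.parent[key] = self.find(self.parent[key])
        return self.parent[key]

    def top_level_union(self, link_from: str, link_to: str) -> None:
        """Link link_from -> link_to; a key may be a link-to target only once."""
        if link_to in self.link_to_keys:
            raise ValueError(f"{link_to!r} is assigned more than once")
        self.link_to_keys.add(link_to)
        self.parent[self.find(link_to)] = self.find(link_from)

    def low_level_union(self, link_from: str, link_to: str) -> None:
        """Add a lower-priority link without overriding the existing ones."""
        known_from = link_from in self.parent
        known_to = link_to in self.parent
        if known_from and known_to:
            if self.find(link_from) != self.find(link_to):
                raise ValueError(f"{link_from!r} and {link_to!r} conflict")
            return
        if known_to:
            # Only link_to is known: the existing (upper-level) root wins.
            self.parent[self.find(link_from)] = self.find(link_to)
        else:
            self.parent[self.find(link_to)] = self.find(link_from)


links = LinkUnion()
links.top_level_union("a", "b")  # a -> b
links.top_level_union("b", "c")  # b -> c, so the root of c is a
links.low_level_union("d", "c")  # only c is known: d joins the group of a
```

Keeping the root on the upper-level side is what gives top-level links priority: a low-level link can join an existing group but never changes its root.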
- class dlk.utils.parser.LossConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for LossConfigParser
- class dlk.utils.parser.ManagerConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for ManagerConfigParser
- class dlk.utils.parser.ModelConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for ModelConfigParser
- class dlk.utils.parser.ModuleConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for ModuleConfigParser
- class dlk.utils.parser.OptimizerConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for OptimizerConfigParser
- class dlk.utils.parser.PostProcessorConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for PostProcessorConfigParser
- class dlk.utils.parser.ProcessorConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for ProcessorConfigParser
- class dlk.utils.parser.RootConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for RootConfigParser
- class dlk.utils.parser.ScheduleConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for ScheduleConfigParser
- class dlk.utils.parser.SubProcessorConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for SubProcessorConfigParser
- class dlk.utils.parser.TaskConfigParser(config_file)[source]
Bases:
dlk.utils.parser.BaseConfigParser
docstring for TaskConfigParser
dlk.utils.quick_search module
- class dlk.utils.quick_search.QuickSearch(words: Iterable = [])[source]
Bases:
object
Aho-Corasick enhanced trie
- add_words(words: Iterable)[source]
add words from iterator to the trie
- Parameters
words – Iterable[tokens]
- Returns
None
- has(key: str) bool [source]
Check whether key is in the trie.
- Parameters
key – a token(str)
- Returns
bool(has or not)
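The interface of this class (add_words, has, and the search method of the same class) can be illustrated with a plain trie. Note the swap: the sketch rescans from every start position instead of using Aho-Corasick failure links, so it is slower than the automaton the class is named after, but it returns results in the same shape. `TrieSearch` and `_END` are hypothetical names.

```python
from typing import Any, Dict, Iterable, List

_END = object()  # sentinel key marking the end of a word; never a character


class TrieSearch:
    """Plain trie in which each node is a dict of child nodes."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Dict[Any, Any] = {}
        self.add_words(words)

    def add_words(self, words: Iterable[str]) -> None:
        """Insert every word, one character per trie level."""
        for word in words:
            node = self.root
            for char in word:
                node = node.setdefault(char, {})
            node[_END] = True

    def has(self, key: str) -> bool:
        """Return True only if `key` was added as a whole word."""
        node = self.root
        for char in key:
            if char not in node:
                return False
            node = node[char]
        return _END in node

    def search(self, search_str: str) -> List[Dict]:
        """Report every occurrence of a stored word inside `search_str`."""
        results = []
        for start in range(len(search_str)):
            node = self.root
            for end in range(start, len(search_str)):
                node = node.get(search_str[end])
                if node is None:
                    break
                if _END in node:
                    results.append({
                        "start": start,
                        "end": end + 1,  # exclusive, so str == search_str[start:end]
                        "str": search_str[start:end + 1],
                    })
        return results


trie = TrieSearch(["he", "she", "hers"])
matches = trie.search("ushers")
```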
- search(search_str: str) List[Dict] [source]
Find the substrings of search_str that are in the trie.
- Parameters
search_str – the string to search in
- Returns
a list of results, each organized as
>>> {
>>>     "start": start_position,
>>>     "end": end_position,
>>>     "str": search_str[start_position: end_position]
>>> }
dlk.utils.register module
- class dlk.utils.register.Register(register_name: str)[source]
Bases:
object
- get(name: str = '') Any [source]
get the module by name
- Parameters
name – the name should be the real name or name+@+sub_name, and the
- Returns
the registered module
dlk.utils.tokenizer_util module
- class dlk.utils.tokenizer_util.PreTokenizerFactory(tokenizer: tokenizers.Tokenizer)[source]
Bases:
object
- property bert
bert pre_tokenizer
- Returns
BertPreTokenizer
- property bytelevel
byte level pre_tokenizer
- Returns
ByteLevel
- property whitespace
whitespace pre_tokenizer
- Returns
Whitespace
- property whitespacesplit
whitespacesplit pre_tokenizer
- Returns
WhitespaceSplit
- class dlk.utils.tokenizer_util.TokenizerNormalizerFactory(tokenizer: tokenizers.Tokenizer)[source]
Bases:
object
- property lowercase
do lowercase normalizers
- Returns
Lowercase
- property nfc
do nfc normalizers
- Returns
NFC
- property nfd
do nfd normalizers
- Returns
NFD
- property strip
do strip normalizers
- Returns
StripAccents
- property strip_accents
do strip normalizers
- Returns
StripAccents
dlk.utils.vocab module
- class dlk.utils.vocab.Vocabulary(do_strip: bool = False, unknown: str = '', ignore: str = '', pad: str = '')[source]
Bases:
object
Generate a vocab from tokens (a single token or an Iterable of tokens). You can dump the object to a dict and load it from a dict.
- add_from_iter(iterator)[source]
add the tokens in iterator to vocab
- Parameters
iterator – List[str] | Set[str] | List[List[str]]
- Returns
self
- auto_get_index(data: Union[str, List])[source]
Get the index of each word in data from this vocab.
- Parameters
data – a str or a List; the type is detected automatically
- Returns
the result has the same type as data
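The "same type as data" behaviour can be sketched as a recursive lookup that mirrors the nesting of its input. The function name `get_index`, the plain `word2idx` dict and the "[UNK]" fallback token are all assumptions made for this illustration; they are not taken from the Vocabulary class.

```python
from typing import Any, Dict


def get_index(word2idx: Dict[str, int], data: Any, unknown: str = "[UNK]") -> Any:
    """Map a token, or any nesting of lists of tokens, to indices."""
    if isinstance(data, str):
        # Unseen words fall back to the index of the unknown token.
        return word2idx.get(data, word2idx[unknown])
    return [get_index(word2idx, item, unknown) for item in data]


word2idx = {"[UNK]": 0, "hello": 1, "world": 2}
single = get_index(word2idx, "hello")
nested = get_index(word2idx, [["hello", "there"], ["world"]])
```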
- auto_update(data: Union[str, Iterable])[source]
Automatically detect the type of data and update the vocab with it.
- Parameters
data – str| List[str] | Set[str] | List[List[str]]
- Returns
self
- filter_rare(min_freq=1, most_common=-1)[source]
Filter out the words whose count is too small.
min_freq and most_common cannot both be set.
- Parameters
min_freq – the minimum frequency
most_common – most common number, -1 means all
- Returns
None
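The two filtering modes can be sketched with `collections.Counter`. This standalone function is hypothetical: it works on a token list rather than on a Vocabulary object, it assumes a word is kept when its count is at least min_freq, and it reads "cannot both be set" as an error when both arguments differ from their defaults.

```python
from collections import Counter
from typing import Iterable, List


def filter_rare(tokens: Iterable[str], min_freq: int = 1, most_common: int = -1) -> List[str]:
    """Keep frequent tokens, either by minimum count or by top-k rank."""
    if min_freq > 1 and most_common != -1:
        raise ValueError("set only one of min_freq and most_common")
    counts = Counter(tokens)
    if most_common != -1:
        # Keep only the `most_common` most frequent words.
        return [word for word, _ in counts.most_common(most_common)]
    # Keep every word that occurs at least `min_freq` times.
    return [word for word, count in counts.items() if count >= min_freq]


tokens = "a b a c a b d".split()
by_freq = filter_rare(tokens, min_freq=2)
top_one = filter_rare(tokens, most_common=1)
```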