victa package
Submodules
victa.couplets module
class victa.couplets.Couplet
Bases: victa.couplets.Couplet
namedtuple: lightweight class for couplets.
victa.errors module
exception victa.errors.ClassificationError(record, id_field, steps)
Bases: victa.errors.VictaError, RuntimeError
Custom exception raised when classification of a record fails.
exception victa.errors.ManadatoryFieldError
Bases: victa.errors.VictaError, ValueError
Custom exception raised when Key/Rule building fails because of empty fields.
exception victa.errors.MultipleMatchesError(record, id_field, couplet, rulesets)
Bases: victa.errors.VictaError, RuntimeError
Custom exception raised when multiple rulesets match a record.
exception victa.errors.RuleSyntaxError
Bases: victa.errors.VictaError, SyntaxError
Custom exception raised when rule parsing fails.
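Each exception pairs victa.errors.VictaError with a built-in exception type, so calling code can catch it either through the package-wide base or through the built-in one. A minimal sketch of that pattern, assuming VictaError is a plain Exception subclass; the stored attributes and the message text are hypothetical, not victa's own:

```python
class VictaError(Exception):
    """Assumed common base class for victa exceptions."""


class ClassificationError(VictaError, RuntimeError):
    """Raised when classification of a record fails."""

    def __init__(self, record, id_field, steps):
        # Hypothetical: keep the constructor arguments available to handlers.
        self.record, self.id_field, self.steps = record, id_field, steps
        super().__init__(f"Unable to classify record {record.get(id_field)!r}")


class ManadatoryFieldError(VictaError, ValueError):
    """Raised when Key/Rule building fails because of empty fields."""


# Usage: the error can be caught through its built-in base class.
try:
    raise ClassificationError({"ID": 7}, "ID", steps=[1, 2])
except RuntimeError as err:
    caught = err
```

Catching VictaError instead would handle every victa-specific failure in one clause.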
victa.key module
Classification Key
Todo
- Module level doc
victa.key.build_key(key_df, key_desc)
Build a NetworkX DiGraph containing couplets (nodes) joined by rules (edges).
TODO: the key couplet/class data model is nasty and a hangover from the old key_to_key and key_to_mvg model.
Parameters:
- key_df (pandas.DataFrame) – dataframe containing the key couplets and rules. The dataframe must have the following column structure:
  - INPUT_COUPLET = unique integer identifying the parent couplet.
  - RULES = string containing the expression to test. The expression must be valid Python syntax and conform to the following grammar:
    [not] rule_id [[and|or][not][rule_id]]
    where rule_id is an integer identifying each rule to be tested. Examples:
    NNN
    not NNN
    NNN or NN
    NNN or NN or N
    not (NNN or NN)
    (NNN or NN) or (N and NNNN)
    NNN and not NN
  - OUTPUT_COUPLET = couplet to output if the rules expression is True (mutually exclusive with OUTPUT_CLASS)
  - OUTPUT_CLASS = class to output if the rules expression is True (mutually exclusive with OUTPUT_COUPLET)
  - OUTPUT_NAME = output couplet/class name
  - COMMENTS [optional] = additional comments
- key_desc – text description of the Key. Used as the description of the root node.
Returns: key
Return type: nx.DiGraph
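Because RULES expressions are valid Python boolean syntax over integer rule IDs, they can be evaluated without eval() by walking the parsed syntax tree. A minimal sketch using the standard ast module, assuming each rule has already been tested and its outcome stored in a {rule_id: bool} mapping; evaluate_rules is a hypothetical helper, not part of victa, and raising SyntaxError here only mirrors the role of RuleSyntaxError:

```python
import ast


def evaluate_rules(expression, results):
    """Evaluate a RULES expression such as "101 and not 102" (hypothetical helper).

    results maps each integer rule ID to the bool outcome of testing that rule.
    """
    tree = ast.parse(expression, mode="eval")  # raises SyntaxError on invalid Python

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BoolOp):  # "and" / "or" chains
            values = [visit(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not visit(node.operand)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return results[node.value]  # KeyError if the rule ID was never tested
        # Anything else (names, arithmetic, calls, ...) is outside the grammar.
        raise SyntaxError(f"Unsupported element in rule expression: {ast.dump(node)}")

    return visit(tree)


results = {101: True, 102: False, 3: False, 4: True}
print(evaluate_rules("101 and not 102", results))  # True
```

Whitelisting node types keeps arbitrary code in a spreadsheet cell from being executed, which a plain eval() would allow.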
class victa.key.Key(key_df, key_desc, rules_df)
Bases: object
Classification Key
classify(record, id_field=None)
Classify a record.
Parameters:
- record (pandas.Series) – record to be classified. The record needs to contain all columns (Series axis labels) referred to in the Rules. See victa.rules.build_rules.
- id_field (str) – column name to use as the unique ID field
Returns: the output class and the couplets that were traversed
Return type: tuple(pandas.Series, pandas.DataFrame)
Raises: ClassificationError – when unable to classify a record
Todo
- figure out a better way to stop infinite recursion
- decide return data model
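The traversal that classify describes can be sketched without pandas or NetworkX: start at the root couplet, test every rules expression leaving it, follow the single matching row to the next couplet, and stop when a row supplies an OUTPUT_CLASS. In this sketch the key rows are plain dicts using the key_df column names; the root couplet number (1), the rule IDs, the HEIGHT attribute and the class names are all hypothetical, errors are simplified to RuntimeError, and max_steps is just one possible answer to the infinite-recursion Todo:

```python
def classify(record, key_rows, test, root=1, max_steps=100):
    """Walk a key from `root` and return (output_class, traversed_couplets).

    key_rows: dicts with INPUT_COUPLET, RULES, OUTPUT_COUPLET and OUTPUT_CLASS keys
    test: callable (rules_expression, record) -> bool
    """
    couplet, steps = root, []
    for _ in range(max_steps):  # bounded loop instead of unbounded recursion
        steps.append(couplet)
        matches = [row for row in key_rows
                   if row["INPUT_COUPLET"] == couplet and test(row["RULES"], record)]
        if len(matches) != 1:
            # victa documents ClassificationError and MultipleMatchesError for these cases
            raise RuntimeError(f"{len(matches)} matching rows at couplet {couplet}")
        row = matches[0]
        if row["OUTPUT_CLASS"] is not None:
            return row["OUTPUT_CLASS"], steps
        couplet = row["OUTPUT_COUPLET"]
    raise RuntimeError(f"No class reached within {max_steps} steps")


# Hypothetical rules and key, for illustration only.
RULES = {
    101: lambda r: r["HEIGHT"] >= 5.0,
    102: lambda r: r["HEIGHT"] < 5.0,
    103: lambda r: r["HEIGHT"] >= 30.0,
    104: lambda r: r["HEIGHT"] < 30.0,
}
KEY_ROWS = [
    {"INPUT_COUPLET": 1, "RULES": "101", "OUTPUT_COUPLET": 2, "OUTPUT_CLASS": None},
    {"INPUT_COUPLET": 1, "RULES": "102", "OUTPUT_COUPLET": None, "OUTPUT_CLASS": "Shrubland"},
    {"INPUT_COUPLET": 2, "RULES": "103", "OUTPUT_COUPLET": None, "OUTPUT_CLASS": "Tall forest"},
    {"INPUT_COUPLET": 2, "RULES": "104", "OUTPUT_COUPLET": None, "OUTPUT_CLASS": "Woodland"},
]


def single_rule_test(expression, record):
    # Handles single-ID expressions only; full expressions need a parser for the RULES grammar.
    return RULES[int(expression)](record)


print(classify({"HEIGHT": 35.0}, KEY_ROWS, single_rule_test))  # ('Tall forest', [1, 2])
```

Requiring exactly one match per couplet is what makes the key deterministic: zero matches means the record falls through the key, and more than one means the couplet's rules overlap.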
classify_iter(records, id_field=None)
Parameters:
- records (pandas.DataFrame) – records to be classified. The records need to contain all columns (DataFrame axis labels) referred to in the Rules. See victa.rules.build_rules.
- id_field (str) – column name to use as the unique ID field
Yields: tuple(pandas.Series, pandas.DataFrame, pandas.Series) – the output class, the couplets that were traversed and the input record
Notes
Will yield tuple(None, None, pandas.Series) on ClassificationError or MultipleMatchesError.
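The Notes above describe an in-band error convention: a record that fails to classify is yielded with None in place of the class and couplets instead of stopping the iteration. A minimal sketch of that generator pattern, assuming records arrive as an iterable of dicts and the single-record classifier is passed in as a callable; the stand-in exception classes and toy_classify are hypothetical:

```python
class ClassificationError(RuntimeError):
    """Stand-in for victa.errors.ClassificationError."""


class MultipleMatchesError(RuntimeError):
    """Stand-in for victa.errors.MultipleMatchesError."""


def classify_iter(records, classify):
    """Yield (output_class, steps, record), or (None, None, record) on failure."""
    for record in records:
        try:
            output_class, steps = classify(record)
        except (ClassificationError, MultipleMatchesError):
            # Report the failure in-band so one bad record does not end the batch.
            yield None, None, record
        else:
            yield output_class, steps, record


def toy_classify(record):
    # Hypothetical single-record classifier used only to exercise classify_iter.
    if record["HEIGHT"] < 0:
        raise ClassificationError("negative height")
    return ("Tall" if record["HEIGHT"] >= 30 else "Short"), [1]


results = list(classify_iter([{"HEIGHT": 35}, {"HEIGHT": -1}], toy_classify))
```

Because the input record is always yielded, callers can collect unclassified records for review by filtering on a None class.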
victa.rules module
TODO Docstring
victa.rules.build_rules(rules_df)
Build a RuleSet of Rule objects from a pandas DataFrame containing the rule definitions.
Parameters: rules_df (pandas.DataFrame) – dataframe containing the rules. The dataframe must have the following column structure:
- ID = unique integer identifying the rule
- ATTRIBUTE = attribute/column to use when the rule is tested (i.e. in the record to be classified by the key)
- OPERATOR = positive comparison operator: in, =, >=, >, <=, <, regex
  where regex is a valid regular expression (https://docs.python.org/3/library/re.html)
- VALUE = text string to look for in ATTRIBUTE
- NAME = Rule name
- COMMENTS [optional] = Additional comments
Returns: ruleset
Return type: victa.RuleSet
Note
- Order for ordinal comparisons is ATTRIBUTE operator VALUE, e.g. ATTRIBUTE >= 5.0
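A sketch of the validation that the column structure above implies, using plain dicts instead of a DataFrame: mandatory columns must be non-empty, OPERATOR must be one of the listed operators, and IDs must be unique integers. build_rule_specs is a hypothetical helper; the empty-field check mirrors what ManadatoryFieldError reports, while the operator and duplicate-ID checks are assumptions of this sketch rather than documented victa behaviour:

```python
OPERATORS = {"in", "=", ">=", ">", "<=", "<", "regex"}
MANDATORY = ("ID", "ATTRIBUTE", "OPERATOR", "VALUE", "NAME")


def build_rule_specs(rows):
    """Validate rule-definition rows and index them by integer ID (hypothetical helper)."""
    specs = {}
    for row in rows:
        empty = [col for col in MANDATORY if row.get(col) in (None, "")]
        if empty:
            # victa documents ManadatoryFieldError (a ValueError) for empty fields
            raise ValueError(f"Empty mandatory field(s) {empty} in rule {row.get('ID')!r}")
        if row["OPERATOR"] not in OPERATORS:
            raise ValueError(f"Unknown operator {row['OPERATOR']!r}")
        rule_id = int(row["ID"])
        if rule_id in specs:
            raise ValueError(f"Duplicate rule ID {rule_id}")
        # COMMENTS is optional, so default it to an empty string.
        specs[rule_id] = {**row, "ID": rule_id, "COMMENTS": row.get("COMMENTS", "")}
    return specs


specs = build_rule_specs([
    {"ID": "101", "ATTRIBUTE": "HEIGHT", "OPERATOR": ">=", "VALUE": "5.0", "NAME": "Tall"},
])
```

Indexing by integer ID matches how RULES expressions in the key refer to rules.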
class victa.rules.Rule(value, attribute, operator, name, comment='')
Bases: object
Build a callable Rule object.
The instantiated Rule returns True or False when called with a record to test against.
Parameters:
- value (str) – text string to look for
- attribute (str) – attribute/column to use when the rule is tested
- operator (str) – positive comparison operator: in, =, >=, >, <=, <, regex
  where regex is a valid regular expression string (https://docs.python.org/3/library/re.html)
- name (str) – Rule name
- comment (str, optional) – Additional comments
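A minimal sketch of a callable rule along these lines. It assumes that = compares text, that the ordinal operators convert both sides to float (following the ATTRIBUTE operator VALUE ordering noted for build_rules), that in tests whether VALUE occurs in the attribute's text, and that regex uses re.search; victa's actual semantics may differ. Negation is presumably left to the not keyword in a key's RULES expression, which would explain why only positive operators are listed:

```python
import operator as op
import re

# Ordinal operators read ATTRIBUTE operator VALUE, e.g. ATTRIBUTE >= 5.0.
_ORDINAL = {">=": op.ge, ">": op.gt, "<=": op.le, "<": op.lt}
_OPERATORS = {"in", "=", "regex", *_ORDINAL}


class Rule:
    """Callable rule sketch: Rule(...)(record) returns True or False."""

    def __init__(self, value, attribute, operator, name, comment=""):
        if operator not in _OPERATORS:
            raise ValueError(f"Unsupported operator: {operator!r}")
        self.value, self.attribute, self.operator = value, attribute, operator
        self.name, self.comment = name, comment
        # Compile once up front; an invalid pattern raises re.error here.
        self._pattern = re.compile(value) if operator == "regex" else None

    def __call__(self, record):
        attr = record[self.attribute]
        if self.operator == "in":
            return str(self.value) in str(attr)  # assumed: VALUE occurs in ATTRIBUTE text
        if self.operator == "regex":
            return self._pattern.search(str(attr)) is not None
        if self.operator == "=":
            return str(attr) == str(self.value)  # assumed: text equality
        # Assumed: ordinal comparisons are numeric.
        return _ORDINAL[self.operator](float(attr), float(self.value))


tall = Rule("5.0", "HEIGHT", ">=", "Tall")
```

Making the rule callable lets any record type that supports record[attribute] (a dict here, a pandas.Series in victa) be tested the same way.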
victa.utils module
victa.utils.shutil_which(cmd, mode=1, path=None)
Given a command, mode, and a PATH string, return the path which conforms to the given mode on the PATH, or None if there is no such file. mode defaults to os.F_OK | os.X_OK (the value 1 shown in the signature). path defaults to the result of os.environ.get("PATH"), or can be overridden with a custom search path.
Module contents
VICTA
Author: Luke Pinner (ERIN)
Todo
- Package level doc
victa.build_rules(rules_df)
Build a RuleSet of Rule objects from a pandas DataFrame containing the rule definitions.
Parameters: rules_df (pandas.DataFrame) – dataframe containing the rules. The dataframe must have the following column structure:
- ID = unique integer identifying the rule
- ATTRIBUTE = attribute/column to use when the rule is tested (i.e. in the record to be classified by the key)
- OPERATOR = positive comparison operator: in, =, >=, >, <=, <, regex
  where regex is a valid regular expression (https://docs.python.org/3/library/re.html)
- VALUE = text string to look for in ATTRIBUTE
- NAME = Rule name
- COMMENTS [optional] = Additional comments
Returns: ruleset
Return type: victa.RuleSet
Note
- Order for ordinal comparisons is ATTRIBUTE operator VALUE, e.g. ATTRIBUTE >= 5.0
victa.build_key(key_df, key_desc)
Build a NetworkX DiGraph containing couplets (nodes) joined by rules (edges).
TODO: the key couplet/class data model is nasty and a hangover from the old key_to_key and key_to_mvg model.
Parameters:
- key_df (pandas.DataFrame) – dataframe containing the key couplets and rules. The dataframe must have the following column structure:
  - INPUT_COUPLET = unique integer identifying the parent couplet.
  - RULES = string containing the expression to test. The expression must be valid Python syntax and conform to the following grammar:
    [not] rule_id [[and|or][not][rule_id]]
    where rule_id is an integer identifying each rule to be tested. Examples:
    NNN
    not NNN
    NNN or NN
    NNN or NN or N
    not (NNN or NN)
    (NNN or NN) or (N and NNNN)
    NNN and not NN
  - OUTPUT_COUPLET = couplet to output if the rules expression is True (mutually exclusive with OUTPUT_CLASS)
  - OUTPUT_CLASS = class to output if the rules expression is True (mutually exclusive with OUTPUT_COUPLET)
  - OUTPUT_NAME = output couplet/class name
  - COMMENTS [optional] = additional comments
- key_desc – text description of the Key. Used as the description of the root node.
Returns: key
Return type: nx.DiGraph
exception victa.ClassificationError(record, id_field, steps)
Bases: victa.errors.VictaError, RuntimeError
Custom exception raised when classification of a record fails.
exception victa.MultipleMatchesError(record, id_field, couplet, rulesets)
Bases: victa.errors.VictaError, RuntimeError
Custom exception raised when multiple rulesets match a record.
class victa.Couplet
Bases: victa.couplets.Couplet
namedtuple: lightweight class for couplets.
class victa.Key(key_df, key_desc, rules_df)
Bases: object
Classification Key
classify(record, id_field=None)
Classify a record.
Parameters:
- record (pandas.Series) – record to be classified. The record needs to contain all columns (Series axis labels) referred to in the Rules. See victa.rules.build_rules.
- id_field (str) – column name to use as the unique ID field
Returns: the output class and the couplets that were traversed
Return type: tuple(pandas.Series, pandas.DataFrame)
Raises: ClassificationError – when unable to classify a record
Todo
- figure out a better way to stop infinite recursion
- decide return data model
classify_iter(records, id_field=None)
Parameters:
- records (pandas.DataFrame) – records to be classified. The records need to contain all columns (DataFrame axis labels) referred to in the Rules. See victa.rules.build_rules.
- id_field (str) – column name to use as the unique ID field
Yields: tuple(pandas.Series, pandas.DataFrame, pandas.Series) – the output class, the couplets that were traversed and the input record
Notes
Will yield tuple(None, None, pandas.Series) on ClassificationError or MultipleMatchesError.
class victa.Rule(value, attribute, operator, name, comment='')
Bases: object
Build a callable Rule object.
The instantiated Rule returns True or False when called with a record to test against.
Parameters:
- value (str) – text string to look for
- attribute (str) – attribute/column to use when the rule is tested
- operator (str) – positive comparison operator: in, =, >=, >, <=, <, regex
  where regex is a valid regular expression string (https://docs.python.org/3/library/re.html)
- name (str) – Rule name
- comment (str, optional) – Additional comments
exception victa.RuleSyntaxError
Bases: victa.errors.VictaError, SyntaxError
Custom exception raised when rule parsing fails.