victa package

Submodules

victa.couplets module

class victa.couplets.Couplet

Bases: victa.couplets.Couplet

namedtuple: lightweight class for couplets

to_series()

Convert to a pandas.Series

victa.errors module

exception victa.errors.ClassificationError(record, id_field, steps)

Bases: victa.errors.VictaError, RuntimeError

Custom exception raised when classification of a record fails

exception victa.errors.ManadatoryFieldError

Bases: victa.errors.VictaError, ValueError

Custom exception raised when Key/Rule building fails because of empty mandatory fields

exception victa.errors.MultipleMatchesError(record, id_field, couplet, rulesets)

Bases: victa.errors.VictaError, RuntimeError

Custom exception raised when multiple rulesets match a record

exception victa.errors.RuleSyntaxError

Bases: victa.errors.VictaError, SyntaxError

Custom exception raised when rule parsing fails

exception victa.errors.VictaError

Bases: Exception

Custom catch-all base exception
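Every victa exception derives from VictaError and also from a built-in exception, so callers can catch all package errors with one handler or catch them by their built-in category. The sketch below illustrates this with hypothetical stand-in classes that copy only the documented base classes; the real exceptions also take constructor arguments (such as record, id_field, steps) that are omitted here, and `caught_by` is a hypothetical helper.

```python
# Hypothetical stand-ins that copy only the documented base classes.
class VictaError(Exception):
    """Catch-all base exception."""


class ClassificationError(VictaError, RuntimeError):
    """Classification of a record failed."""


class ManadatoryFieldError(VictaError, ValueError):
    """Key/Rule building failed because of empty fields."""


class MultipleMatchesError(VictaError, RuntimeError):
    """Multiple rulesets matched a record."""


class RuleSyntaxError(VictaError, SyntaxError):
    """Rule parsing failed."""


def caught_by(exc, handler_type):
    """Return True if ``except handler_type`` would catch ``exc``."""
    try:
        raise exc
    except handler_type:
        return True
    except Exception:
        return False


# A single handler catches every package error...
for error_type in (ClassificationError, ManadatoryFieldError,
                   MultipleMatchesError, RuleSyntaxError):
    assert caught_by(error_type(), VictaError)

# ...and each error can still be caught by its built-in base class.
assert caught_by(ManadatoryFieldError(), ValueError)
assert caught_by(RuleSyntaxError(), SyntaxError)
assert not caught_by(ClassificationError(), ValueError)
```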

victa.key module

Classification Key

Todo

Module level doc

victa.key.build_key(key_df, key_desc)

Build a NetworkX DiGraph containing couplets (nodes) joined by rules (edges)

TODO: key couplet/class data model is nasty and a hangover from the old key_to_key and key_to_mvg model

Parameters:

key_df (pandas.DataFrame) – dataframe containing the key couplets and rules. The dataframe must have the following column structure:

INPUT_COUPLET = unique integer identifying the parent couplet.

RULES = string containing the expression to test. The expression must be valid Python syntax and conform to the following grammar: [not] rule_id [[and|or] [not] rule_id], where `rule_id` is an integer identifying each rule to be tested. The bracketed group may repeat, and parentheses may be used for grouping. Examples:

NNN
not NNN
NNN or NN
NNN or NN or N
not (NNN or NN)
(NNN or NN) or (N and NNNN)
NNN and not NN

OUTPUT_COUPLET = couplet to output if the rules expression is True (mutually exclusive with OUTPUT_CLASS)

OUTPUT_CLASS = class to output if the rules expression is True (mutually exclusive with OUTPUT_COUPLET)

OUTPUT_NAME = Output couplet/class name

COMMENTS [optional] = Additional comments

key_desc – text description of the Key. Used as the description of the root node.

Returns:	key
Return type:	nx.DiGraph
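The key table can be pictured as an adjacency list: each row is an edge from INPUT_COUPLET to either another couplet or a final class, labelled with its RULES expression. The sketch below is a dependency-free illustration of that structure, not victa's implementation: it uses plain dicts in place of a pandas DataFrame and a NetworkX DiGraph, and the rows, class labels and `build_key_sketch` helper are hypothetical. With NetworkX, each tuple would map naturally onto `DiGraph.add_edge(u, v, **attrs)`.

```python
from collections import defaultdict

# Hypothetical key table using the documented column names. The RULES
# strings refer to rule IDs 1-3 defined in a separate rules table.
KEY_ROWS = [
    {"INPUT_COUPLET": 1, "RULES": "1 and not 2", "OUTPUT_COUPLET": 2,
     "OUTPUT_CLASS": None, "OUTPUT_NAME": "Wet sites"},
    {"INPUT_COUPLET": 1, "RULES": "not 1 or 2", "OUTPUT_COUPLET": None,
     "OUTPUT_CLASS": "A", "OUTPUT_NAME": "Dry sites"},
    {"INPUT_COUPLET": 2, "RULES": "3", "OUTPUT_COUPLET": None,
     "OUTPUT_CLASS": "B", "OUTPUT_NAME": "Tall vegetation"},
    {"INPUT_COUPLET": 2, "RULES": "not 3", "OUTPUT_COUPLET": None,
     "OUTPUT_CLASS": "C", "OUTPUT_NAME": "Short vegetation"},
]


def build_key_sketch(rows):
    """Return {input_couplet: [(rules_expr, target, name), ...]}.

    ``target`` is ("couplet", n) or ("class", label), mirroring the
    mutually exclusive OUTPUT_COUPLET / OUTPUT_CLASS columns.
    """
    edges = defaultdict(list)
    for row in rows:
        has_couplet = row["OUTPUT_COUPLET"] is not None
        has_class = row["OUTPUT_CLASS"] is not None
        if has_couplet == has_class:  # both set, or neither set
            raise ValueError(
                f"row for couplet {row['INPUT_COUPLET']} must set exactly "
                "one of OUTPUT_COUPLET and OUTPUT_CLASS")
        if has_couplet:
            target = ("couplet", row["OUTPUT_COUPLET"])
        else:
            target = ("class", row["OUTPUT_CLASS"])
        edges[row["INPUT_COUPLET"]].append(
            (row["RULES"], target, row["OUTPUT_NAME"]))
    return dict(edges)


key = build_key_sketch(KEY_ROWS)
print(key[2])
# [('3', ('class', 'B'), 'Tall vegetation'), ('not 3', ('class', 'C'), 'Short vegetation')]
```

Keeping the output kind in the target tuple lets a traversal tell "descend to another couplet" apart from "stop with a class" without consulting the original table.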

class victa.key.Key(key_df, key_desc, rules_df)

Bases: object

Classification Key

classify(record, id_field=None)

Classify a record

Parameters:

record (pandas.Series) – record to be classified. The record needs to contain all columns (Series axis labels) referred to in the `Rule`s. See victa.rules.build_rules.
id_field (str) – column name to use as the unique ID field

Returns:	the output class and the couplets that were traversed
Return type:	tuple(pandas.Series, pandas.DataFrame)
Raises:	`ClassificationError` – When unable to classify a record

Todo

figure out a better way to stop infinite recursion
decide return data model

classify_iter(records, id_field=None)

Classify each record in records.

Parameters:

records (pandas.DataFrame) – records to be classified. The records need to contain all columns (DataFrame axis labels) referred to in the `Rule`s. See victa.rules.build_rules.
id_field (str) – column name to use as the unique ID field

Yields:

tuple(pandas.Series, pandas.DataFrame, pandas.Series) –

the output class, the couplets that were traversed, and the input record

Notes

Yields tuple(None, None, pandas.Series), i.e. no output class or couplets but still the input record, when a ClassificationError or MultipleMatchesError occurs.
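Classification walks the key from the root couplet, following whichever outgoing rules expression is True until it reaches an output class; zero matches and several matches are the two failure modes documented above. The sketch below shows that traversal over a plain dict key with a caller-supplied expression tester. It is not victa's implementation: the stand-in exceptions, the root couplet number, the `max_steps` cap (one possible guard against the infinite recursion mentioned in the Todo) and all helper names are hypothetical, and victa's own methods return pandas objects rather than plain values.

```python
class ClassificationError(RuntimeError):
    """Stand-in: no outgoing rules expression matched the record."""


class MultipleMatchesError(RuntimeError):
    """Stand-in: more than one outgoing rules expression matched."""


def classify_sketch(key, test, record, root=1, max_steps=100):
    """Follow matching edges from ``root`` until a class is reached.

    ``key`` maps couplet -> [(rules_expr, (kind, value), name), ...] and
    ``test(expr, record)`` returns True or False. Returns the output
    class and the list of couplets traversed.
    """
    couplet, steps = root, []
    for _ in range(max_steps):  # cap the walk so a cyclic key cannot loop forever
        steps.append(couplet)
        matches = [edge for edge in key.get(couplet, [])
                   if test(edge[0], record)]
        if not matches:
            raise ClassificationError(f"no rule matched at couplet {couplet}")
        if len(matches) > 1:
            raise MultipleMatchesError(
                f"{len(matches)} rules matched at couplet {couplet}")
        _, (kind, value), _ = matches[0]
        if kind == "class":
            return value, steps
        couplet = value  # descend to the next couplet
    raise ClassificationError(f"no class reached after {max_steps} steps")


def classify_iter_sketch(key, test, records):
    """Yield (class, steps, record), or (None, None, record) on failure."""
    for record in records:
        try:
            output, steps = classify_sketch(key, test, record)
        except (ClassificationError, MultipleMatchesError):
            yield None, None, record
        else:
            yield output, steps, record


# Hypothetical two-couplet key and two rules (IDs as strings for brevity).
PREDICATES = {"1": lambda r: r["water"] > 0.5,
              "2": lambda r: r["height"] >= 5.0}


def toy_test(expr, record):
    """Toy tester that only understands 'N' and 'not N'."""
    if expr.startswith("not "):
        return not PREDICATES[expr[4:]](record)
    return PREDICATES[expr](record)


KEY = {
    1: [("1", ("couplet", 2), "Wet"), ("not 1", ("class", "Dry"), "Dry")],
    2: [("2", ("class", "Tall wet"), "Tall wet"),
        ("not 2", ("class", "Short wet"), "Short wet")],
}
records = [{"water": 0.9, "height": 7.0}, {"water": 0.1, "height": 1.0}]
for output, steps, record in classify_iter_sketch(KEY, toy_test, records):
    print(output, steps)
# Tall wet [1, 2]
# Dry [1]
```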

victa.rules module

TODO Docstring

victa.rules.build_rules(rules_df)

Build a RuleSet of Rule objects from a pandas DataFrame containing the rule definitions

Parameters:

rules_df (pandas.DataFrame) –

dataframe containing the rules. The dataframe must have the following column structure:

ID = unique integer identifying the rule

ATTRIBUTE = attribute/column to use when the rule is tested (i.e. in the record to be classified by the key)

OPERATOR = positive comparison operator:

in, =, >=, >, <=, <, regex, where regex is a valid regular expression (https://docs.python.org/3/library/re.html)

VALUE = text string to look for in ATTRIBUTE

NAME = Rule name

COMMENTS [optional] = Additional comments

Returns: ruleset

Return type: victa.RuleSet

Note

Order for ordinal comparisons is ATTRIBUTE operator VALUE, e.g. ATTRIBUTE >= 5.0
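A rules table is essentially a mapping from rule ID to a rule definition, which fits RuleSet being a dict subclass. The sketch below shows one way to validate rows and key them by ID, using plain dicts in place of DataFrame rows. The `build_rules_sketch` helper and `RuleDef` tuple are hypothetical; it raises ValueError where victa documents ManadatoryFieldError, and its operator and duplicate-ID checks are assumptions rather than documented behaviour. Following the ordering note, rule 1 below reads as HEIGHT >= 5.0.

```python
from collections import namedtuple

MANDATORY = ("ID", "ATTRIBUTE", "OPERATOR", "VALUE", "NAME")
OPERATORS = {"in", "=", ">=", ">", "<=", "<", "regex"}

# Hypothetical container for one validated rule definition.
RuleDef = namedtuple("RuleDef", "attribute operator value name comment")


def build_rules_sketch(rows):
    """Return a dict mapping rule ID -> RuleDef, validating each row."""
    ruleset = {}
    for row in rows:
        missing = [col for col in MANDATORY if row.get(col) in (None, "")]
        if missing:  # victa documents ManadatoryFieldError for this case
            raise ValueError(f"empty mandatory field(s) {missing} in {row!r}")
        if row["OPERATOR"] not in OPERATORS:
            raise ValueError(f"unknown operator {row['OPERATOR']!r}")
        rule_id = int(row["ID"])
        if rule_id in ruleset:
            raise ValueError(f"duplicate rule ID {rule_id}")
        ruleset[rule_id] = RuleDef(row["ATTRIBUTE"], row["OPERATOR"],
                                   row["VALUE"], row["NAME"],
                                   row.get("COMMENTS", ""))
    return ruleset


rules = build_rules_sketch([
    # ATTRIBUTE operator VALUE order: this rule reads HEIGHT >= 5.0
    {"ID": 1, "ATTRIBUTE": "HEIGHT", "OPERATOR": ">=", "VALUE": "5.0",
     "NAME": "Tall"},
    {"ID": 2, "ATTRIBUTE": "NOTES", "OPERATOR": "regex", "VALUE": r"^wet",
     "NAME": "Wet notes", "COMMENTS": "free-text field"},
])
print(rules[1])
# RuleDef(attribute='HEIGHT', operator='>=', value='5.0', name='Tall', comment='')
```

Empty cells read from a real DataFrame usually arrive as NaN rather than None or "", so a pandas-based version would also need a `pandas.isna` check.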

class victa.rules.Rule(value, attribute, operator, name, comment='')

Bases: object

Build a callable Rule object.

The instantiated Rule returns True or False when called with a record to test against.

Parameters:

value (str) – text string to look for
attribute (str) – attribute/column to use when the rule is tested
operator (str) – positive comparison operator: `in`, `=`, `>=`, `>`, `<=`, `<`, `regex`, where regex is a valid regular expression string (https://docs.python.org/3/library/re.html)
name (str) – Rule name
comment (str, optional) – Additional comments

Returns:	callable Rule object
Return type:	victa.Rule
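A Rule is a callable object: the constructor stores the comparison, and calling the instance with a record returns True or False. The sketch below is a hypothetical `RuleSketch` class, not victa's Rule. How `in` and `=` treat their operands, the use of `re.search` for `regex`, and the float conversion for ordinal operators are all assumptions. It only needs `record[attribute]` to work, so a dict stands in for a pandas.Series.

```python
import operator as op
import re


class RuleSketch:
    """Hypothetical stand-in for victa's callable Rule."""

    # Ordinal operators compare ATTRIBUTE <op> VALUE, in that order.
    _ORDINAL = {">=": op.ge, ">": op.gt, "<=": op.le, "<": op.lt}

    def __init__(self, value, attribute, operator, name, comment=""):
        if operator not in {"in", "=", "regex", *self._ORDINAL}:
            raise ValueError(f"unsupported operator {operator!r}")
        self.value = value
        self.attribute = attribute
        self.operator = operator
        self.name = name
        self.comment = comment

    def __call__(self, record):
        """Return True or False for ``record`` (a dict or pandas.Series)."""
        actual = record[self.attribute]
        if self.operator == "in":
            # Assumption: VALUE is a substring to look for in ATTRIBUTE.
            return self.value in str(actual)
        if self.operator == "=":
            # Assumption: equality is tested on the string form.
            return str(actual) == self.value
        if self.operator == "regex":
            # Assumption: the pattern may match anywhere (re.search).
            return re.search(self.value, str(actual)) is not None
        # Assumption: ordinal operands are converted to float.
        return self._ORDINAL[self.operator](float(actual), float(self.value))


tall = RuleSketch("5.0", "HEIGHT", ">=", "Tall")
wet = RuleSketch(r"^wet", "NOTES", "regex", "Wet notes")
record = {"HEIGHT": 7.5, "NOTES": "wetland margin"}
print(tall(record), wet(record))  # True True
```

Validating the operator in the constructor means a bad rules table fails when rules are built rather than midway through classifying records.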

class victa.rules.RuleSet

Bases: dict

test(expr, record)

Test a ruleset expression against a record

Parameters:

expr (str) – string expression to be evaluated
record (pandas.Series) – record to test against the expression

Returns:	the result of evaluating the expression against the record
Return type:	bool
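RuleSet.test evaluates a rules expression such as `(1 or 2) and not 3`, where each integer is a rule ID. One safe way to do this, sketched below under the assumption that rules are stored as callables keyed by integer ID, is to parse the expression with Python's `ast` module and evaluate only `and`, `or`, `not`, parentheses and integer literals. `RuleSetSketch` is hypothetical and is not victa's implementation; restricting the grammar this way avoids passing arbitrary strings to `eval`.

```python
import ast


class RuleSetSketch(dict):
    """Hypothetical dict of rule ID -> callable, with an expression tester."""

    def test(self, expr, record):
        """Evaluate a rules expression such as '(1 or 2) and not 3'."""
        try:
            tree = ast.parse(expr, mode="eval")
        except SyntaxError as exc:
            raise ValueError(f"invalid rules expression {expr!r}") from exc
        return self._evaluate(tree.body, record)

    def _evaluate(self, node, record):
        if isinstance(node, ast.BoolOp):
            # Generators keep Python's short-circuit behaviour for and/or.
            results = (self._evaluate(v, record) for v in node.values)
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not self._evaluate(node.operand, record)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            # An integer literal is a rule ID: look up the rule and call it.
            return bool(self[node.value](record))
        raise ValueError(f"unsupported syntax in rules expression: {ast.dump(node)}")


rules = RuleSetSketch({
    1: lambda r: r["HEIGHT"] >= 5.0,
    2: lambda r: r["COVER"] > 30,
    3: lambda r: "saline" in r["NOTES"],
})
record = {"HEIGHT": 2.0, "COVER": 45, "NOTES": "fresh"}
print(rules.test("(1 or 2) and not 3", record))  # True
print(rules.test("1 and not 2", record))  # False
```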

victa.utils module

victa.utils.shutil_which(cmd, mode=1, path=None)

Given a command, mode, and a PATH string, return the path which conforms to the given mode on the PATH, or None if there is no such file. mode defaults to os.F_OK | os.X_OK. path defaults to the result of os.environ.get("PATH"), or can be overridden with a custom search path.
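The signature and docstring of shutil_which match the standard library's `shutil.which` (available since Python 3.3), so it is presumably a copy or backport of that function; this is an assumption. The example below calls `shutil.which` directly to show the mode and path parameters against a temporary directory, and assumes a POSIX system (on Windows, lookup also depends on PATHEXT). Note that mode=1 in the signature equals os.F_OK | os.X_OK, because os.F_OK is 0 and os.X_OK is 1.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file in a throwaway directory and mark it executable.
    tool = os.path.join(tmp, "mytool")
    with open(tool, "w") as fh:
        fh.write("#!/bin/sh\necho hello\n")
    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

    # Default mode (os.F_OK | os.X_OK): the file must exist and be executable.
    found_default = shutil.which("mytool", path=tmp)
    # mode=os.F_OK: existence alone is enough.
    found_exists = shutil.which("mytool", mode=os.F_OK, path=tmp)
    # Nothing by this name exists on the custom search path.
    missing = shutil.which("not-there", path=tmp)

print(found_default)
print(found_exists == tool)  # True
print(missing)  # None
```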

Module contents

VICTA

Author: Luke Pinner (ERIN)

Todo

Package level doc

victa.build_rules(rules_df)

Build a RuleSet of Rule objects from a pandas DataFrame containing the rule definitions

Parameters:

rules_df (pandas.DataFrame) –

dataframe containing the rules. The dataframe must have the following column structure:

ID = unique integer identifying the rule

ATTRIBUTE = attribute/column to use when the rule is tested (i.e. in the record to be classified by the key)

OPERATOR = positive comparison operator:

in, =, >=, >, <=, <, regex, where regex is a valid regular expression (https://docs.python.org/3/library/re.html)

VALUE = text string to look for in ATTRIBUTE

NAME = Rule name

COMMENTS [optional] = Additional comments

Returns: ruleset

Return type: victa.RuleSet

Note

Order for ordinal comparisons is ATTRIBUTE operator VALUE, e.g. ATTRIBUTE >= 5.0

victa.build_key(key_df, key_desc)

Build a NetworkX DiGraph containing couplets (nodes) joined by rules (edges)

TODO: key couplet/class data model is nasty and a hangover from the old key_to_key and key_to_mvg model

Parameters:

key_df (pandas.DataFrame) – dataframe containing the key couplets and rules. The dataframe must have the following column structure:

INPUT_COUPLET = unique integer identifying the parent couplet.

RULES = string containing the expression to test. The expression must be valid Python syntax and conform to the following grammar: [not] rule_id [[and|or] [not] rule_id], where `rule_id` is an integer identifying each rule to be tested. The bracketed group may repeat, and parentheses may be used for grouping. Examples:

NNN
not NNN
NNN or NN
NNN or NN or N
not (NNN or NN)
(NNN or NN) or (N and NNNN)
NNN and not NN

OUTPUT_COUPLET = couplet to output if the rules expression is True (mutually exclusive with OUTPUT_CLASS)

OUTPUT_CLASS = class to output if the rules expression is True (mutually exclusive with OUTPUT_COUPLET)

OUTPUT_NAME = Output couplet/class name

COMMENTS [optional] = Additional comments

key_desc – text description of the Key. Used as the description of the root node.

Returns:	key
Return type:	nx.DiGraph

exception victa.ClassificationError(record, id_field, steps)

Bases: victa.errors.VictaError, RuntimeError

Custom exception raised when classification of a record fails

exception victa.MultipleMatchesError(record, id_field, couplet, rulesets)

Bases: victa.errors.VictaError, RuntimeError

Custom exception raised when multiple rulesets match a record

class victa.Couplet

Bases: victa.couplets.Couplet

namedtuple: lightweight class for couplets

to_series()

Convert to a pandas.Series

class victa.Key(key_df, key_desc, rules_df)

Bases: object

Classification Key

classify(record, id_field=None)

Classify a record

Parameters:

record (pandas.Series) – record to be classified. The record needs to contain all columns (Series axis labels) referred to in the `Rule`s. See victa.rules.build_rules.
id_field (str) – column name to use as the unique ID field

Returns:	the output class and the couplets that were traversed
Return type:	tuple(pandas.Series, pandas.DataFrame)
Raises:	`ClassificationError` – When unable to classify a record

Todo

figure out a better way to stop infinite recursion
decide return data model

classify_iter(records, id_field=None)

Classify each record in records.

Parameters:

records (pandas.DataFrame) – records to be classified. The records need to contain all columns (DataFrame axis labels) referred to in the `Rule`s. See victa.rules.build_rules.
id_field (str) – column name to use as the unique ID field

Yields:

tuple(pandas.Series, pandas.DataFrame, pandas.Series) –

the output class, the couplets that were traversed, and the input record

Notes

Yields tuple(None, None, pandas.Series), i.e. no output class or couplets but still the input record, when a ClassificationError or MultipleMatchesError occurs.

class victa.Rule(value, attribute, operator, name, comment='')

Bases: object

Build a callable Rule object.

The instantiated Rule returns True or False when called with a record to test against.

Parameters:

value (str) – text string to look for
attribute (str) – attribute/column to use when the rule is tested
operator (str) – positive comparison operator: `in`, `=`, `>=`, `>`, `<=`, `<`, `regex`, where regex is a valid regular expression string (https://docs.python.org/3/library/re.html)
name (str) – Rule name
comment (str, optional) – Additional comments

Returns:	callable Rule object
Return type:	victa.Rule

class victa.RuleSet

Bases: dict

test(expr, record)

Test a ruleset expression against a record

Parameters:

expr (str) – string expression to be evaluated
record (pandas.Series) – record to test against the expression

Returns:	the result of evaluating the expression against the record
Return type:	bool

exception victa.RuleSyntaxError

Bases: victa.errors.VictaError, SyntaxError

Custom exception raised when rule parsing fails