victa package

Submodules

victa.couplets module

class victa.couplets.Couplet[source]

Bases: victa.couplets.Couplet

namedtuple: lightweight class for couplets

to_series()[source]

Convert to a pandas.Series

victa.errors module

exception victa.errors.ClassificationError(record, id_field, steps)[source]

Bases: victa.errors.VictaError, RuntimeError

Custom Exception raised when classification of a record fails

exception victa.errors.ManadatoryFieldError[source]

Bases: victa.errors.VictaError, ValueError

Custom Exception raised when Key/Rule building fails because of empty fields

exception victa.errors.MultipleMatchesError(record, id_field, couplet, rulesets)[source]

Bases: victa.errors.VictaError, RuntimeError

Custom Exception raised when multiple rulesets match a record

exception victa.errors.RuleSyntaxError[source]

Bases: victa.errors.VictaError, SyntaxError

Custom Exception raised when rule parsing fails

exception victa.errors.VictaError[source]

Bases: Exception

Custom “catchall” Base Exception

victa.key module

Classification Key

Todo

  • Module level doc
victa.key.build_key(key_df, key_desc)[source]

Build a NetworkX DiGraph containing couplets (nodes) joined by rules (edges)

TODO: key couplet/class data model is nasty and a hangover from the old key_to_key and key_to_mvg model

Parameters:
  • key_df (pandas.DataFrame) –

    DataFrame containing the key couplets and rules. The DataFrame must have the following column structure:

    • INPUT_COUPLET = unique integer identifying the parent couplet.
    • RULES = string containing expression to test.
      Expression format must be valid python syntax and conform to the following grammar:
      [not] rule_id [[and|or][not][rule_id]]
      

      rule_id is an integer identifying each rule to be tested.

      Examples:

      NNN
      not NNN
      NNN or NN
      NNN or NN or N
      not (NNN or NN)
      (NNN or NN) or (N and NNNN)
      NNN and not NN
      
    • OUTPUT_COUPLET = couplet to output if rules expression is True (mutually exclusive with OUTPUT_CLASS)
    • OUTPUT_CLASS = class to output if rules expression is True (mutually exclusive with OUTPUT_COUPLET)
    • OUTPUT_NAME = Output couplet/class name
    • COMMENTS [optional] = Additional comments
  • key_desc – Text description of the Key. Used as the description of the root node
Returns:

key

Return type:

nx.DiGraph
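
As an illustration of the column structure above, a key can be pictured as rows that link a parent couplet to either a child couplet or a final class. The sketch below uses plain Python dicts instead of pandas and NetworkX. The rows, the `walk_key` helper and the pre-computed `outcomes` mapping are all hypothetical, and this is not how victa builds or traverses its graph.

```python
# Hypothetical key rows using the column names documented above.
KEY_ROWS = [
    {"INPUT_COUPLET": 1, "RULES": "101", "OUTPUT_COUPLET": 2,
     "OUTPUT_CLASS": None, "OUTPUT_NAME": "Treed"},
    {"INPUT_COUPLET": 1, "RULES": "not 101", "OUTPUT_COUPLET": None,
     "OUTPUT_CLASS": "G1", "OUTPUT_NAME": "Grassland"},
    {"INPUT_COUPLET": 2, "RULES": "102", "OUTPUT_COUPLET": None,
     "OUTPUT_CLASS": "F1", "OUTPUT_NAME": "Forest"},
    {"INPUT_COUPLET": 2, "RULES": "not 102", "OUTPUT_COUPLET": None,
     "OUTPUT_CLASS": "W1", "OUTPUT_NAME": "Woodland"},
]


def walk_key(rows, outcomes, root=1):
    """Follow matching rows from the root couplet until a class is reached.

    `outcomes` maps each RULES expression to a pre-computed bool. This is an
    assumption made to keep the sketch short; evaluating the expression
    itself is out of scope here.
    """
    couplet, path, visited = root, [], set()
    while True:
        if couplet in visited:
            raise RuntimeError("cycle detected at couplet %r" % couplet)
        visited.add(couplet)
        path.append(couplet)
        matches = [
            row for row in rows
            if row["INPUT_COUPLET"] == couplet and outcomes[row["RULES"]]
        ]
        if len(matches) != 1:
            raise RuntimeError(
                "expected exactly one matching row at couplet %r, found %d"
                % (couplet, len(matches))
            )
        row = matches[0]
        if row["OUTPUT_CLASS"] is not None:
            return row["OUTPUT_CLASS"], path
        couplet = row["OUTPUT_COUPLET"]


outcomes = {"101": True, "not 101": False, "102": False, "not 102": True}
output_class, traversed = walk_key(KEY_ROWS, outcomes)
```

Each row sets exactly one of OUTPUT_COUPLET and OUTPUT_CLASS, which is what lets the walk decide whether to continue or stop.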

class victa.key.Key(key_df, key_desc, rules_df)[source]

Bases: object

Classification Key

classify(record, id_field=None)[source]

Classify a record

Parameters:
  • record (pandas.Series) – record to be classified. The record needs to contain all columns (Series axis labels) referred to in the Rules. See victa.rules.build_rules
  • id_field (str) – column name to use as unique ID field
Returns:

the output class and the couplets that were traversed

Return type:

tuple(pandas.Series, pandas.DataFrame)

Raises:

ClassificationError – When unable to classify a record

Todo

  • figure out a better way to stop infinite recursion
  • decide return data model
classify_iter(records, id_field=None)[source]
Parameters:
  • records (pandas.DataFrame) – records to be classified. The records need to contain all columns (DataFrame axis labels) referred to in the Rules. See victa.rules.build_rules
  • id_field (str) – column name to use as unique ID field
Yields:

tuple(pandas.Series, pandas.DataFrame, pandas.Series)

the output class, the couplets that were traversed, and the input record

Notes

Yields tuple(None, None, pandas.Series) when a ClassificationError or MultipleMatchesError occurs
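
The behaviour in the note can be sketched as a generator that catches the failure and still yields the input record. The `classify_iter_sketch` and `classify_one` functions and the local `ClassificationError` class below are stand-ins for illustration, not victa's own code, and records are plain dicts rather than pandas.Series.

```python
class ClassificationError(RuntimeError):
    """Stand-in for victa.errors.ClassificationError (illustrative only)."""


def classify_iter_sketch(records, classify_one):
    """Yield (output_class, couplets, record), with None placeholders on failure."""
    for record in records:
        try:
            output_class, couplets = classify_one(record)
        except ClassificationError:
            # The caller still receives the record, so it can be logged or retried.
            yield None, None, record
        else:
            yield output_class, couplets, record


def classify_one(record):
    # Hypothetical classifier: only records with a positive height can be classified.
    if record["height"] <= 0:
        raise ClassificationError("unable to classify %r" % (record,))
    return "F1", [1, 2]


records = [{"id": "a", "height": 12.0}, {"id": "b", "height": 0.0}]
results = list(classify_iter_sketch(records, classify_one))

# Collect the ids of records that could not be classified.
failed = [rec["id"] for cls, _, rec in results if cls is None]
```

Because failures are yielded rather than raised, a caller iterating over many records should check for None before using the first two items.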

victa.rules module

TODO Docstring

victa.rules.build_rules(rules_df)[source]

Build a RuleSet of Rule objects from a Pandas DataFrame containing the rule definitions

Parameters:rules_df (pandas.DataFrame) –

DataFrame containing the rules. The DataFrame must have the following column structure:

  • ID = unique integer identifying the rule
  • ATTRIBUTE = attribute/column to use when rule is tested (i.e. in the record to be classified by the key)
  • OPERATOR = positive comparison operator:
    in, =, >=, >, <=, <, regex (for regex, VALUE must be a valid regular expression; see https://docs.python.org/3/library/re.html)
  • VALUE = text string to look for in ATTRIBUTE.
  • NAME = Rule name
  • COMMENTS [optional] = Additional comments
Returns:ruleset
Return type:victa.RuleSet

Note

  • Order for ordinal comparisons is ATTRIBUTE operator VALUE, e.g. ATTRIBUTE >= 5.0
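
The operator column and the ordering note can be illustrated with a small callable factory. This is a sketch under stated assumptions, not victa's implementation: `make_rule` is a hypothetical helper, records are plain dicts, values are already typed (the documented VALUE is a text string), and the meanings given to `in` and `regex` are inferred from the column descriptions.

```python
import operator
import re

# Map the documented OPERATOR strings to comparison functions.
# Ordinal comparisons are applied as ATTRIBUTE operator VALUE.
_OPERATORS = {
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    # Assumption: "in" means VALUE occurs within the attribute's value.
    "in": lambda attr, value: value in attr,
    # Assumption: "regex" means VALUE is a pattern searched for in the attribute.
    "regex": lambda attr, value: re.search(value, attr) is not None,
}


def make_rule(attribute, op, value):
    """Return a callable that tests record[attribute] against value."""
    compare = _OPERATORS[op]

    def rule(record):
        return bool(compare(record[attribute], value))

    return rule


tall = make_rule("HEIGHT", ">=", 5.0)
eucalypt = make_rule("SPECIES", "regex", r"^Eucalyptus\b")

record = {"HEIGHT": 7.5, "SPECIES": "Eucalyptus regnans"}
is_tall = tall(record)          # evaluates 7.5 >= 5.0
is_eucalypt = eucalypt(record)  # searches SPECIES for the pattern
```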
class victa.rules.Rule(value, attribute, operator, name, comment='')[source]

Bases: object

Build a callable Rule object.

The instantiated Rule will return True or False when called with a record to test against.

Parameters:
  • value (str) – text string to look for
  • attribute (str) – attribute/column to use when rule is tested
  • operator (str) – positive comparison operator: in, =, >=, >, <=, <, regex (for regex, value must be a valid regular expression string; see https://docs.python.org/3/library/re.html)
  • name (str) – Rule name
  • comment (str, optional) – Additional comments
Returns:

Return type:

victa.Rule

class victa.rules.RuleSet[source]

Bases: dict

test(expr, record)[source]

Test a ruleset expression against a record

Parameters:
  • expr (str) – string expression to be evaluated
  • record (pandas.Series) – record to test against expression
Returns:

Return type:

bool
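
One way to picture the evaluation of a ruleset expression is to parse it with the standard library `ast` module and substitute each integer rule id with that rule's boolean result. The `eval_rule_expr` function below is a hypothetical sketch, not victa's implementation, and it takes a pre-computed mapping of rule id to result instead of a record.

```python
import ast


def eval_rule_expr(expr, results):
    """Evaluate a boolean expression of integer rule ids.

    `results` maps rule id -> bool. Only and/or/not, parentheses and integer
    literals are accepted; anything else raises SyntaxError.
    """
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BoolOp):
            values = [visit(child) for child in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not visit(node.operand)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return bool(results[node.value])
        raise SyntaxError("unsupported syntax in rule expression: %r" % expr)

    return visit(ast.parse(expr.strip(), mode="eval"))


# Hypothetical outcomes of three rules for one record.
rule_results = {101: True, 12: False, 7: True}
matched = eval_rule_expr("101 and not 12", rule_results)
```

Walking the syntax tree rather than calling `eval` keeps the accepted grammar limited to rule ids and boolean operators.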

victa.utils module

victa.utils.shutil_which(cmd, mode=1, path=None)[source]

Given a command, mode, and a PATH string, return the path which conforms to the given mode on the PATH, or None if there is no such file. mode defaults to os.F_OK | os.X_OK. path defaults to the result of os.environ.get("PATH"), or can be overridden with a custom search path.
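
The description matches the standard library's `shutil.which`, which suggests this helper is a backport or wrapper of it; that is an inference, not something stated here. A minimal sketch of the same behaviour using the standard library function:

```python
import os
import shutil
import sys

# The default mode, os.F_OK | os.X_OK, requires the file to exist and be executable.
default_mode = os.F_OK | os.X_OK

# A command that is not on the search path yields None.
missing = shutil.which("no-such-command-victa-example")

# Overriding the search path: look only in the directory of the running interpreter.
interpreter_dir = os.path.dirname(sys.executable)
found = shutil.which(
    os.path.basename(sys.executable), mode=default_mode, path=interpreter_dir
)
```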

Module contents

VICTA

Author:
Luke Pinner (ERIN)

Todo

  • Package level doc
victa.build_rules(rules_df)[source]

Build a RuleSet of Rule objects from a Pandas DataFrame containing the rule definitions

Parameters:rules_df (pandas.DataFrame) –

DataFrame containing the rules. The DataFrame must have the following column structure:

  • ID = unique integer identifying the rule
  • ATTRIBUTE = attribute/column to use when rule is tested (i.e. in the record to be classified by the key)
  • OPERATOR = positive comparison operator:
    in, =, >=, >, <=, <, regex (for regex, VALUE must be a valid regular expression; see https://docs.python.org/3/library/re.html)
  • VALUE = text string to look for in ATTRIBUTE.
  • NAME = Rule name
  • COMMENTS [optional] = Additional comments
Returns:ruleset
Return type:victa.RuleSet

Note

  • Order for ordinal comparisons is ATTRIBUTE operator VALUE, e.g. ATTRIBUTE >= 5.0
victa.build_key(key_df, key_desc)[source]

Build a NetworkX DiGraph containing couplets (nodes) joined by rules (edges)

TODO: key couplet/class data model is nasty and a hangover from the old key_to_key and key_to_mvg model

Parameters:
  • key_df (pandas.DataFrame) –

    DataFrame containing the key couplets and rules. The DataFrame must have the following column structure:

    • INPUT_COUPLET = unique integer identifying the parent couplet.
    • RULES = string containing expression to test.
      Expression format must be valid python syntax and conform to the following grammar:
      [not] rule_id [[and|or][not][rule_id]]
      

      rule_id is an integer identifying each rule to be tested.

      Examples:

      NNN
      not NNN
      NNN or NN
      NNN or NN or N
      not (NNN or NN)
      (NNN or NN) or (N and NNNN)
      NNN and not NN
      
    • OUTPUT_COUPLET = couplet to output if rules expression is True (mutually exclusive with OUTPUT_CLASS)
    • OUTPUT_CLASS = class to output if rules expression is True (mutually exclusive with OUTPUT_COUPLET)
    • OUTPUT_NAME = Output couplet/class name
    • COMMENTS [optional] = Additional comments
  • key_desc – Text description of the Key. Used as the description of the root node
Returns:

key

Return type:

nx.DiGraph

exception victa.ClassificationError(record, id_field, steps)[source]

Bases: victa.errors.VictaError, RuntimeError

Custom Exception raised when classification of a record fails

exception victa.MultipleMatchesError(record, id_field, couplet, rulesets)[source]

Bases: victa.errors.VictaError, RuntimeError

Custom Exception raised when multiple rulesets match a record

class victa.Couplet[source]

Bases: victa.couplets.Couplet

namedtuple: lightweight class for couplets

to_series()[source]

Convert to a pandas.Series

class victa.Key(key_df, key_desc, rules_df)[source]

Bases: object

Classification Key

classify(record, id_field=None)[source]

Classify a record

Parameters:
  • record (pandas.Series) – record to be classified. The record needs to contain all columns (Series axis labels) referred to in the Rules. See victa.rules.build_rules
  • id_field (str) – column name to use as unique ID field
Returns:

the output class and the couplets that were traversed

Return type:

tuple(pandas.Series, pandas.DataFrame)

Raises:

ClassificationError – When unable to classify a record

Todo

  • figure out a better way to stop infinite recursion
  • decide return data model
classify_iter(records, id_field=None)[source]
Parameters:
  • records (pandas.DataFrame) – records to be classified. The records need to contain all columns (DataFrame axis labels) referred to in the Rules. See victa.rules.build_rules
  • id_field (str) – column name to use as unique ID field
Yields:

tuple(pandas.Series, pandas.DataFrame, pandas.Series)

the output class, the couplets that were traversed, and the input record

Notes

Yields tuple(None, None, pandas.Series) when a ClassificationError or MultipleMatchesError occurs

class victa.Rule(value, attribute, operator, name, comment='')[source]

Bases: object

Build a callable Rule object.

The instantiated Rule will return True or False when called with a record to test against.

Parameters:
  • value (str) – text string to look for
  • attribute (str) – attribute/column to use when rule is tested
  • operator (str) – positive comparison operator: in, =, >=, >, <=, <, regex (for regex, value must be a valid regular expression string; see https://docs.python.org/3/library/re.html)
  • name (str) – Rule name
  • comment (str, optional) – Additional comments
Returns:

Return type:

victa.Rule

class victa.RuleSet[source]

Bases: dict

test(expr, record)[source]

Test a ruleset expression against a record

Parameters:
  • expr (str) – string expression to be evaluated
  • record (pandas.Series) – record to test against expression
Returns:

Return type:

bool

exception victa.RuleSyntaxError[source]

Bases: victa.errors.VictaError, SyntaxError

Custom Exception raised when rule parsing fails