Welcome to VICTA’s documentation!

Vegetation Information Classification Tool Automator (VICTA) core library

This is a minimal implementation of a core library that can be built on to develop a new VICTA application.

Introduction

The VICTA core library consists of a classification key built from a directed graph (NetworkX DiGraph). The key is constructed from a set of couplets (graph nodes) and associated rules (graph edges). The classification key exposes a well-defined API for passing in data, key and rule logic; for requesting the classification of a data record and the decision path of that classification; and for reporting application exceptions and validation errors. No business logic is hard-coded into the core library: it is a business- and data-agnostic decision tree only.
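To make the couplet/rule structure concrete, the following is a minimal, library-free sketch of walking such a key: starting at the first couplet, the single rule that holds for the record selects either the next couplet or a final class, and each step is recorded as the decision path. It uses plain dictionaries and lambdas in place of a NetworkX DiGraph and the rule/key parsers, and the names `KEY` and `classify_record`, the attributes and the classes are all hypothetical.

```python
# Hypothetical toy key, standing in for a NetworkX DiGraph.
# Each couplet (node) maps to its outgoing edges: (rule test, target),
# where target is ("couplet", n) or ("class", name).
KEY = {
    1: [
        (lambda rec: rec["height"] >= 10, ("couplet", 2)),
        (lambda rec: rec["height"] < 10, ("class", "Shrubland")),
    ],
    2: [
        (lambda rec: rec["cover"] > 30, ("class", "Forest")),
        (lambda rec: rec["cover"] <= 30, ("class", "Woodland")),
    ],
}


def classify_record(record, key, start=1):
    """Walk the key from `start`, returning (class, decision path)."""
    couplet = start
    path = []
    while True:
        matches = [target for test, target in key[couplet] if test(record)]
        if len(matches) != 1:
            # Zero matches: the record cannot be classified.
            # Several matches: the rules at this couplet overlap.
            raise ValueError(f"couplet {couplet}: {len(matches)} matching rules")
        kind, value = matches[0]
        path.append((couplet, value))  # record this step of the decision path
        if kind == "class":
            return value, path
        couplet = value


print(classify_record({"height": 15, "cover": 50}, KEY))
# ('Forest', [(1, 2), (2, 'Forest')])
```

The two failure cases loosely correspond to the `ClassificationError` and `MultipleMatchesError` exceptions handled in the code example further down; this sketch simply raises `ValueError` for both.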

A simple diagram of the minimal core library implementation:

+------------------------------------------------+
|Command line script/GUI/Web App/Jupyter Notebook|
|                                                |
|    +-----------------+  +------------------+   |
|    |     INPUTS      |  |     OUTPUTS      |   |
|    | Data/Rules/Keys |  | Classification   |   |
|    |                 |  | Decision path    |   |
|    +--------------+--+  +--^---------------+   |
|                   |        |                   |
|            +------v--------+-------+           |
|            |        CORE API       |           |
|            |  Classification Tree  |           |
|            |  Rule and Key parsers |           |
|            +-----------------------+           |
+------------------------------------------------+

Data Model

The key (couplets) and rules are passed in as pandas DataFrames, a tabular, iterable in-memory data structure. The format is similar to the existing “Key to Key” and “Key to MVG” formats, but merged into a single table.

Key

The key dataframe must have the following column structure:

Attribute Name   Description
INPUT_COUPLET    Unique integer identifying the parent couplet.
RULES            String containing the expression** to test.
OUTPUT_COUPLET   Couplet to output if the RULES expression is True (mutually exclusive with OUTPUT_CLASS).
OUTPUT_CLASS     Class to output if the RULES expression is True (mutually exclusive with OUTPUT_COUPLET).
OUTPUT_NAME      Output couplet/class name.
COMMENTS         Additional comments [optional].

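** The RULES expression must be valid Python syntax and conform to the following grammar:

[not] rule_id [[and|or] [not] rule_id]...

Where: rule_id is an integer identifying each rule to be tested (the ID column of the rules dataframe). Sub-expressions may be grouped with parentheses, as in the examples below.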

Examples:

NNN
not NNN
NNN or NN
NNN or NN or N
not (NNN or NN)
(NNN or NN) or (N and NNNN)
NNN and not NN
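Because RULES expressions are valid Python syntax, one way to evaluate them without calling `eval` is to parse them with Python's standard `ast` module and walk the tree, substituting each integer rule id with the outcome of that rule. The sketch below is hypothetical: `evaluate_rules` and the `results` mapping are illustrative names, not the VICTA parser.

```python
import ast


def evaluate_rules(expression, results):
    """Evaluate a RULES expression such as "(101 or 102) and not 103".

    `results` maps each integer rule id to the boolean outcome of that rule.
    Only `and`, `or`, `not`, parentheses and integer rule ids are accepted;
    anything else (arithmetic, names, calls) raises ValueError.
    """
    tree = ast.parse(expression, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BoolOp):
            # All operands are evaluated (no short-circuiting), so every
            # referenced rule id must be present in `results`.
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return results[node.value]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(tree)
```

For example, `evaluate_rules("(101 or 102) and not 103", {101: True, 102: False, 103: True})` returns `False`. Parentheses need no special handling because the parser has already folded them into the tree structure.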

Rules

The rules dataframe must have the following column structure:

Attribute Name   Description
ID               Unique integer identifying the rule.
ATTRIBUTE        Attribute/column to use when the rule is tested (i.e. in the record to be classified by the key).
OPERATOR         Positive comparison operator: in, =, >=, >, <=, <, regex (for regex, VALUE must be a valid regular expression).
VALUE            Text string to look for in ATTRIBUTE.
NAME             Rule name.
COMMENTS         Additional comments [optional].
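The sketch below shows how a single rule row might be tested against a record. The operator semantics are assumptions, not taken from VICTA: `in` is treated as a substring test of VALUE within the attribute text, `=` as a string comparison, the ordered operators as numeric comparisons, and `regex` as `re.search`. The `rule_matches` function and the GROWTH_FORM/HEIGHT_M attribute names are hypothetical.

```python
import operator
import re

# Assumed semantics for the ordered operators: both sides are compared as numbers.
NUMERIC_OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


def rule_matches(rule, record):
    """Return True if `record` satisfies `rule` (a dict with ATTRIBUTE, OPERATOR, VALUE)."""
    actual = record[rule["ATTRIBUTE"]]
    op, expected = rule["OPERATOR"], rule["VALUE"]
    if op == "in":
        # Assumption: VALUE is a text string to look for within the attribute.
        return str(expected) in str(actual)
    if op == "=":
        return str(actual) == str(expected)
    if op == "regex":
        # VALUE is treated as a regular expression searched anywhere in the attribute.
        return re.search(str(expected), str(actual)) is not None
    if op in NUMERIC_OPS:
        return NUMERIC_OPS[op](float(actual), float(expected))
    raise ValueError(f"unknown operator: {op!r}")


record = {"GROWTH_FORM": "tree mallee", "HEIGHT_M": 12.5}  # hypothetical record
print(rule_matches({"ATTRIBUTE": "GROWTH_FORM", "OPERATOR": "in", "VALUE": "mallee"}, record))
print(rule_matches({"ATTRIBUTE": "HEIGHT_M", "OPERATOR": ">=", "VALUE": "10"}, record))
```

Because every operator is a positive comparison, a negated condition would be expressed with `not` in the key's RULES expression rather than in the rule row itself.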

Code Example

import os
import pandas as pd
from victa import Key, ClassificationError, MultipleMatchesError

if __name__ == '__main__':

    id_field = 'NVIS_ID'

    output_results = '../data/mvgs_nvis_results.xlsx'
    output_steps = '../data/mvgs_nvis_steps.xlsx'

    # Remove any output left over from a previous run
    for output in (output_results, output_steps):
        if os.path.exists(output):
            os.unlink(output)

    # Read couplets & rules
    # Here we read from a spreadsheet, but you could get these from anywhere,
    # a database, url, etc...
    # All we need is pandas.DataFrame objects conforming to the structures
    # documented in victa.rules.build_rules and victa.key.build_key
    # (passing the path lets pandas open and close the file itself)
    ruledf = pd.read_excel('../data/rules_nvis.xlsx')
    keydf = pd.read_excel('../data/keys_nvis.xlsx')

    # Build key
    key = Key(keydf, 'MVG Key', ruledf)

    # Read in the records to classify
    # Here we read from a spreadsheet, but you could get the data from anywhere,
    # a database, url, etc... All we need is a pandas.DataFrame object
    recsdf = pd.read_excel('../data/FLATNVIS_VEG_DESC5.xlsx')

    # Iterate over the records yourself
    all_results = []
    all_steps = []
    for _, record in recsdf.iterrows():
        try:
            # Perform the classification
            result, steps = key.classify(record, id_field=id_field)
            all_results.append(result)
            all_steps.append(steps)
        except ClassificationError as e:
            print(e)
            # Can also do something with e.record and e.steps
        except MultipleMatchesError as e:
            print(e)
            # Can also do something with e.record, e.couplet and e.rulesets

    # Write out the results
    all_results = pd.DataFrame(all_results)
    all_steps = pd.concat(all_steps, ignore_index=True)
    all_results.to_excel(output_results, index=False)
    all_steps.to_excel(output_steps, index=False)

Installation

conda env create -f victa.yml
conda activate victa

Tests

Some basic tests of the rules have been started; more test coverage is needed.

Contributors

Luke Pinner

License

Apache 2.0