Output types

SQL Table Outputs

A table output type represents a database table. It has a list of columns and a list of validation queries that should be run against the table after task execution but before committing the results.

If the table's actual columns do not match the columns declared in the output, an exception is raised. An exception is also raised if any validation query returns one or more rows. Either exception rolls back the database transaction and fails the task, which stops downstream tasks from executing.

This example defines a table with one column and one validation query:

outputs:
- type: table
  name: my_table
  columns:
  - name: date
    datatype: date
    nullable: false
  validation_queries:
  - SELECT 'invalid date' FROM my_table WHERE date < '2020-01-01'
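The validation behaviour described above can be sketched in Python roughly as follows. This is a minimal illustration built on the standard-library sqlite3 module, not the library's own implementation; the ValidationError class and the run_validation_queries function are hypothetical names introduced only for this sketch.

```python
import sqlite3


class ValidationError(Exception):
    """Hypothetical exception type, used only for this illustration."""


def run_validation_queries(conn, queries):
    """Raise ValidationError if any validation query returns a row."""
    for query in queries:
        row = conn.execute(query).fetchone()
        if row is not None:
            raise ValidationError(f"validation failed: {row[0]}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date TEXT NOT NULL)")
conn.commit()

queries = ["SELECT 'invalid date' FROM my_table WHERE date < '2020-01-01'"]

try:
    # Stand-in for the task: write one row that violates the rule.
    conn.execute("INSERT INTO my_table (date) VALUES ('2019-12-31')")
    run_validation_queries(conn, queries)
    conn.commit()
    task_failed = False
except ValidationError:
    # Discard the uncommitted insert, as a failed task would.
    conn.rollback()
    task_failed = True

remaining_rows = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
```

Because the validation runs before the commit, the offending row never becomes visible outside the transaction.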

Django Table Outputs

A django_table output represents a Django model. It reads the list of columns from the Django model, so it requires only the model name and any validation queries:

outputs:
- type: django_table
  name: my_app.MyModel
  validation_queries:
  - SELECT 'invalid date' FROM my_table WHERE date < '2020-01-01'

Writing custom output types

To make additional output types, subclass the Output class and implement the following methods:

test_setup(self, context: dict, values: list = None) -> None

The test setup method receives a context dictionary. When used with the workflows test fixture, this dictionary contains a key conn whose value is a database connection. The method optionally receives a second parameter, values, which is a list of test data to populate. For the table output type, these values are rows to insert. For other output types the data might mean something different, such as files in a cloud bucket.

test_results(self, context: dict) -> list

The test results method retrieves the data generated by the task so that assertions can be made against it. The table output type retrieves all rows in the table and returns them as a list of dicts, one per row.
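The two test hooks described above might look something like this for a table-like output. This is a sketch under stated assumptions: the Output class here is a stand-in for the library's base class, SimpleTableOutput is a hypothetical name, and the table is created inside test_setup only to keep the example self-contained.

```python
import sqlite3


class Output:
    """Stand-in for the library's Output base class (assumed shape)."""

    def __init__(self, name):
        self.name = name


class SimpleTableOutput(Output):
    """Hypothetical table-like output, for illustration only."""

    def test_setup(self, context: dict, values: list = None) -> None:
        conn = context["conn"]
        # Assumption: the table is created here so the sketch can run alone.
        conn.execute(f"CREATE TABLE {self.name} (date TEXT NOT NULL)")
        # Each value is treated as one row to insert.
        for row in values or []:
            conn.execute(
                f"INSERT INTO {self.name} (date) VALUES (:date)", row
            )

    def test_results(self, context: dict) -> list:
        cursor = context["conn"].execute(f"SELECT date FROM {self.name}")
        columns = [col[0] for col in cursor.description]
        # Return one dict per row, keyed by column name.
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


context = {"conn": sqlite3.connect(":memory:")}
output = SimpleTableOutput("my_table")
output.test_setup(context, values=[{"date": "2020-06-01"}])
rows = output.test_results(context)
```

A test would then compare rows against the rows it expects the task to have produced.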

validate(self, context: dict)

The validate method is called just after task execution, both in production and during tests. It can make assertions against the data generated by the task, and it can fail the task by raising an exception. For database tasks, the exception causes the database transaction to be rolled back, discarding any changes. The validate method has access to the fields on the output, including custom fields such as validation_queries and columns, which it can use to check that the output meets expectations.
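As an illustration of a validate method that uses a custom field, here is a minimal sketch. The Output class is a stand-in for the library's base class, and NonEmptyTableOutput and its min_rows field are hypothetical; the sketch also assumes that the context passed to validate carries a database connection under the key conn, as it does in the test fixture.

```python
import sqlite3


class Output:
    """Stand-in for the library's Output base class (assumed shape)."""


class NonEmptyTableOutput(Output):
    """Hypothetical output that fails the task if its table is too small."""

    def __init__(self, name, min_rows=1):
        self.name = name
        self.min_rows = min_rows  # hypothetical custom field

    def validate(self, context: dict) -> None:
        conn = context["conn"]  # assumed to be present, as in the test fixture
        count = conn.execute(f"SELECT COUNT(*) FROM {self.name}").fetchone()[0]
        if count < self.min_rows:
            # Raising here is what fails the task.
            raise ValueError(
                f"{self.name} has {count} rows, expected at least {self.min_rows}"
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (date TEXT NOT NULL)")
output = NonEmptyTableOutput("my_table")

try:
    output.validate({"conn": conn})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In this sketch the caller catches the exception only to show the message; in a real task run the exception would propagate and trigger the rollback described above.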