Output processor

GenericOutputProcessor

Generic output processor.

TsvOutputProcessor

TSV output processor.

XlsxOutputProcessor

Excel output processor.

Output metadata processor. This module takes a list of GenericEntity objects, transforms it into a dataframe, and saves it in different formats using pandas. Pretty simple!

Mandatory arguments:

  • output_path: Path to the file to save the metadata.

Optional arguments:

  • verbose: Set to True to display logging events of level INFO and above. If unset or set to False, only WARNING and above are displayed.
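
The verbose behaviour described above can be sketched with the standard logging module. This is only an illustration: the function name configure_logging and the logger names are hypothetical, and the module's real logging setup may differ.

```python
import logging


def configure_logging(name: str, verbose: bool = False) -> logging.Logger:
    """Return a logger set to INFO when verbose is True, WARNING otherwise.

    Hypothetical helper: it mirrors the documented flag, not the module's code.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    return logger


quiet = configure_logging("example.quiet")                 # verbose defaults to False
chatty = configure_logging("example.chatty", verbose=True)

# INFO events are only enabled on the verbose logger.
quiet_shows_info = quiet.isEnabledFor(logging.INFO)
chatty_shows_info = chatty.isEnabledFor(logging.INFO)
```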

Subclasses of GenericOutputProcessor must define the following methods/properties:

  • _save

class GenericOutputProcessor(output_path, verbose=False)

Bases: object

Generic output processor. Defines the methods that subclasses must implement.

Parameters:

output_path (str) – path to save the file. Please include the name and extension of the file.

verbose (bool) – set to True to display INFO and above-level logging events. Defaults to False.

save(entities)

Transform the entities into a dataframe, then save it using pandas functionality.

Parameters:

entities (list[GenericEntity]) – Instances of GenericEntity subclasses.

_save(dataframe)

Function to be overridden by subclasses. Takes a dataframe and saves the output into self.path.

Parameters:

dataframe (DataFrame) – Dataframe containing the flattened metadata from the GenericEntity subclasses.
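
The split between save() and _save() can be sketched as follows. To keep the example self-contained it uses stand-ins rather than the real classes: SketchOutputProcessor, ExampleEntity and JsonLinesOutputProcessor are hypothetical names, and the assumption that an entity flattens to a plain dict is ours, not something the module guarantees.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path

import pandas as pd


@dataclass
class ExampleEntity:
    """Hypothetical stand-in for a GenericEntity subclass."""
    name: str
    size: int


class SketchOutputProcessor:
    """Stand-in for GenericOutputProcessor; not the library's code."""

    def __init__(self, output_path, verbose=False):
        self.path = output_path
        self.verbose = verbose

    def save(self, entities):
        # Assumption: each entity can be flattened to a plain dict.
        dataframe = pd.DataFrame([asdict(entity) for entity in entities])
        self._save(dataframe)

    def _save(self, dataframe):
        raise NotImplementedError("Subclasses must implement _save")


class JsonLinesOutputProcessor(SketchOutputProcessor):
    """Hypothetical subclass: writes one JSON object per row."""

    def _save(self, dataframe):
        dataframe.to_json(self.path, orient="records", lines=True)


out = Path(tempfile.mkdtemp()) / "entities.jsonl"
JsonLinesOutputProcessor(str(out)).save(
    [ExampleEntity("a", 1), ExampleEntity("b", 2)]
)
```

Only _save() changes per format; the entity-to-dataframe step stays in the base class.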

class TsvOutputProcessor(output_path)

Bases: GenericOutputProcessor

TSV output processor. Takes a list of entities and outputs a TSV with the metadata processed.

Parameters:

output_path (str) – Path to the file being saved. Please include the ‘.tsv’ extension.

_save(dataframe)

Save the dataframe produced by save() into a TSV file, using pandas functionality. NO, the delimiter is not customizable. Create another subclass if you want that. TSV means TAB-Separated Values, not comma, not pipes, not anything else. You weirdo.

Parameters:

dataframe (DataFrame) – Dataframe containing the flattened metadata from the GenericEntity subclasses.
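
A minimal sketch of what a tab-separated save can look like with pandas. The helper name save_tsv is hypothetical, and dropping the row index is an assumption about the desired output, not something the documentation states.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_tsv(dataframe, output_path):
    """Write the dataframe as tab-separated values (hypothetical helper)."""
    # sep="\t" fixes the delimiter to a tab.
    # index=False omits the row index (assumed, not documented).
    dataframe.to_csv(output_path, sep="\t", index=False)


frame = pd.DataFrame({"name": ["a", "b"], "size": [1, 2]})
tsv_path = Path(tempfile.mkdtemp()) / "entities.tsv"
save_tsv(frame, tsv_path)
```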

class XlsxOutputProcessor(output_path, sheet_name='Sheet1')

Bases: GenericOutputProcessor

Excel output processor. Takes a list of entities and outputs an Excel file with the metadata processed.

Parameters:

output_path (str) – Path to the file being saved. Please include the ‘.xlsx’ extension.

sheet_name (str) – Name of the worksheet to write to. Defaults to ‘Sheet1’.

_save(dataframe)

Save the dataframe produced by save() into an Excel file.

Parameters:

dataframe (DataFrame) – Dataframe containing the flattened metadata from the GenericEntity subclasses.
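
A minimal sketch of an Excel save with a configurable sheet name. The helper name save_xlsx is hypothetical and dropping the row index is an assumption. pandas needs an Excel engine such as openpyxl installed to write and read .xlsx files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_xlsx(dataframe, output_path, sheet_name="Sheet1"):
    """Write the dataframe to a single worksheet (hypothetical helper)."""
    # index=False omits the row index (assumed, not documented).
    dataframe.to_excel(output_path, sheet_name=sheet_name, index=False)


frame = pd.DataFrame({"name": ["a", "b"], "size": [1, 2]})
xlsx_path = Path(tempfile.mkdtemp()) / "entities.xlsx"
save_xlsx(frame, xlsx_path, sheet_name="metadata")

# Read the named sheet back to check the round trip.
restored = pd.read_excel(xlsx_path, sheet_name="metadata")
```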