Pre-processing

Raw data should first be processed using the notebooks in notebooks/preprocess/*. The entry point for the ML pipeline's pre-processing step is src/pre-processing.py.

Args description:

* --data_path: Path to the data files

  • Input: The root directory of the xarray data files, passed as the script argument. All data files produced are also stored in this directory.

    • src/utils/data_paths.py defines the file paths for the features used in training, as well as the path of the fuel_load.nc file that will be created.

  • Output:

    • Creates the fuel_load.nc file for Fuel Load Data (Burned Area * Above Ground Biomass).

    • Saves the following files for the Tropics and Mid-Latitudes regions, where {type} is ‘tropics’ or ‘midlats’ respectively.

    • Save Directory root_path/{type}

      -  {type}_train.csv
      -  {type}_val.csv
      -  {type}_test.csv
      
    • Save Directory root_path/infer_{type}

      -  {type}_infers_July.csv
      -  {type}_infers_Aug.csv
      -  {type}_infers_Sept.csv
      -  {type}_infers_Oct.csv
      -  {type}_infers_Nov.csv
      -  {type}_infers_Dec.csv
      

    Here root_path is the root save path provided to pre-processing.py.
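
The output layout described above can be sketched as a small checker that lists every CSV file expected under root_path. This is an illustrative helper, not part of the repository: the function names (parse_args, expected_outputs, main) are hypothetical, and the inference file names are assumed to follow the pattern {type}_infers_{month}.csv. The location of fuel_load.nc is left out because it is defined in src/utils/data_paths.py.

```python
import argparse
from pathlib import Path

# Values taken from the description above.
REGIONS = ("tropics", "midlats")   # the two {type} values
SPLITS = ("train", "val", "test")
INFER_MONTHS = ("July", "Aug", "Sept", "Oct", "Nov", "Dec")


def parse_args(argv=None):
    """Parse the same --data_path flag that the pre-processing script takes."""
    parser = argparse.ArgumentParser(
        description="Hypothetical helper: list expected pre-processing outputs"
    )
    parser.add_argument("--data_path", required=True,
                        help="Root directory of the xarray data files")
    return parser.parse_args(argv)


def expected_outputs(root_path):
    """Return the CSV paths the pre-processing step is documented to write."""
    root = Path(root_path)
    paths = []
    for region in REGIONS:
        # root_path/{type}/{type}_train.csv, _val.csv, _test.csv
        for split in SPLITS:
            paths.append(root / region / f"{region}_{split}.csv")
        # root_path/infer_{type}/{type}_infers_{month}.csv (assumed pattern)
        for month in INFER_MONTHS:
            paths.append(root / f"infer_{region}" / f"{region}_infers_{month}.csv")
    return paths


def main(argv=None):
    """Print each expected file and whether it currently exists on disk."""
    args = parse_args(argv)
    for path in expected_outputs(args.data_path):
        status = "found  " if path.exists() else "missing"
        print(f"{status} {path}")
```

Calling main(["--data_path", "/path/to/data"]) after pre-processing has run would print one line per expected file, which makes an incomplete run easy to spot.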