ROOT 6.12/06
Reference Guide
TMVAClassificationCategory.C File Reference

Detailed Description

This macro provides examples for the training and testing of the TMVA classifiers in categorisation mode.

The input data is a toy-MC sample consisting of four Gaussian-distributed, linearly correlated input variables whose properties depend on the category (eta).
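As a rough illustration of what categorisation means here, the sketch below routes an event to one of two Fisher discriminants depending on abs(eta), using the same cut (1.3) that the macro passes to AddMethod. The coefficients are copied from the training log of this run (Category_Fisher_1 and Category_Fisher_2); the struct and function names are hypothetical and are not part of the TMVA API. It assumes the usual linear Fisher response, offset plus the sum of coefficient times variable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one Fisher sub-classifier:
// response = offset + sum_k coeff[k] * vars[k]
struct FisherCoefficients {
    std::vector<double> coeff;
    double offset;
};

// Values taken from the training log of this run.
const FisherCoefficients kCentral{{0.105, 0.152, 0.247, 0.375}, 0.648};   // abs(eta) <= 1.3
const FisherCoefficients kForward{{0.107, 0.148, 0.251, 0.372}, -0.751};  // abs(eta) >  1.3

// Linear Fisher response for one event.
double fisherResponse(const FisherCoefficients& f, const std::vector<double>& vars) {
    assert(vars.size() == f.coeff.size());  // one value per trained variable
    double y = f.offset;
    for (std::size_t k = 0; k < vars.size(); ++k) {
        y += f.coeff[k] * vars[k];
    }
    return y;
}

// Pick the sub-classifier from the spectator variable eta, then evaluate it.
double categorisedResponse(double eta, const std::vector<double>& vars) {
    const FisherCoefficients& f = (std::fabs(eta) <= 1.3) ? kCentral : kForward;
    return fisherResponse(f, vars);
}
```

In this run the two sub-classifiers have nearly identical variable coefficients and differ mainly in the offset, which a single Fisher discriminant trained on the full sample cannot absorb.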

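This example uses only the Fisher and Likelihood classifiers, each booked once on the full sample and once in categorised form (FisherCat, LikelihoodCat). Run via: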

root -l TMVAClassificationCategory.C

The output file "TMVA.root" can be analysed with dedicated macros (run: root -l <macro.C>). These macros can also be invoked from a GUI that appears when this macro finishes, unless ROOT is running in batch mode.
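The evaluation summary at the end of the run ranks the methods by their ROC integral. The sketch below shows what that number measures: the probability that a randomly chosen signal event receives a higher classifier response than a randomly chosen background event. It is a pairwise-counting illustration with a hypothetical function name, not TMVA's own computation, and it assumes the responses have already been read into two plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Area under the ROC curve from raw classifier responses.
// Counts, over all signal/background pairs, how often the signal event
// scores higher; ties count as half a win.
double rocIntegral(const std::vector<double>& signal,
                   const std::vector<double>& background) {
    assert(!signal.empty() && !background.empty());  // undefined for empty samples
    double wins = 0.0;
    for (double s : signal) {
        for (double b : background) {
            if (s > b) {
                wins += 1.0;
            } else if (s == b) {
                wins += 0.5;
            }
        }
    }
    const double pairs = static_cast<double>(signal.size()) *
                         static_cast<double>(background.size());
    return wins / pairs;
}
```

A value of 0.5 corresponds to no separation and 1.0 to perfect separation. The pairwise loop is quadratic in the sample size, so for samples much larger than this tutorial's a sort-based or histogram-based computation would be the more practical choice.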

Processing /builddir/build/BUILD/root-6.12.06/tutorials/tmva/TMVAClassificationCategory.C...
==> Start TMVAClassificationCategory
--- TMVAClassificationCategory: Accessing /builddir/build/BUILD/root-6.12.06/tutorials/tmva/data/toy_sigbkg_categ_offset.root
<HEADER> DataSetInfo : [dataset] : Added class "Signal"
: Add Tree TreeS of type Signal with 10000 events
<HEADER> DataSetInfo : [dataset] : Added class "Background"
: Add Tree TreeB of type Background with 10000 events
<HEADER> Factory : Booking method: Fisher
:
<HEADER> Factory : Booking method: Likelihood
:
<HEADER> Factory : Booking method: FisherCat
:
: Adding sub-classifier: Fisher::Category_Fisher_1
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Background"
: Adding sub-classifier: Fisher::Category_Fisher_2
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Background"
<HEADER> Factory : Booking method: LikelihoodCat
:
: Adding sub-classifier: Likelihood::Category_Likelihood_1
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Background"
: Adding sub-classifier: Likelihood::Category_Likelihood_2
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Background"
<HEADER> Factory : Train all methods
<HEADER> DataSetFactory : [dataset] : Number of events in input trees
:
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 5000
: Signal -- testing events : 5000
: Signal -- training and testing events: 10000
: Background -- training events : 5000
: Background -- testing events : 5000
: Background -- training and testing events: 10000
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 +0.368 +0.378 +0.391
: var2: +0.368 +1.000 +0.388 +0.386
: var3: +0.378 +0.388 +1.000 +0.389
: var4: +0.391 +0.386 +0.389 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 +0.365 +0.376 +0.381
: var2: +0.365 +1.000 +0.382 +0.387
: var3: +0.376 +0.382 +1.000 +0.376
: var4: +0.381 +0.387 +0.376 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [dataset] :
:
<HEADER> Factory : Train method: Fisher for Classification
:
<HEADER> Fisher : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: -0.053
: var2: -0.014
: var3: +0.096
: var4: +0.216
: (offset): -0.023
: -----------------------
: Elapsed time for training with 10000 events: 0.00316 sec
<HEADER> Fisher : [dataset] : Evaluation of Fisher on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00108 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_Fisher.weights.xml
: Creating standalone class: dataset/weights/TMVAClassificationCategory_Fisher.class.C
<HEADER> Factory : Training finished
:
<HEADER> Factory : Train method: Likelihood for Classification
:
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 10000 events: 0.0461 sec
<HEADER> Likelihood : [dataset] : Evaluation of Likelihood on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0105 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_Likelihood.weights.xml
: Creating standalone class: dataset/weights/TMVAClassificationCategory_Likelihood.class.C
: TMVA.root:/dataset/Method_Likelihood/Likelihood
<HEADER> Factory : Training finished
:
<HEADER> Factory : Train method: FisherCat for Classification
:
: Train all sub-classifiers for Classification ...
<HEADER> DataSetFactory : [Category_Fisher_1_dsi] : Number of events in input trees
: Dataset[Category_Fisher_1_dsi] : Signal requirement: "abs(eta)<=1.3"
: Dataset[Category_Fisher_1_dsi] : Signal -- number of events passed: 5123 / sum of weights: 5123
: Dataset[Category_Fisher_1_dsi] : Signal -- efficiency : 0.5123
: Dataset[Category_Fisher_1_dsi] : Background requirement: "abs(eta)<=1.3"
: Dataset[Category_Fisher_1_dsi] : Background -- number of events passed: 5134 / sum of weights: 5134
: Dataset[Category_Fisher_1_dsi] : Background -- efficiency : 0.5134
: Dataset[Category_Fisher_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Fisher_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2561
: Signal -- testing events : 2561
: Signal -- training and testing events: 5122
: Dataset[Category_Fisher_1_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5123
: Background -- training events : 2567
: Background -- testing events : 2567
: Background -- training and testing events: 5134
: Dataset[Category_Fisher_1_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5134
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.017 +0.004 +0.001
: var2: -0.017 +1.000 -0.019 -0.003
: var3: +0.004 -0.019 +1.000 -0.012
: var4: +0.001 -0.003 -0.012 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.019 -0.022 +0.003
: var2: -0.019 +1.000 -0.018 +0.004
: var3: -0.022 -0.018 +1.000 +0.004
: var4: +0.003 +0.004 +0.004 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Fisher_1_dsi] :
:
: Train method: Category_Fisher_1 for Classification
<HEADER> Category_Fisher_1 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +0.105
: var2: +0.152
: var3: +0.247
: var4: +0.375
: (offset): +0.648
: -----------------------
: Elapsed time for training with 5128 events: 0.00172 sec
<HEADER> Category_Fisher_1 : [Category_Fisher_1_dsi] : Evaluation of Category_Fisher_1 on training sample (5128 events)
: Elapsed time for evaluation of 5128 events: 0.000573 sec
: Training finished
<HEADER> DataSetFactory : [Category_Fisher_2_dsi] : Number of events in input trees
: Dataset[Category_Fisher_2_dsi] : Signal requirement: "abs(eta)>1.3"
: Dataset[Category_Fisher_2_dsi] : Signal -- number of events passed: 4877 / sum of weights: 4877
: Dataset[Category_Fisher_2_dsi] : Signal -- efficiency : 0.4877
: Dataset[Category_Fisher_2_dsi] : Background requirement: "abs(eta)>1.3"
: Dataset[Category_Fisher_2_dsi] : Background -- number of events passed: 4866 / sum of weights: 4866
: Dataset[Category_Fisher_2_dsi] : Background -- efficiency : 0.4866
: Dataset[Category_Fisher_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Fisher_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2438
: Signal -- testing events : 2438
: Signal -- training and testing events: 4876
: Dataset[Category_Fisher_2_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4877
: Background -- training events : 2433
: Background -- testing events : 2433
: Background -- training and testing events: 4866
: Dataset[Category_Fisher_2_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4866
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.005 +0.002 -0.039
: var2: -0.005 +1.000 +0.011 -0.004
: var3: +0.002 +0.011 +1.000 -0.021
: var4: -0.039 -0.004 -0.021 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.007 +0.009 +0.008
: var2: -0.007 +1.000 -0.020 +0.013
: var3: +0.009 -0.020 +1.000 +0.007
: var4: +0.008 +0.013 +0.007 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Fisher_2_dsi] :
:
: Train method: Category_Fisher_2 for Classification
<HEADER> Category_Fisher_2 : Results for Fisher coefficients:
: -----------------------
: Variable: Coefficient:
: -----------------------
: var1: +0.107
: var2: +0.148
: var3: +0.251
: var4: +0.372
: (offset): -0.751
: -----------------------
: Elapsed time for training with 4871 events: 0.00151 sec
<HEADER> Category_Fisher_2 : [Category_Fisher_2_dsi] : Evaluation of Category_Fisher_2 on training sample (4871 events)
: Elapsed time for evaluation of 4871 events: 0.000545 sec
: Training finished
: Begin ranking of input variables...
<HEADER> Category_Fisher_1 : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 2.205e-01
: 2 : var3 : 1.054e-01
: 3 : var2 : 4.114e-02
: 4 : var1 : 1.987e-02
: -------------------------------
<HEADER> Category_Fisher_2 : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 2.153e-01
: 2 : var3 : 1.105e-01
: 3 : var2 : 4.289e-02
: 4 : var1 : 1.986e-02
: -------------------------------
: Elapsed time for training with 10000 events: 0.0419 sec
<HEADER> FisherCat : [dataset] : Evaluation of FisherCat on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00588 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_FisherCat.weights.xml
<HEADER> Factory : Training finished
:
<HEADER> Factory : Train method: LikelihoodCat for Classification
:
: Train all sub-classifiers for Classification ...
<HEADER> DataSetFactory : [Category_Likelihood_1_dsi] : Number of events in input trees
: Dataset[Category_Likelihood_1_dsi] : Signal requirement: "abs(eta)<=1.3"
: Dataset[Category_Likelihood_1_dsi] : Signal -- number of events passed: 5123 / sum of weights: 5123
: Dataset[Category_Likelihood_1_dsi] : Signal -- efficiency : 0.5123
: Dataset[Category_Likelihood_1_dsi] : Background requirement: "abs(eta)<=1.3"
: Dataset[Category_Likelihood_1_dsi] : Background -- number of events passed: 5134 / sum of weights: 5134
: Dataset[Category_Likelihood_1_dsi] : Background -- efficiency : 0.5134
: Dataset[Category_Likelihood_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Likelihood_1_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2561
: Signal -- testing events : 2561
: Signal -- training and testing events: 5122
: Dataset[Category_Likelihood_1_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5123
: Background -- training events : 2567
: Background -- testing events : 2567
: Background -- training and testing events: 5134
: Dataset[Category_Likelihood_1_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.5134
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.017 +0.004 +0.001
: var2: -0.017 +1.000 -0.019 -0.003
: var3: +0.004 -0.019 +1.000 -0.012
: var4: +0.001 -0.003 -0.012 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.019 -0.022 +0.003
: var2: -0.019 +1.000 -0.018 +0.004
: var3: -0.022 -0.018 +1.000 +0.004
: var4: +0.003 +0.004 +0.004 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Likelihood_1_dsi] :
:
: Train method: Category_Likelihood_1 for Classification
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 5128 events: 0.0275 sec
<HEADER> Category_Likelihood_1 : [Category_Likelihood_1_dsi] : Evaluation of Category_Likelihood_1 on training sample (5128 events)
: Elapsed time for evaluation of 5128 events: 0.00554 sec
: TMVA.root:/dataset/Method_LikelihoodCat/LikelihoodCat/Method_Likelihood/Category_Likelihood_1
: Training finished
<HEADER> DataSetFactory : [Category_Likelihood_2_dsi] : Number of events in input trees
: Dataset[Category_Likelihood_2_dsi] : Signal requirement: "abs(eta)>1.3"
: Dataset[Category_Likelihood_2_dsi] : Signal -- number of events passed: 4877 / sum of weights: 4877
: Dataset[Category_Likelihood_2_dsi] : Signal -- efficiency : 0.4877
: Dataset[Category_Likelihood_2_dsi] : Background requirement: "abs(eta)>1.3"
: Dataset[Category_Likelihood_2_dsi] : Background -- number of events passed: 4866 / sum of weights: 4866
: Dataset[Category_Likelihood_2_dsi] : Background -- efficiency : 0.4866
: Dataset[Category_Likelihood_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Dataset[Category_Likelihood_2_dsi] : you have opted for interpreting the requested number of training/testing events
: to be the number of events AFTER your preselection cuts
:
: Number of training and testing events
: ---------------------------------------------------------------------------
: Signal -- training events : 2438
: Signal -- testing events : 2438
: Signal -- training and testing events: 4876
: Dataset[Category_Likelihood_2_dsi] : Signal -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4877
: Background -- training events : 2433
: Background -- testing events : 2433
: Background -- training and testing events: 4866
: Dataset[Category_Likelihood_2_dsi] : Background -- due to the preselection a scaling factor has been applied to the numbers of requested events: 0.4866
:
<HEADER> DataSetInfo : Correlation matrix (Signal):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.005 +0.002 -0.039
: var2: -0.005 +1.000 +0.011 -0.004
: var3: +0.002 +0.011 +1.000 -0.021
: var4: -0.039 -0.004 -0.021 +1.000
: ----------------------------------------
<HEADER> DataSetInfo : Correlation matrix (Background):
: ----------------------------------------
: var1 var2 var3 var4
: var1: +1.000 -0.007 +0.009 +0.008
: var2: -0.007 +1.000 -0.020 +0.013
: var3: +0.009 -0.020 +1.000 +0.007
: var4: +0.008 +0.013 +0.007 +1.000
: ----------------------------------------
<HEADER> DataSetFactory : [Category_Likelihood_2_dsi] :
:
: Train method: Category_Likelihood_2 for Classification
: Filling reference histograms
: Building PDF out of reference histograms
: Elapsed time for training with 4871 events: 0.0246 sec
<HEADER> Category_Likelihood_2 : [Category_Likelihood_2_dsi] : Evaluation of Category_Likelihood_2 on training sample (4871 events)
: Elapsed time for evaluation of 4871 events: 0.00516 sec
: TMVA.root:/dataset/Method_LikelihoodCat/LikelihoodCat/Method_Likelihood/Category_Likelihood_2
: Training finished
: Begin ranking of input variables...
<HEADER> Category_Likelihood_1 : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.031e-01
: 2 : var3 : 1.716e-02
: 3 : var1 : 1.036e-02
: 4 : var2 : 4.428e-03
: -----------------------------------
<HEADER> Category_Likelihood_2 : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.424e-01
: 2 : var3 : 6.035e-02
: 3 : var2 : 1.824e-02
: 4 : var1 : 8.110e-03
: -----------------------------------
: Elapsed time for training with 10000 events: 0.269 sec
<HEADER> LikelihoodCat : [dataset] : Evaluation of LikelihoodCat on training sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0154 sec
: Creating xml weight file: dataset/weights/TMVAClassificationCategory_LikelihoodCat.weights.xml
<HEADER> Factory : Training finished
:
: Ranking input variables (method specific)...
<HEADER> Fisher : Ranking result (top variable is best ranked)
: -------------------------------
: Rank : Variable : Discr. power
: -------------------------------
: 1 : var4 : 1.446e-01
: 2 : var3 : 7.153e-02
: 3 : var2 : 2.447e-02
: 4 : var1 : 1.243e-02
: -------------------------------
<HEADER> Likelihood : Ranking result (top variable is best ranked)
: -----------------------------------
: Rank : Variable : Delta Separation
: -----------------------------------
: 1 : var4 : 1.162e-01
: 2 : var3 : 5.179e-02
: 3 : var2 : 2.915e-02
: 4 : var1 : 2.168e-02
: -----------------------------------
: No variable ranking supplied by classifier: FisherCat
: No variable ranking supplied by classifier: LikelihoodCat
<HEADER> Factory : === Destroy and recreate all methods via weight files for testing ===
:
: Recreating sub-classifiers from XML-file
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_1_dsi] : Added class "Background"
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Fisher_2_dsi] : Added class "Background"
: Recreating sub-classifiers from XML-file
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_1_dsi] : Added class "Background"
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Signal"
<HEADER> DataSetInfo : [Category_Likelihood_2_dsi] : Added class "Background"
<HEADER> Factory : Test all methods
<HEADER> Factory : Test method: Fisher for Classification performance
:
<HEADER> Fisher : [dataset] : Evaluation of Fisher on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00608 sec
<HEADER> Factory : Test method: Likelihood for Classification performance
:
<HEADER> Likelihood : [dataset] : Evaluation of Likelihood on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0123 sec
<HEADER> Factory : Test method: FisherCat for Classification performance
:
<HEADER> FisherCat : [dataset] : Evaluation of FisherCat on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.00742 sec
<HEADER> Factory : Test method: LikelihoodCat for Classification performance
:
<HEADER> LikelihoodCat : [dataset] : Evaluation of LikelihoodCat on testing sample (10000 events)
: Elapsed time for evaluation of 10000 events: 0.0146 sec
<HEADER> Factory : Evaluate all methods
<HEADER> Factory : Evaluate classifier: Fisher
:
<HEADER> Fisher : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_Fisher : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.014081 1.2910 [ -5.3119 4.5609 ]
: var2: -0.014399 1.3299 [ -4.7537 4.6723 ]
: var3: -0.027971 1.3779 [ -5.2892 4.7007 ]
: var4: 0.12966 1.4883 [ -5.1002 4.9767 ]
: -----------------------------------------------------------
<HEADER> Factory : Evaluate classifier: Likelihood
:
<HEADER> Likelihood : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_Likelihood : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.014081 1.2910 [ -5.3119 4.5609 ]
: var2: -0.014399 1.3299 [ -4.7537 4.6723 ]
: var3: -0.027971 1.3779 [ -5.2892 4.7007 ]
: var4: 0.12966 1.4883 [ -5.1002 4.9767 ]
: -----------------------------------------------------------
<HEADER> Factory : Evaluate classifier: FisherCat
:
<HEADER> FisherCat : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_FisherCat : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.014081 1.2910 [ -5.3119 4.5609 ]
: var2: -0.014399 1.3299 [ -4.7537 4.6723 ]
: var3: -0.027971 1.3779 [ -5.2892 4.7007 ]
: var4: 0.12966 1.4883 [ -5.1002 4.9767 ]
: -----------------------------------------------------------
<HEADER> Factory : Evaluate classifier: LikelihoodCat
:
<HEADER> LikelihoodCat : [dataset] : Loop over test events and fill histograms with classifier response...
:
<HEADER> TFHandler_LikelihoodCat : Variable Mean RMS [ Min Max ]
: -----------------------------------------------------------
: var1: -0.014081 1.2910 [ -5.3119 4.5609 ]
: var2: -0.014399 1.3299 [ -4.7537 4.6723 ]
: var3: -0.027971 1.3779 [ -5.2892 4.7007 ]
: var4: 0.12966 1.4883 [ -5.1002 4.9767 ]
: -----------------------------------------------------------
:
: Evaluation results ranked by best signal efficiency and purity (area)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA
: Name: Method: ROC-integ
: dataset FisherCat : 0.914
: dataset LikelihoodCat : 0.913
: dataset Fisher : 0.808
: dataset Likelihood : 0.768
: -------------------------------------------------------------------------------------------------------------------
:
: Testing efficiency compared to training efficiency (overtraining check)
: -------------------------------------------------------------------------------------------------------------------
: DataSet MVA Signal efficiency: from test sample (from training sample)
: Name: Method: @B=0.01 @B=0.10 @B=0.30
: -------------------------------------------------------------------------------------------------------------------
: dataset FisherCat : 0.352 (0.360) 0.743 (0.739) 0.919 (0.916)
: dataset LikelihoodCat : 0.350 (0.351) 0.738 (0.736) 0.919 (0.916)
: dataset Fisher : 0.184 (0.185) 0.471 (0.486) 0.746 (0.742)
: dataset Likelihood : 0.211 (0.242) 0.446 (0.453) 0.609 (0.608)
: -------------------------------------------------------------------------------------------------------------------
:
<HEADER> Dataset:dataset : Created tree 'TestTree' with 10000 events
:
<HEADER> Dataset:dataset : Created tree 'TrainTree' with 10000 events
:
<HEADER> Factory : Thank you for using TMVA!
: For citation information, please visit: http://tmva.sf.net/citeTMVA.html
==> Wrote root file: TMVA.root
==> TMVAClassificationCategory is done!
#include <cstdlib>
#include <iostream>
#include <map>
#include <string>
#include "TChain.h"
#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include "TObjString.h"
#include "TSystem.h"
#include "TROOT.h"
#include "TCut.h"
#include "TMVA/MethodCategory.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Tools.h"
#include "TMVA/TMVAGui.h"
// two types of category methods are implemented
Bool_t UseOffsetMethod = kTRUE;
void TMVAClassificationCategory()
{
//---------------------------------------------------------------
// Example for usage of different event categories with classifiers
std::cout << std::endl << "==> Start TMVAClassificationCategory" << std::endl;
// This loads the library
TMVA::Tools::Instance();
bool batchMode = false;
// Create a new root output file.
TString outfileName( "TMVA.root" );
TFile* outputFile = TFile::Open( outfileName, "RECREATE" );
// Create the factory object (see TMVAClassification.C for more information)
std::string factoryOptions( "!V:!Silent:Transformations=I;D;P;G,D" );
if (batchMode) factoryOptions += ":!Color:!DrawProgressBar";
TMVA::Factory *factory = new TMVA::Factory( "TMVAClassificationCategory", outputFile, factoryOptions );
// Create DataLoader
TMVA::DataLoader *dataloader = new TMVA::DataLoader( "dataset" );
// Define the input variables used for the MVA training
dataloader->AddVariable( "var1", 'F' );
dataloader->AddVariable( "var2", 'F' );
dataloader->AddVariable( "var3", 'F' );
dataloader->AddVariable( "var4", 'F' );
// You can add so-called "Spectator variables", which are not used in the MVA training,
// but will appear in the final "TestTree" produced by TMVA. This TestTree will contain the
// input variables, the response values of all trained MVAs, and the spectator variables
dataloader->AddSpectator( "eta" );
// Load the signal and background event samples from ROOT trees
TFile *input(0);
TString fname = TString(gSystem->DirName(__FILE__) ) + "/data/";
if (gSystem->AccessPathName( fname + "toy_sigbkg_categ_offset.root")) {
// if directory data not found try using tutorials dir
fname = gROOT->GetTutorialDir() + "/tmva/data/";
}
if (UseOffsetMethod) fname += "toy_sigbkg_categ_offset.root";
else fname += "toy_sigbkg_categ_varoff.root";
if (!gSystem->AccessPathName( fname )) {
// AccessPathName returns false when the path is accessible, so the data file exists: open it
std::cout << "--- TMVAClassificationCategory: Accessing " << fname << std::endl;
input = TFile::Open( fname );
}
if (!input) {
std::cout << "ERROR: could not open data file: " << fname << std::endl;
exit(1);
}
TTree *signalTree = (TTree*)input->Get("TreeS");
TTree *background = (TTree*)input->Get("TreeB");
// Global event weights per tree (see below for setting event-wise weights)
Double_t signalWeight = 1.0;
Double_t backgroundWeight = 1.0;
// You can add an arbitrary number of signal or background trees
dataloader->AddSignalTree ( signalTree, signalWeight );
dataloader->AddBackgroundTree( background, backgroundWeight );
// Apply additional cuts on the signal and background samples (can be different)
TCut mycuts = ""; // for example: TCut mycuts = "abs(var1)<0.5 && abs(var2-0.5)<1";
TCut mycutb = ""; // for example: TCut mycutb = "abs(var1)<0.5";
// Tell the factory how to use the training and testing events
dataloader->PrepareTrainingAndTestTree( mycuts, mycutb,
"nTrain_Signal=0:nTrain_Background=0:SplitMode=Random:NormMode=NumEvents:!V" );
// Book MVA methods
// Fisher discriminant
factory->BookMethod( dataloader, TMVA::Types::kFisher, "Fisher", "!H:!V:Fisher" );
// Likelihood
factory->BookMethod( dataloader, TMVA::Types::kLikelihood, "Likelihood",
"!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );
// Categorised classifier
// The variable sets
TString theCat1Vars = "var1:var2:var3:var4";
TString theCat2Vars = (UseOffsetMethod ? "var1:var2:var3:var4" : "var1:var2:var3");
// Fisher with categories
TMVA::MethodBase* fiCat = factory->BookMethod( dataloader, TMVA::Types::kCategory, "FisherCat","" );
TMVA::MethodCategory* mcat = dynamic_cast<TMVA::MethodCategory*>(fiCat);
mcat->AddMethod( "abs(eta)<=1.3", theCat1Vars, TMVA::Types::kFisher, "Category_Fisher_1","!H:!V:Fisher" );
mcat->AddMethod( "abs(eta)>1.3", theCat2Vars, TMVA::Types::kFisher, "Category_Fisher_2","!H:!V:Fisher" );
// Likelihood with categories
TMVA::MethodBase* liCat = factory->BookMethod( dataloader, TMVA::Types::kCategory, "LikelihoodCat","" );
mcat = dynamic_cast<TMVA::MethodCategory*>(liCat);
mcat->AddMethod( "abs(eta)<=1.3",theCat1Vars, TMVA::Types::kLikelihood,
"Category_Likelihood_1","!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );
mcat->AddMethod( "abs(eta)>1.3", theCat2Vars, TMVA::Types::kLikelihood,
"Category_Likelihood_2","!H:!V:TransformOutput:PDFInterpol=Spline2:NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50" );
// Now you can tell the factory to train, test, and evaluate the MVAs
// Train MVAs using the set of training events
factory->TrainAllMethods();
// Evaluate all MVAs using the set of test events
factory->TestAllMethods();
// Evaluate and compare performance of all configured MVAs
factory->EvaluateAllMethods();
// --------------------------------------------------------------
// Save the output
outputFile->Close();
std::cout << "==> Wrote root file: " << outputFile->GetName() << std::endl;
std::cout << "==> TMVAClassificationCategory is done!" << std::endl;
// Clean up
delete factory;
delete dataloader;
// Launch the GUI for the root macros
if (!gROOT->IsBatch()) TMVA::TMVAGui( outfileName );
}
int main( int argc, char** argv )
{
TMVAClassificationCategory();
return 0;
}
Author
Andreas Hoecker

Definition in file TMVAClassificationCategory.C.