The script in focus for this topic is AssembleReportCSV.py.
Typically, simulations reproduce a few different scenarios that should be compared. For example, the results of a control group need to be compared to the results of an intervention group in a simulated clinical study. Once results are available, the user will want to see them side by side in the same report, described with consistent terminology. Alternatively, a user may want to compare simulation results to the actual results obtained from a clinical trial. The user may also just want to narrow down the information in a single CSV report file, selecting specific time frames and stratifications from a much larger list and placing them in a certain order.
The system provides some support to accommodate such comparison and visualization through the AssembleReportCSV.py utility.
The AssembleReportCSV.py utility assumes that MultiRunSimulationStatisticsAsCSV.py has already created summary simulation reports as CSV files. The utility combines these files into a single file that compares specific columns from those CSV files, and possibly includes reference columns from other files with a similar format.
The script is always invoked from the command line in the following format:
python AssembleReportCSV.py AssemblySequence OutputFileName
- AssemblySequence is an elaborate structure that allows the user to select specific columns from specific input files in a specific order. The assembly sequence is of the form [ColumnTuple1, ColumnTuple2, ...]. The user can specify this sequence within double quotes on the command line, or place it in a text file and pass the filename as the command line parameter instead. Each member of the assembly sequence is a tuple enclosed in parentheses of the form (FileName, Key1, Key2, Stratification, Title) where:
- FileName: The name of the CSV file from which to extract the column, enclosed in quotes.
- Key1: The start step of the interval of interest. This information is required and should be enclosed in quotes.
- Key2: The end step of the interval of interest. This information is required and should be enclosed in quotes.
- Stratification: An optional parameter that can be omitted or skipped by specifying an empty string. Otherwise, it specifies a stratification cell of interest by string. The string should exactly match the stratification string in the CSV report that starts with 'Stratification -' and should be enclosed in quotes. This information allows the system to select a specific column by the stratification cell. If skipped, the time intervals from the first stratification cell encountered will be used.
- Title: An optional parameter that can be omitted. If specified as a string in quotes, this string will be used as the column title. Specifying a title is recommended, since it distinguishes columns textually and gives a meaningful explanation of each column.
- OutputFileName is the name of the output CSV file where the collected columns will be placed.
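The tuple structure described above can be sketched in Python. The helper names column_tuple and build_command are hypothetical and are not part of the system; the sketch also assumes that the script accepts the sequence text in the form Python itself prints for a list of tuples, which matches the command line form shown in this section.

```python
import sys


def column_tuple(filename, key1, key2, stratification="", title=None):
    """Return one assembly sequence member (hypothetical helper).

    The trailing optional elements are left out when they are not given,
    mirroring the short and long tuple forms described in the text.
    """
    if title is None:
        if stratification == "":
            return (filename, key1, key2)
        return (filename, key1, key2, stratification)
    # A title requires the stratification position to be filled,
    # so an empty string is used to skip it.
    return (filename, key1, key2, stratification, title)


def build_command(sequence, output_filename, script="AssembleReportCSV.py"):
    """Return the argument list for: python AssembleReportCSV.py AssemblySequence OutputFileName."""
    # repr() of a list of string tuples yields text such as
    # [('a.csv', '0', '0', '', 'Title')]; this is assumed to be acceptable input.
    return [sys.executable, script, repr(sequence), output_filename]


sequence = [
    column_tuple("Testing_Mean.csv", "", ""),
    column_tuple("Testing_0.csv", "0", "0", title="Simulation 1 result"),
]
command = build_command(sequence, "Testing_Out.csv")
print(command[2])
```

Passing the list to subprocess.run(command, check=True) would invoke the script without relying on shell quoting of the double-quoted sequence.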
The generated report is very similar to the previous CSV reports, with the difference that it can extract columns from multiple files and provides a title for each such column. The output file therefore contains the following information for each column: the user-specified title, the file name from which the column was extracted for reference, the stratification requested by the user, the project name that generated the results, the model name used in the project, the population set name used in the project, the start step of the interval, the end step of the interval, many rows with parameter statistics, and the repetition count.
To make the report readable, it is recommended to extract the first two header columns by including the following tuples at the beginning of the sequence: ('FileName','',''), ('FileName','Start Step','End Step'). Note that this assumes that <Header> was selected as the first parameter in the report options file, which is the default.
Here is an example that builds again on the simulations we conducted using MultiRunSimulation.py and on reports we created using MultiRunSimulationStatisticsAsCSV.py beforehand.
Type in the following command:
python AssembleReportCSV.py "[('Testing_Mean.csv','',''), ('Testing_Mean.csv','Start Step','End Step'), ('Testing_0.csv','0','0','','Simulation 1 result'), ('Testing_1.csv','0','0','','Simulation 2 result'), ('Testing_2.csv','0','0','','Simulation 3 result'), ('Testing_Mean.csv','0','0','','Mean of 3 simulations') , ('Testing_STD.csv','0','0','','STD of 3 simulations'), ('Testing_0.csv','1','1','','Simulation 1 result'), ('Testing_1.csv','1','1','','Simulation 2 result'), ('Testing_2.csv','1','1','','Simulation 3 result'), ('Testing_Mean.csv','1','1','','Mean of 3 simulations') , ('Testing_STD.csv','1','1','','STD of 3 simulations'), ('Testing_0.csv','2','2','','Simulation 1 result'), ('Testing_1.csv','2','2','','Simulation 2 result'), ('Testing_2.csv','2','2','','Simulation 3 result'), ('Testing_Mean.csv','2','2','','Mean of 3 simulations') , ('Testing_STD.csv','2','2','','STD of 3 simulations'), ('Testing_0.csv','3','3','','Simulation 1 result'), ('Testing_1.csv','3','3','','Simulation 2 result'), ('Testing_2.csv','3','3','','Simulation 3 result'), ('Testing_Mean.csv','3','3','','Mean of 3 simulations') , ('Testing_STD.csv','3','3','','STD of 3 simulations')]" Testing_Out.csv
This example uses the script to place the results of each of the 3 simulations side by side for every time step requested (steps 0 through 3, covering all 3 years). It also compares them to the Mean and STD statistics extracted for those 3 simulations.
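Because the sequence in the command above follows a regular pattern, it can also be generated rather than typed. The following is a minimal sketch: the function name build_assembly_sequence is hypothetical, the file names and titles are copied from the example, and the resulting text is only assumed to be suitable for pasting within double quotes on the command line or saving in a text file.

```python
import ast


def build_assembly_sequence(steps, runs, header_file="Testing_Mean.csv"):
    """Rebuild the example sequence: two header columns, then per-step columns."""
    sequence = [
        (header_file, "", ""),
        (header_file, "Start Step", "End Step"),
    ]
    for step in steps:
        key = str(step)  # keys are given as quoted strings in the example
        for run_index in range(runs):
            sequence.append(
                ("Testing_%d.csv" % run_index, key, key, "",
                 "Simulation %d result" % (run_index + 1))
            )
        sequence.append(("Testing_Mean.csv", key, key, "",
                         "Mean of %d simulations" % runs))
        sequence.append(("Testing_STD.csv", key, key, "",
                         "STD of %d simulations" % runs))
    return sequence


sequence = build_assembly_sequence(steps=range(4), runs=3)
sequence_text = repr(sequence)

# Sanity check: the generated text is a well-formed list of tuples.
assert ast.literal_eval(sequence_text) == sequence
print(sequence_text)
```

Generating the sequence this way keeps the titles and step keys consistent when the number of simulations or steps changes.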
Note that the user can specify a reference CSV file from which specific columns are included. Also note that the system will not check whether the rows match; it just selects columns from multiple files and assembles them together. It is up to the user to make sure the columns and their definitions match between files. With good organization of the data, CSV reports can now be read by a human or reused to create graphical plots as described hereafter.
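Since row matching is left to the user, a quick check before assembling files can be helpful. The sketch below is not part of the system. It assumes that the first two columns of each report are the header columns that identify a row, and it compares those labels row by row between two files; rows that legitimately differ between files would be reported as well. The function names are hypothetical.

```python
import csv


def read_row_labels(path, label_columns=2):
    """Return the leading label cells of every row in a CSV file."""
    with open(path, newline="") as handle:
        return [tuple(row[:label_columns]) for row in csv.reader(handle)]


def find_row_mismatches(reference_path, other_path, label_columns=2):
    """List (row number, reference labels, other labels) where the files disagree.

    Row numbers are 1-based. A missing row is reported as None.
    """
    reference = read_row_labels(reference_path, label_columns)
    other = read_row_labels(other_path, label_columns)
    mismatches = []
    for index in range(max(len(reference), len(other))):
        ref_label = reference[index] if index < len(reference) else None
        other_label = other[index] if index < len(other) else None
        if ref_label != other_label:
            mismatches.append((index + 1, ref_label, other_label))
    return mismatches
```

For example, find_row_mismatches('Testing_Mean.csv', 'Testing_0.csv') would return an empty list when the row labels of the two reports line up.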