Data collection is handled primarily through the DataRecorder object which can record data from a variety of sources and write that data out in tabular format to a file. The Enn (Endogenous Neighborhood) demonstration simulation (in repast/demo/enn) has a good example of how to use a DataRecorder object and should be explored in conjunction with this document. In addition to the DataRecorder class, some other options for data collection are described at the end of this document.
Using a DataRecorder is a matter of creating it, adding data sources to it, and then scheduling records and file writes. The following is an example of creating the DataRecorder variable recorder. Creating a DataRecorder looks like:
recorder = new DataRecorder("./data.txt",
this);
or to add an optional header comment to the output file:
recorder = new DataRecorder("./data.txt", this,
"A Comment");
The first argument is the name of the file to write the data out to, the second is the model associated with the recorder (and so the argument is "this" given that the recorder is created within the model). In the second example, the final argument is an optional header comment. Once constructed the DataRecorder queries the model to determine if the simulation is running in batch mode. If so, the data recorder will write all the data to a single file keeping track of the run number and so forth.
Creating and adding data sources to the DataRecorder tells the DataRecorder where to find the data to record. The DataRecorder will record the data from these sources when it is sent the record() message, usually by the scheduling mechanism. The source of data for DataRecorder is typically a method call on some object, or the value of some instance variable. The actual sources for data can be of two types: numbers (ints, floats, longs, doubles) and objects (Strings etc.).
In order for the DataRecorder to know how to work with the wide
variety of possible data sources, these method calls or instance
variables must be wrapped in one of two interfaces - NumericDataSource
or DataSource, where NumericDataSource is for numeric datasources and
DataSource is for anything else. The NumericDataSource
interface has a single method: public double execute(),
and the DataSource interface also has a single method:
public Object execute(). Wrapping your data source in one
of these interfaces entails creating a class that implements one of
the above interfaces and returning the data you wish to record from
the execute method. Like BasicActions, you can create these classes by
hand or have Repast dynamically create them for you. An example
follows:
public class MyModel extends SimModelImpl {
private int numAgents;
private Space space;
private DataRecorder recorder;
...
class NumDataSource implements NumericDataSouce {
public double execute() {
return numAgents;
}
}
class ObjDataSource implements DataSource {
public Object execute() {
return space.getData();
}
}
private void buildModel() {
recorder = new DataRecorder("./data.txt", this);
recorder.addNumericDataSource("numAgents", new NumDataSource());
recorder.addObjectDataSource("SpaceData", new ObjDataSource());
...
}
}
Here, we have implemented two data sources as inner classes to our model. The first, NumDataSource, merely returns the value of the numAgents ivar. The second, ObjDataSource, calls a method on some hypothetical Space object that returns an Object for recording. We then add these DataSources to the recorder in the buildModel() method. Having added the DataSources, the execute() methods of these DataSources will be called whenever we call DataRecorder.record(). (You could also use anonymous inner classes here in place of teh explicit inner classes ObjectDataSource and NumDataSource.)
If you wish Repast to create a class that implements one of these interfaces for you, the code looks like:
public class MyModel extends SimModelImpl {
private int numAgents;
private Space space;
private DataRecorder recorder;
...
public int getNumAgents() {
return numAgents;
}
public String getSpaceData() {
return space.getData();
}
private void buildModel() {
recorder = new DataRecorder("./data.txt", this);
recorder.createNumericDataSource("numAgents", this, "getNumAgents");
recorder.createObjectDataSource("SpaceData", this, "getSpaceData");
...
}
}
Here, we create the DataSources and add them with method calls to the DataRecorder object. The create* methods of the DataRecorder takes a label for the data to be recorded, an object reference, and the name of the method to call on that object that will return the data. So, in the above, Repast creates a NumericDataSource object whose execute method calls getNumAgents(). The createObjectDataSource works in a similar way. The results of either method of creating DataSources, by hand or dynamically, are comparable. However, with dynamic creation such errors as mis-spelling the method name argument in the create* call will not be caught until runtime.
For NumericDataSources you can also specify the number of digits both
before and after the decimal point that you wish to record. So for
example, recorder.createNumericDataSource("numAgents",
this, "getNumAgents", 3, 4) will record 3 digits
before the decimal point and 4 after. If the DataSource produces
3333.22222222, the value 333.2222 will be recorded. An argument of -1
will record all the digits, so
recorder.createNumericDataSource("numAgents", this,
"getNumAgents", -1, 4) will record 3333.2222. Note
that the number is not truncated, but rounded according to standard
rules for rounding floating point numbers. See the API documentation
for more info on the DataRecorder class.
The actual data collection via the record() method and writes of the recorded data to a file via writeToFile() are scheduled just like any other methods to be scheduled. For example,
schedule.scheduleActionBeginning(0, new BasicAction() {
public void execute() {
dRecorder.record();
}
});
schedule.scheduleActionAtEnd(dRecorder, "writeToFile");
tells the scheduling mechanism to call the record() method on the DataRecorder variable dRecorder every tick beginning at 0. As mentioned above dRecorder then queries its data sources in the manner described above and records the result. The above also tells the scheduling mechanism to call the "writeToFile" method on dRecorder. dRecorder then writes the collected data to the file specified when the dRecorder was constructed, appending the data if data has already been written during this run. The output file begins with a header giving model parameters for this run, followed by the data in tabular format. For example
RngSeed: 948816295132
numAgents: 100
maxAge: 120
"tick","Avg. Age"
0.0,33
1.0,33
2.0,32
3.0,40
...
The output file in a batch run is slightly different. There, any constant parameters are recorded in the header as is the starting time of the batch run. Any dynamic parameters as defined in the batch parameter file are recorded in the body of the table.
You can also change the delimiter for the recorded data with the setDelimiter(String val) method. The default delimiter is "," and so by default recorded data will look something like that above. Changing the delimimter to a ":" for example, will result in:
RngSeed: 948816295132
numAgents: 100
maxAge: 120
"tick":"Avg. Age"
0.0:33
1.0:33
2.0:32
3.0:40
...
Note that data is stored in memory until the writeToFile call at which point it is flushed. If you are experiencing memory problems, write your data to a file more frequently.
Additional Ways to Collect Data
In addition to the DataRecorder there are several other options for
recording data from a Repast simluation. For the ways to record
network data, see Network Models.
Recording data via an ObjectDataRecorder
is described below.
An ObjectDataRecorder records arbitrary objects to files as Strings together with an optional comment. Typically, this comment will be the tick count, but need not be. An ObjectDataRecorder is constructed much like a DataRecorder object with a fileName and associated model. Unlike a DataRecorder an ObjectDataRecorder has no data sources, and records an arbitrary object when instructed to do so via its record() method.
In non-batch mode, a file header will be written that includes all the model's properties and the values of these properties. These values will be the value at the time the first call to write is made, allowing the user to manipulate the parameters after setting up but prior to running the model. Consequently, your initial model parameters should only be changeable via the user or through a parameter file. If not, the file header may be inaccurate. The actual data will be written as a block where the block includes the optional comment as a header, followed by the String representation of the object itself. This String representation is acquired via the object's toString() method.
In batch mode, any constant model parameters will be written in the file header. The data block will include the value of any dynamic parameters, the run number, as well as the optional comment (again, typically the tick count), together with the String representation of the Object to be recorded.
Data is recorded by an ObjectDataRecorder via its record() method and written to a file via its write() method. See the API docs for more information.