Provides content fixity checking (using checksums) for bitstreams stored in DSpace software.
The main access point to org.dspace.checker is on the command line via {@link org.dspace.app.checker.ChecksumChecker#main(String[])}, but it is also simple to get programmatic access to ChecksumChecker if you wish, via a {@link org.dspace.checker.CheckerCommand} object.
CheckerCommand is a simple Command object. You initialize it with a strategy for iterating through the bitstreams to check (an implementation of {@link org.dspace.checker.BitstreamDispatcher}) and an object to collect the results (an implementation of {@link org.dspace.checker.ChecksumResultsCollector}), and then call {@link org.dspace.checker.CheckerCommand#process()} to begin processing. CheckerCommand itself handles the calculation of bitstream checksums and the iteration between bitstreams.
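The division of labour described above can be sketched as follows. This is a minimal, self-contained illustration, not the DSpace code: SketchDispatcher, SketchCollector and CheckerCommandSketch are hypothetical stand-ins for the real types, the SENTINEL end-of-run marker and the setter names are assumptions, the bitstream store is an in-memory map, and MD5 is used only as an example algorithm.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for BitstreamDispatcher: hands out ids until exhausted.
interface SketchDispatcher {
    int SENTINEL = -1; // assumed end-of-run marker for this sketch
    int next();
}

// Hypothetical stand-in for ChecksumResultsCollector.
interface SketchCollector {
    void collect(int bitstreamId, String checksum);
}

class CheckerCommandSketch {
    private final Map<Integer, byte[]> store; // in-memory substitute for bitstream storage
    private SketchDispatcher dispatcher;
    private SketchCollector collector;

    CheckerCommandSketch(Map<Integer, byte[]> store) {
        this.store = store;
    }

    void setDispatcher(SketchDispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    void setCollector(SketchCollector collector) {
        this.collector = collector;
    }

    // The command itself: ask the dispatcher for the next id, compute the
    // checksum, and hand the result to the collector until the run ends.
    void process() {
        for (int id = dispatcher.next(); id != SketchDispatcher.SENTINEL; id = dispatcher.next()) {
            collector.collect(id, md5Hex(store.get(id)));
        }
    }

    // Returns the MD5 digest of the data as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    // Wires a dispatcher and a collector together over two in-memory bitstreams.
    static List<String> demo() {
        Map<Integer, byte[]> store = new LinkedHashMap<>();
        store.put(1, "hello".getBytes(StandardCharsets.UTF_8));
        store.put(2, new byte[0]);
        final Iterator<Integer> ids = store.keySet().iterator();
        final List<String> results = new ArrayList<>();

        CheckerCommandSketch command = new CheckerCommandSketch(store);
        command.setDispatcher(() -> ids.hasNext() ? ids.next() : SketchDispatcher.SENTINEL);
        command.setCollector((id, sum) -> results.add(id + "=" + sum));
        command.process();
        return results;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The command owns neither the ordering nor the reporting, so either can be replaced without touching the checksum loop.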
The order in which bitstreams are checked, and the point at which a checking run terminates, are controlled by implementations of BitstreamDispatcher. You can extend the functionality of the package by writing your own implementation of this simple interface, although the package includes several useful implementations that will probably suffice in most cases:
Dispatchers that generate bitstream ordering:
Dispatchers that modify the behaviour of other Dispatchers:
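The two kinds of dispatcher can be combined by wrapping one in the other. The sketch below assumes a simplified dispatcher interface; IdDispatcher, ListDispatcherSketch and CountLimitSketch are hypothetical names chosen for illustration and are not the classes shipped in this package. One dispatcher generates an ordering from a fixed list, and a second decorates it to end the run after a maximum number of bitstreams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical simplified dispatcher contract used only in this sketch.
interface IdDispatcher {
    int SENTINEL = -1; // assumed end-of-run marker
    int next();
}

// Generates an ordering: walks a fixed list of bitstream ids.
class ListDispatcherSketch implements IdDispatcher {
    private final Iterator<Integer> ids;

    ListDispatcherSketch(List<Integer> ids) {
        this.ids = ids.iterator();
    }

    public int next() {
        return ids.hasNext() ? ids.next() : SENTINEL;
    }
}

// Modifies another dispatcher: ends the run after at most 'max' ids.
class CountLimitSketch implements IdDispatcher {
    private final IdDispatcher delegate;
    private int remaining;

    CountLimitSketch(IdDispatcher delegate, int max) {
        this.delegate = delegate;
        this.remaining = max;
    }

    public int next() {
        if (remaining <= 0) {
            return SENTINEL;
        }
        remaining--;
        return delegate.next();
    }
}

class DispatcherChainDemo {
    // Pulls ids from a dispatcher until it signals the end of the run.
    static List<Integer> drain(IdDispatcher dispatcher) {
        List<Integer> seen = new ArrayList<>();
        for (int id = dispatcher.next(); id != IdDispatcher.SENTINEL; id = dispatcher.next()) {
            seen.add(id);
        }
        return seen;
    }

    public static void main(String[] args) {
        IdDispatcher ordered = new ListDispatcherSketch(Arrays.asList(10, 11, 12, 13));
        System.out.println(drain(new CountLimitSketch(ordered, 2)));
    }
}
```

Because the limiting dispatcher only depends on the interface, it can wrap any ordering strategy, including another wrapper.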
The default implementation of ChecksumResultsCollector ({@link org.dspace.checker.ResultsLogger}) logs the results of checksum checking to the database, but it would be simple to write your own implementation that logs to Log4j logs, text files, JMS queues, etc.
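As an illustration of an alternative collector, the sketch below writes one tab-separated line per result to any java.io.Writer (a FileWriter would give a text-file log). ResultsCollectorSketch is a hypothetical interface; the real ChecksumResultsCollector method signature is not reproduced here, and the result code shown is only an example value.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical collector contract; the real interface may pass a richer result object.
interface ResultsCollectorSketch {
    void collect(int bitstreamId, String resultCode, String checksum);
}

// Writes each result as "id<TAB>code<TAB>checksum" followed by a newline.
class TextCollectorSketch implements ResultsCollectorSketch {
    private final Writer out;

    TextCollectorSketch(Writer out) {
        this.out = out;
    }

    public void collect(int bitstreamId, String resultCode, String checksum) {
        try {
            out.write(bitstreamId + "\t" + resultCode + "\t" + checksum + "\n");
            out.flush(); // flush per result so a crash loses at most the current line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        new TextCollectorSketch(buffer)
                .collect(42, "CHECKSUM_MATCH", "d41d8cd98f00b204e9800998ecf8427e");
        System.out.print(buffer);
    }
}
```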
The results pruner is responsible for trimming the archived checksum logs, which would otherwise grow without bound. The retention period of stored check results can be configured per checksum result code. This allows you, for example, to retain the records of all failures for auditing purposes while discarding the records of successful checks sooner. The pruner takes its default configuration from dspace.cfg, but can read alternative configurations from other properties files.
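Per-code retention can be pictured as a lookup from result code to a maximum age, with a fallback for codes that have no entry. The following sketch assumes exactly that model; PrunerSketch, CheckRecord, and the result code names and periods in main are hypothetical, and it does not read dspace.cfg or touch the database.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PrunerSketch {
    // One archived check result: its result code and when the check ran.
    static final class CheckRecord {
        final String resultCode;
        final Instant checkedAt;

        CheckRecord(String resultCode, Instant checkedAt) {
            this.resultCode = resultCode;
            this.checkedAt = checkedAt;
        }
    }

    // Returns the records that are still within the retention period for
    // their result code; codes without an entry use defaultRetention.
    static List<CheckRecord> prune(List<CheckRecord> history,
                                   Map<String, Duration> retention,
                                   Duration defaultRetention,
                                   Instant now) {
        List<CheckRecord> kept = new ArrayList<>();
        for (CheckRecord record : history) {
            Duration keepFor = retention.getOrDefault(record.resultCode, defaultRetention);
            Instant cutoff = now.minus(keepFor);
            if (!record.checkedAt.isBefore(cutoff)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:00:00Z");
        Instant monthAgo = now.minus(Duration.ofDays(30));

        // Illustrative policy: successes kept for a week, everything else for a year.
        Map<String, Duration> retention = new HashMap<>();
        retention.put("CHECKSUM_MATCH", Duration.ofDays(7));

        List<CheckRecord> history = Arrays.asList(
                new CheckRecord("CHECKSUM_MATCH", monthAgo),
                new CheckRecord("CHECKSUM_NO_MATCH", monthAgo));

        for (CheckRecord record : prune(history, retention, Duration.ofDays(365), now)) {
            System.out.println(record.resultCode + " " + record.checkedAt);
        }
    }
}
```

With this policy the month-old success is discarded while the month-old failure is retained, which matches the auditing scenario described above.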
All interaction between the checker package and the database is abstracted behind DataAccessObjects. Where practicable, dependencies on DSpace code are minimized, the rationale being that errors in DSpace code may themselves be the cause of fixity problems.