Applications of ML in Disaster Victim Identification: An Introduction to the Bonaparte DVI
Machine learning can be applied in a wide variety of fields, such as finance (stock market prediction), health information systems, and e-commerce (recommendation systems for customers). There are many successful applications of its different branches, and in this article we will look at one of them: the Bonaparte Disaster Victim Identification System.
As quoted in the description of the system, “Bonaparte was commissioned in 2007 by the Netherlands Forensic Institute (NFI) as part of their CBRNe incident readiness program, and has since then been further developed and improved by SNN and its subsidiary SMART Research BV in close collaboration with the NFI.” The Bonaparte Disaster Victim Identification (DVI) System is a framework designed to identify victims of large-scale disasters, such as terrorist attacks and natural disasters like tsunamis and earthquakes.
Bonaparte has been used in several incidents internationally. The Australian Police have used it to match DNA profiles across Australia’s state and territory borders, both for law enforcement and for disaster victim identification. It was also used to identify victims of the air disaster in Tripoli in 2010 and of the crash of Malaysia Airlines flight MH17 in Ukraine in 2014. The government of Vietnam also signed a contract with SMART Research BV to use Bonaparte for identifying victims of the Vietnam War. That work is part of a 10-year project named “Project 150”, initiated by Vietnamese Prime Minister Nguyen Tan Dung, which aims to identify at least 80,000 of the 650,000 unidentified victims of the war.
How does it work?
Bonaparte identifies victims by matching unidentified victims against people who have been reported missing. When a victim cannot be confidently matched with any missing individual, the DNA profiles of the missing people’s relatives are used to perform the identification instead.
Bonaparte uses Bayesian networks to identify victims based on family relations (also known as a pedigree). These networks statistically model the relations between family members based on their genetic material. The statistical analysis takes several factors into account, such as silent alleles, missing data, uncertainty in family relations, and the probability of measurement errors. Bayesian networks also provide additional flexibility in developing the final model for identification, and Bonaparte capitalizes on these advantages by generating the networks automatically in its computation engine.
The computational task Bonaparte handles is computing the likelihood of two hypotheses, using a probabilistic model designed as a Bayesian network. The end goal is to obtain a likelihood ratio, that is, the ratio of those two likelihoods. The final Bayesian network consists of the alleles and DNA profiles of the individuals in a given pedigree, and it is used to compute the likelihood of a given set of matching DNA profiles.
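To make the idea of a likelihood ratio concrete, here is a minimal Python sketch for the simplest possible pedigree: one reference parent and one candidate child. This is not Bonaparte’s implementation. It assumes Hardy-Weinberg proportions and independent loci, and it ignores mutations, silent alleles, and measurement errors, all of which the real system accounts for. The function names, locus names, and allele frequencies are made up for illustration.

```python
from math import prod

def prob_unrelated(child, freqs):
    """P(child genotype | unrelated), assuming Hardy-Weinberg proportions."""
    a, b = child
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def prob_given_parent(child, parent, freqs):
    """P(child genotype | known parent); the other parent is unknown.

    The known parent passes on each of their two alleles with
    probability 1/2; the second allele is drawn from the population.
    """
    a, b = child
    total = 0.0
    for t in parent:  # t is the allele transmitted by the known parent
        if a == b:
            total += 0.5 * (freqs[a] if t == a else 0.0)
        else:
            total += 0.5 * ((freqs[b] if t == a else 0.0)
                            + (freqs[a] if t == b else 0.0))
    return total

def likelihood_ratio(child_profile, parent_profile, freqs_by_locus):
    """LR = P(data | parent-child) / P(data | unrelated), multiplied over loci."""
    return prod(
        prob_given_parent(child_profile[locus], parent_profile[locus],
                          freqs_by_locus[locus])
        / prob_unrelated(child_profile[locus], freqs_by_locus[locus])
        for locus in child_profile
    )

# Hypothetical two-locus example with invented allele frequencies.
freqs_by_locus = {
    "locus1": {"A": 0.1, "B": 0.2, "C": 0.7},
    "locus2": {"A": 0.1, "B": 0.9},
}
victim = {"locus1": ("A", "B"), "locus2": ("A", "A")}
relative = {"locus1": ("A", "C"), "locus2": ("A", "A")}

lr = likelihood_ratio(victim, relative, freqs_by_locus)
print(f"LR = {lr:.2f}")  # prints LR = 25.00
```

A ratio well above 1 supports the hypothesis that the two people are parent and child, while a ratio near 0 supports the hypothesis that they are unrelated. In this simplified model a single locus with no shared allele drives the ratio to exactly 0, which is one reason a real system has to model mutations and measurement errors.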
Features of the Bonaparte DVI
a) Connectivity and data integrity
Bonaparte manages data using SQL databases. Data is imported either as Excel files or through XML imports. XML imports allow data from any kind of source, such as a PostgreSQL database, an Excel file, or plain text, to be brought in. The imported data is then validated against rules in XML Schema files to prevent data corruption. Because access is web browser-based, external applications can also connect over HTTP.
b) Rewind and history tracking
Bonaparte allows users to view historical (older) versions of the database. This makes it possible to re-confirm previously deduced matches between unidentified victims and reported missing persons. It also lets users investigate the data as it stood at a different point in time and draw conclusions that can help accelerate the identification process. Keeping multiple versions also gives users a complete editing history of the data, including the time of each modification and other details.
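One common way to get this kind of “rewind” behaviour is an append-only change log from which any earlier state can be rebuilt. The sketch below illustrates the idea in Python. It is a guess at the general technique, not a description of how Bonaparte stores its data, and the class and method names are hypothetical. For simplicity it uses version numbers where a real system would also record timestamps.

```python
from dataclasses import dataclass, field

@dataclass
class VersionedStore:
    """Append-only log of edits; no entry is ever overwritten or deleted."""
    _log: list = field(default_factory=list)  # entries: (version, user, key, value)

    def put(self, user, key, value):
        """Record an edit and return the new version number."""
        version = len(self._log) + 1
        self._log.append((version, user, key, value))
        return version

    def snapshot(self, version=None):
        """Rebuild the data as it stood at `version` (latest if None)."""
        state = {}
        for v, _user, key, value in self._log:
            if version is not None and v > version:
                break
            state[key] = value
        return state

    def history(self, key):
        """Return every recorded edit of one key: (version, user, value)."""
        return [(v, user, value) for v, user, k, value in self._log if k == key]

store = VersionedStore()
v1 = store.put("alice", "victim_7", "unmatched")
v2 = store.put("bob", "victim_7", "matched to pedigree 3")

print(store.snapshot(v1))         # the data as it stood after the first edit
print(store.snapshot())           # the current data
print(store.history("victim_7"))  # who changed what, in order
```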
c) User concurrency and access rights
Each user accessing a project has their own private branch, which lets them work with the same data without modifying the data being accessed by other users. Once a user is confident in the updates they have made, they can publish those updates to make them visible to the other users. Access rights for the data are similar to those in Unix and can be defined for every user.
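The private-branch idea can be pictured with a small Python sketch: each user’s unpublished edits sit in their own overlay on top of the shared data until they publish. This is only an illustration of the concept under our own assumptions; the `Project` class and its methods are hypothetical and say nothing about Bonaparte’s internals, and access rights are left out entirely.

```python
class Project:
    """Shared published data plus one private overlay of edits per user."""

    def __init__(self):
        self.published = {}  # data visible to everyone
        self._branches = {}  # user -> {key: value} of unpublished edits

    def edit(self, user, key, value):
        """Store an edit in the user's private branch only."""
        self._branches.setdefault(user, {})[key] = value

    def view(self, user):
        """What this user sees: published data overlaid with their own edits."""
        return {**self.published, **self._branches.get(user, {})}

    def publish(self, user):
        """Merge the user's private edits into the shared data."""
        self.published.update(self._branches.pop(user, {}))

project = Project()
project.edit("alice", "victim_7", "matched to pedigree 3")

print(project.view("bob"))    # bob does not see alice's unpublished edit
project.publish("alice")
print(project.view("bob"))    # now he does
```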
d) Crash recovery
Bonaparte is a client-server system, so it is sensitive to network service interruptions and browser crashes. The crash recovery mechanism minimizes the loss of updates by automatically saving each user’s data to a private branch. On the user’s end, they simply need to restart their web browser and resume working from the point where they were interrupted.
e) Client-server architecture
Since Bonaparte is a client-server system, the part where computations are handled and the part where data is stored run on a dedicated server, and the clients communicate with the server over a network. Since the computation itself happens on the server, the clients do not need to invest in any expensive hardware on their end. To increase the computational power, the users can combine multiple servers along with a load balancer.
Bonaparte provides the following advantages in performing the identification of a large number of victims:
Bonaparte uses the relations between family members in the form of Bayesian networks, which are very well documented and accessible to the end-users. These statistical models are used for likelihood computation.
Since Bonaparte is an automated system built upon mathematical principles, it leaves less room for error than the same task performed manually by several people.
Typically, when we compare a victim’s profile with the list of reported missing persons, several combinations have to be considered as possible matches. The problem becomes more complicated when the victim can be identified only through family relations (also called a pedigree). This is explained very well in the official documentation for Bonaparte.
Consider a case with 10 victims and their 10 putative pedigrees. This results in just 100 combinations, but 100 victims with their 100 pedigrees yield 10,000 combinations. For the latter case, Bonaparte’s computation time is on the order of minutes.
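The arithmetic behind these numbers is simply the number of victims multiplied by the number of pedigrees, because every victim has to be compared against every pedigree. A short Python sketch makes the growth visible (the function name is ours, purely for illustration):

```python
def pairwise_comparisons(n_victims, n_pedigrees):
    """Each victim is compared against each pedigree exactly once."""
    return n_victims * n_pedigrees

for n in (10, 100):
    print(f"{n} victims x {n} pedigrees = {pairwise_comparisons(n, n)} comparisons")
```

Ten times as many victims and pedigrees means a hundred times as many comparisons, which is why automating the computation matters for large incidents.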
d) User-Friendly methodology
A tool has more impact when it is easy to use, and Bonaparte applies this principle through its Graphical User Interface (GUI). Data is organized in folders, with a separate project for each incident. Each project contains all the information about the individuals and pedigrees of that specific incident. GUI features such as drag-and-drop editors and access through a web browser further improve the user experience.