Data Fusion and Analytic Techniques (DFAT)

Apply multi-source data fusion and analytic techniques to detect nuclear proliferation activities.

  • The faculty affiliated with DFAT are Alyson Wilson (thrust lead), Raju Vatsavai, and Hamid Krim.
  • The students are Kenneth Tran, Sally Ghanem, Stephen Ranshous, Nick Meyer, James Gilman, and Jordan Bakerman.
  • The postdocs are Karl Pazdernik and Han Wang.


Challenges of CNEC Thrust Areas

– DFAT Thrust Lead
The challenge problem for the Data Fusion and Analytic Techniques (DFAT) thrust area is to detect and characterize proliferation events and proliferation enterprise networks. DFAT focuses on the application of data science to nonproliferation problems. As examples of work on the challenge problem, consider these three projects.

The first is a case study focused on developing a principled information and sensor fusion framework using data from the 2013 flooding in Boulder, CO. The multi-modal data include Landsat-8 satellite imagery, geo-tagged Twitter data, and detailed ground truth from the city. The goal is to fuse data from these initially incompatible sources.

The second project, developed with LANL, focuses on event enrichment, that is, finding additional relevant information from social media. Initial research developed a hybrid method for imputing geotags with quantified uncertainty; an extension considers variable selection in dynamic generalized linear models, where features are selected from social media data with imputed geotags.

The third project, developed with PNNL, focuses on new-age proliferation finance. With the advent and subsequent growth of cryptocurrencies such as Bitcoin, a vast array of new money laundering and transfer avenues is available to proliferators. The project has developed a formal mathematical model for these networks and has focused on characterizing and classifying exchanges (the entry and exit points of the Bitcoin world) and on identifying potential laundering patterns.

One important problem in nonproliferation is the combination of multiple sources of information. These data sources are typically multimodal and may include multiple types of sensor data, still, video, and hyper-spectral imagery, satellite data, output of models and simulations, unstructured text, and internet traffic. The broad class of problems that we consider in DFAT is how to combine the heterogeneous sources of information to answer questions and support decision making. Some current DFAT projects include:

  • Determining the origin of uranium materials is of great interest in forensic science. Here we seek to link the individual uranium oxide processing steps (precipitation temperature, calcination temperature, pH of precipitate, urea concentration, precipitate method, and initial uranium source) to the chemical features preserved within uranium samples. These signatures may assist in determining the provenance and history of unknown samples.
  • When pursuing an intelligent adversarial target, overcoming its evasive behavior requires collaboration among pursuit coordinators to make the best use of available resources. We model the pursuit of an evader as a multi-agent partially observable Markov decision process. Each agent receives incomplete and noisy information at irregular and infrequent intervals regarding the whereabouts of the other agents in play. We estimate a pursuit strategy using a variant of Nash Q-learning, which optimizes against a rational target to provide a sensible strategy.
  • Since its inception in 2006, Twitter has grown to 300 million monthly active users and more than 500 million tweets daily. Only about one percent of tweets are geotagged with latitude and longitude coordinates. We are developing methods that leverage both tweet content and friendship network information to predict user location and quantify the uncertainty of that prediction. We are then extending current methods that use geolocation as a predictive feature so that they account for this uncertainty.
  • Detecting and forecasting civil unrest events is an example of how multiple sources of information can be fused to improve predictions. We consider protest dynamics in six Latin American countries on a daily level from November 2012 through August 2014. The models contain predictors extracted from social media sites (Twitter and blogs) and news sources, in addition to the volume of requests to Tor, a widely used anonymity network. Two political event databases and country-specific exchange rates are also used. We are extending results from event and anomaly detection to determine dynamically how the importance of predictive features changes over time.
  • Proliferation support networks are continuously seeking out and exploiting weaknesses in financial systems to carry out transactions and business deals. Proliferators use a range of sophisticated schemes to obfuscate their activities; however, existing case studies do not enable us to identify any single financial pattern uniquely associated with proliferation financing. Bitcoin and other forms of digital currencies raise a number of new challenges for the detection of money transfer and laundering, especially across international borders. In this work we propose a new type of directed hypergraph model, which integrates relational, temporal, and identity information. With this model, numerous path-based patterns for detecting anomalous behavior are possible. Our key intuition is to focus on exchanges, as they are the entry and exit points of the Bitcoin network, and thus a necessary component of any money transfer or laundering.
  • We consider the problem of estimating hazard extent using multiple data modalities. We propose a maximum entropy approach to fuse disparate data modalities such as environmental feature data and on-the-ground observation data. We focus on the specific problem of flood extent estimation. In our study, the environmental features consist of flood descriptors derived from satellite imagery as well as geoinformation. By incorporating several social media sources as our observation data, we can extrapolate flooded regions, thereby improving the assessment of the hazard scenario.
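
As a rough illustration of the directed hypergraph idea in the proliferation finance project above, the following Python sketch treats each Bitcoin transaction as one directed hyperedge from a set of input addresses to a set of output addresses (relational information), carries a timestamp (temporal information), and uses a lookup table of address labels (identity information). It then searches for one possible path-based pattern: time-ordered chains of transactions that leave an exchange and later arrive at an exchange. The names `Transaction`, `exchange_to_exchange_paths`, the `"exchange"` label, and the toy data are all hypothetical and chosen for illustration; this is not the project's actual model or detection method.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Transaction:
    """One directed hyperedge: a set of input addresses pays a set of output addresses."""
    txid: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    timestamp: int  # temporal information, e.g. block height (assumed unit)


def exchange_to_exchange_paths(
    transactions: List[Transaction],
    identity: Dict[str, str],
    max_hops: int = 3,
) -> List[List[str]]:
    """Find time-ordered transaction chains that start at an exchange and end at one.

    `identity` maps an address to a label; only the hypothetical label
    "exchange" is used here. Chains are limited to `max_hops` transactions.
    """
    ordered = sorted(transactions, key=lambda t: t.timestamp)
    found: List[List[str]] = []

    def touches_exchange(addresses: FrozenSet[str]) -> bool:
        return any(identity.get(a) == "exchange" for a in addresses)

    def extend(path: List[Transaction]) -> None:
        last = path[-1]
        if touches_exchange(last.outputs):
            # The chain has reached an exchange again: record it and stop.
            found.append([t.txid for t in path])
            return
        if len(path) == max_hops:
            return
        for nxt in ordered:
            # Follow funds forward in time through any shared address.
            if nxt.timestamp > last.timestamp and nxt.inputs & last.outputs:
                extend(path + [nxt])

    for tx in ordered:
        if touches_exchange(tx.inputs):
            extend([tx])
    return found


# Toy data (entirely made up): funds leave exchange ex_a, are split across
# wallets w1 and w2, and are later deposited at exchange ex_b.
identity = {"ex_a": "exchange", "ex_b": "exchange"}
transactions = [
    Transaction("t4", frozenset({"w9"}), frozenset({"w1"}), 0),
    Transaction("t1", frozenset({"ex_a"}), frozenset({"w1", "w2"}), 1),
    Transaction("t2", frozenset({"w1"}), frozenset({"w3"}), 2),
    Transaction("t3", frozenset({"w2", "w3"}), frozenset({"ex_b"}), 3),
]

paths = exchange_to_exchange_paths(transactions, identity)
print(paths)  # [['t1', 't2', 't3'], ['t1', 't3']]
```

Representing a transaction as a single hyperedge, rather than as many pairwise address-to-address edges, keeps the fact that several inputs were spent together, which is the kind of relational structure a pairwise graph would lose. Any real pattern would also need amounts, fees, and address clustering, none of which are modeled in this sketch.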




Research Participants

[table id=4 /]