Categories: Computer Science

Reference #: 2014-021

OTC Contact: Zeinab Abouissa, Phone: 202-687-2702, Email: zaa9@georgetown.edu


The amount of digital information on global networks is increasing exponentially, heightening the demand for processing large amounts of digital data in real time to identify emerging risks and threats, particularly in the area of national security.

At present, much surveillance focuses on “horizon scanning”, which often consists of human-based schemes for monitoring both proprietary and open source data streams thought to be relevant to known or unknown risks or threats. While these conventional methods are the norm, they are often inefficient and cannot identify surprises, the latest developments, or novel plots, because the searches rely on a human-conceived, predefined set of interests or knowledge that a computer-aided search treats as a priori knowledge. This pre-set boundary limits the capability of a search to detect and identify unexpected events. There are no true “Big Data” approaches to threat surveillance that are capable of avoiding surprise or identifying previously unknown threats, because:

• They run the risk of not identifying surprises since, by definition, surprises do not occur frequently and are therefore unlikely to be considered as an interpretation of observed data
• Keyword searches look for something specific
• Machine classifiers are trained on the familiar
• Logistic regression looks for risk factors of predefined, desired outcomes

Accordingly, there is a need for an improved system that identifies relevant hypotheses in data, including surprising hypotheses, and that recognizes known and emergent event signatures, enabling human and/or machine recognition of safety and related events.

Researchers in the Georgetown University Medical Center Division of Integrated Biodefense, along with the Department of Computer Science, have developed a novel turnkey, automated system that recognizes threats earlier than the current state of the art, enabling real-time or near-real-time surveillance of massive amounts of data. The method allows the data itself to define a space of possible hypotheses, optionally merges and groups similar hypotheses, and then weighs and selects a subset of relevant hypotheses for further consideration.
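
The generate, merge, weigh, and select steps described above can be illustrated with a minimal sketch. It is not the patented method: it assumes, purely for illustration, that a candidate hypothesis is a pair of terms appearing in the same document, that merging means collapsing case variants, and that weighing uses pointwise mutual information (PMI). The function name `generate_hypotheses` and its parameters are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def generate_hypotheses(documents, top_k=3, min_count=2):
    """Rank candidate hypotheses (term pairs) derived from the data itself.

    Illustrative assumptions: a hypothesis is a pair of co-occurring terms,
    and its weight is the pointwise mutual information of the pair.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Merge step (assumed): lowercase and de-duplicate terms so that
        # case variants collapse into a single candidate.
        terms = sorted({token.lower() for token in doc.split()})
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))

    n_docs = len(documents)
    scored = []
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_count:
            continue  # ignore pairs seen too rarely to score
        # Weigh step (assumed): PMI favors pairs that co-occur more often
        # than their individual frequencies would suggest.
        pmi = math.log((n_ab * n_docs) / (term_counts[a] * term_counts[b]))
        scored.append(((a, b), pmi))

    # Select step: keep the top_k highest-weighted hypotheses.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    reports = [
        "milk melamine recall",
        "milk price rise",
        "melamine recall notice",
        "milk price stable",
    ]
    for pair, weight in generate_hypotheses(reports):
        print(pair, round(weight, 3))
```

PMI is used here only because it does not simply reward the most frequent terms, which matters when the events of interest are rare. A real system would need a far richer hypothesis space and weighting scheme.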

The system thus has the ability to:
• Monitor the internet and other data continuously
• Discover hypotheses potentially explaining data
• Detect leading indicators of food threats and use these as signatures of nascent events
• Notify users of identified and potential threats

The system enables human and/or machine event recognition by analyzing data to construct one or more qualitative metrics, establishing a baseline for the qualitative metric(s), identifying additional data over time, identifying an updated baseline, and outputting the adjusted baseline for display to the user.
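
The baseline-and-update cycle described above can be sketched as follows. This is an illustration under stated assumptions, not the system's actual implementation: it assumes the metric has been reduced to a single number per time step (for example, a daily count of relevant reports), and it uses an exponentially weighted moving average as the baseline. The class name `Baseline`, the smoothing factor `alpha`, and the three-standard-deviation threshold are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Running baseline for one numeric metric (illustrative sketch)."""

    mean: float          # current baseline level
    var: float = 0.0     # running variance around the baseline
    alpha: float = 0.2   # smoothing factor (assumed value)

    def update(self, value: float) -> bool:
        """Fold in a new observation and return True if it deviated
        strongly from the baseline that existed before the update."""
        deviation = value - self.mean
        threshold = 3.0 * self.var ** 0.5  # assumed 3-sigma rule
        is_deviation = self.var > 0 and abs(deviation) > threshold
        # Exponentially weighted updates of the mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return is_deviation


if __name__ == "__main__":
    baseline = Baseline(mean=10.0)
    for observed in [10, 12, 9, 11, 10, 30]:
        flagged = baseline.update(observed)
        print(observed, flagged, round(baseline.mean, 3))
```

After each update, `baseline.mean` holds the adjusted baseline that would be shown to the user, and the returned flag marks observations that departed from the earlier baseline.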

This system thus identifies known signatures of threats in massive data sets and identifies hypotheses that can explain observed data in order to identify unknown threats. Rather than bringing an a priori conceived hypothesis, it lets the data itself define a ranked set of possible hypotheses. This approach to hypothesis generation in security is:

• Applicable to threats broadly defined (e.g., food fraud/adulteration, chemical and biological poisoning, supply chain vulnerability, etc.)
• Applicable to textual (and potentially other) data
• Executable in real time or near-real time and scalable to large applications
• Built on a sound theoretical basis that users can understand and trust


Ophir Frieder, Ph.D.
David Hartley, Ph.D.


US Patent No. 10,521,727
US Patent Application No. 15/242,325
US Patent Application No. 16/663,547
US Patent Application No. 17/059