January 24, 2024
Note: The information contained is over 20 years old. The document is intended to aid startups seeking to do their first SBIR. We find that knowing what the end product looks like is helpful to getting underway.
1 Identification and Significance of the Problem or Opportunity
1.1 Introduction
ACME Dynamite Company (ACME), together with consultant Speedy Gonzalez, is pleased to submit this proposal to develop the Basic Line Support (BLS) in response to the OSD SBIR Topic #: OS1234-SP6, entitled “Wiki for Page Alerting.”
The background of our team is particularly appropriate for the task of developing a system to support the objectives of the Office of the Secretary of Defense. The Principal Investigator, Mr. Road Runner, is an award-winning software developer at ACME with years of experience conducting research and developing software for knowledge engineering, semantic technologies, and visualization. Mr. Speedy Gonzalez provides significant experience in commercializing SBIR technologies. ACME personnel have developed award-winning software solutions for the Federal Intelligence Community, including software designed for semantic reasoning, social networking, knowledge engineering, and data fusion.
1.2 Identification of the Problem
As a component of the Network-Centric Warfare (NCW) doctrine, the Global Information Grid (GIG) links many information assets together, providing unprecedented information to today’s warfighter. Ideally, the GIG will promote information sharing and dissemination of critical information. Modern warfighters would no longer suffer the fog of war; instead, they would gain an improved situational awareness through seamless communications. Warfighters would receive information from each other, commanders behind the lines, and external detection devices such as overhead satellites or automated reconnaissance aircraft.
Unfortunately, while current technologies can easily link information sources together, generating topical content from those information sources remains a challenge. Currently, a warfighter or analyst must scan the various information sources to determine their topical meaning. However, the ever-increasing volume of information prevents any individual from gleaning the information from all available data. Modern filtering techniques offer some relief, but the warfighter still must synthesize his understanding of a tactical situation. Since different people process information differently, each warfighter may understand different aspects of the same situation, leading to a fog-of-war even in today’s information-rich military. The lack of accurate, up-to-date, and easily understandable information leads to collateral damage and other unintended consequences, such as the 1999 bombing of the Chinese embassy in Belgrade.
Ideally, a warfighter would receive the information they need in a timely fashion, receiving alerts for their current tactical situation. At the same time, the mission commanders and analysts need to comprehend how any new information might pertain to the context of the current tactical situation. The mission commander must remain aware of the current theatre of operations, the status of any targets, and external factors such as weather. Analysts who support tactical operations also must monitor other factors, such as the changing structure of an enemy organization as it reacts to military actions or other external situations. All of these people need common representations for the tactical situation.
Other organizations confront the same challenges. Intelligence analysts contend with vast amounts of information from various sources, hoping to discover hidden nuggets of information. Financial analysts must monitor stock transactions, economic news sources, and external factors that may influence a stock price. Firefighters must remain aware of the extent of a wildfire, current weather conditions, and the location of any nearby homes. Field medics must correlate a patient’s current medical status with their medical history and current medications. For all of these challenges, processing the tremendous volume and variety of available information will demand automated processes.
Fortunately, new technologies for automated reasoning have emerged from recent efforts within the semantic web community. As the semantic web evolves, researchers develop new techniques and utilities for semantic reasoning. New and evolving standards for the semantic web include key technologies such as the Resource Description Framework (RDF) for representing nuggets of information, the Web Ontology Language (OWL) for formal ontologies, and the SPARQL Protocol and RDF Query Language.
Automated semantic reasoning, if applied appropriately, could drastically improve analytic processes. By automatically determining the vital nuggets of information and representing them within the proper context, these reasoning processes could provide timely, relevant information to the warfighter and analyst alike. However, the nascent field of semantic reasoning involves substantial complexity, and remains a risky endeavor for early adopters.
At the same time, the World Wide Web has evolved new information repositories. Collaborative wikis, such as Wikipedia, have become pervasive. Researchers and analysts regularly consult Wikipedia for information. Other specialized wikis offer information about focused topics. These wikis provide a convenient means to represent the current knowledge available about particular topics, and users regularly update wiki pages as information changes. Even the intelligence community (IC) created Intellipedia as a wiki for classified intelligence information.
The convergence of semantic reasoning and collaborative wikis promises a variety of capabilities to help the warfighter and analyst.
1.3 The Opportunity
The intent of this Phase I SBIR is to research and develop a system that will extract relevant content from data sources, extract semantic meaning, update the appropriate wiki pages, and issue alerts based upon the updated content.
The system will use an OWL ontology, or set of OWL ontologies, to define which specific semantic content it should derive. Each of the statements within the ontology will correspond to syntactic expressions, and the system will use this correspondence to derive RDF triples that represent specific semantic information. The system will then use these RDF triples to update the wiki pages.
The BLS will include automated SPARQL queries to derive additional contextual information, such as social networks, geographic references and maps, or temporal representations such as timelines. These augmented representations will supply the users with necessary information that corroborates the textual information contained in the wiki pages. For instance, one query might retrieve all people who have associations with a particular target to construct a social network, while another might derive the hierarchy of an organization. While traditional wikis might contain a social network representation on a static page, the BLS will generate such information as needed from the active wiki pages. This removes the danger of stale information, so the warfighter will have accurate information at all times.
Finally, the BLS will transmit alerts to the wiki users when page contents change. Each user will identify specific pages and select from a set of alerting rules for those pages. When the contents of a wiki page change, the system will determine which users it must alert. It will transmit an alert to any user with a matching alert rule for that page. Because the user might be a deployed soldier with only a PDA or a cell phone, the system must be able to transmit this information to different devices, regardless of the characteristics of the device.
The proposed BLS will blend the Semantic Web with the Social Web. It ideally mixes the collaborative understanding of a wiki with semantic reasoning and intelligent alerting. By intelligently applying automated reasoning processes to incoming data, using semantic wiki pages for information storage and representation, and applying our innovative alerting system, ACME offers a compelling system for integration and dissemination for today’s warfighter.
1.3.1 Contractor Experience
ACME’s commercialization consultant, Mr. Speedy Gonzalez, has successfully developed technology for many SBIRs during his career. Mr. Gonzalez’s Mini-Telecommunications Demarcation System (MTDS) may be found in all civilian and most military air traffic control facilities in the United States. Mr. Gonzalez is a 2007 winner of the Tibbetts Award for SBIRs and is an honoree of the National Inventor’s Hall of Fame. His expert advice provides Team ACME a valuable advantage for identifying and securing the project’s Intellectual Property, and exploiting the commercial potential of the BLS.
The principal investigator, Mr. Road Runner, has developed numerous software systems and prototypes. He has also conducted innovative technical research, presenting results at technical conferences and in academic journals. He is an expert in visualization, knowledge representation, and semantic reasoning. He recently designed a utility for analyzing a social network for the VULCAN challenge, winning an award for Effective Toolkit Integration. He also researched and developed a library of software for combining techniques for visualization, automated semantic reasoning, and data fusion. He presented his findings at the Knowledge-Assisted Visualization (KAV) workshop at the IEEE conference in 2008, and released this software to the Open Source community as the Prajna Project.
ACME, an ISO 9001 certified small disadvantaged business (SDB), is an Information Technology Solutions company that provides innovative technological services to support Federal Intelligence clients. ACME offers services in the core areas of software engineering, systems engineering, management consulting, and technology integration. Our specialty is delivering comprehensive, best-of-breed solutions for our customers by leveraging our domain expertise and established partnerships with commercial vendors and strategically aligned service providers. ACME recently graduated from the Small Business Administration’s (SBA’s) 8(a) Program.
ACME has developed, integrated, tested, and deployed numerous systems using agile methodologies to ensure that we meet immediate needs with the level of quality and responsiveness that the customer requires. ACME’s use of agile methods enables its developers to integrate multiple sources and systems, respond quickly to changes in the customer’s mission, and anticipate problem areas so that solutions are in place before they delay development and implementation. ACME has integrated COTS, GOTS, and FOSS software to satisfy customer requirements.
1.3.2 Innovative Approach Overview
The BLS includes several innovations and applications of cutting-edge technologies. The BLS will use a semantic wiki to store textual/syntactic information and semantic information. An associated ontology, written in OWL, will incorporate the knowledge relationships and structures that the users need. In addition, an experimental set of syntactic-to-semantic rules, matching the rules within the ontology, will provide some ability to extract semantic information from plaintext. Pages within the semantic wiki will include tagged references that the system can update as it extracts new information. The overall design of the BLS system appears in Figure 1.
Wikipedia defines a Semantic Wiki as a “Wiki that has an underlying model of the knowledge described in its pages. Regular, or syntactic, wikis have structured text and untyped hyperlinks. Semantic wikis, on the other hand, allow the ability to capture or identify information about the data within pages, and the relationships between pages, in ways that can be queried or exported like database data.”
Within a normal wiki, links to other articles simply take the user to a different page, and have no additional information. The semantic wiki embeds additional semantic information as part of the link. For instance, a normal wiki page on John Doe may include the following prose:
In this example, the concepts “Joe Doe,” “Mary Doe,” and “May 13, 1980” all refer the reader to other pages. A human reader understands intuitively that Joe Doe is John Doe’s father. However, a computer system cannot glean that information from the prose. A semantic wiki can enrich these references with specific semantic content:
In this case, the semantic meaning of the date and related people are clear. The page incorporates RDF triples (e.g., “JohnDoe hasFather JoeDoe”) in a prose form that both humans and software agents can understand. With an appropriate semantic query, an application could generate an entire family tree from a set of wiki pages. Similarly, a semantic wiki could generate lists of people who have an association with an organization.
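As a rough illustration of how software might read such enriched links, the sketch below pulls triples out of page text. It assumes the Semantic MediaWiki annotation form `[[property::value]]`; the page text, the property names (`hasBirthDate`, `hasFather`, `hasMother`), and the `extract_triples` helper are all invented for this example and are not part of the BLS design.

```python
import re

# Semantic MediaWiki-style annotation: [[property::value]]
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")

def extract_triples(page_title, wikitext):
    """Return (subject, predicate, object) triples for every annotation on a page."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]

# Hypothetical page text, invented for illustration only.
sample = (
    "John Doe was born on [[hasBirthDate::May 13, 1980]] to "
    "[[hasFather::Joe Doe]] and [[hasMother::Mary Doe]]."
)

triples = extract_triples("John Doe", sample)
for triple in triples:
    print(triple)
```

The page title serves as the subject of every triple, which mirrors the idea that each wiki page describes one entity.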
Figure 1: High-Level Architectural Diagram of the BLS, showing different architectural components and process flow. This design shows how the warfighter will interact with the semantic wiki pages and receive alerts.
Semantic wikis also include a page categorization. A traditional wiki page cannot automatically identify whether its page topic is a person, place, organization, or event. However, the semantic wiki can incorporate a categorization, based upon the underlying ontology for the wiki. This feature allows the system to quickly search or identify particular pages of interest.
In addition to using the semantic wiki to present the data, the BLS will exploit the semantic information contained within the pages. By specifically encoding the semantic information within the link, the system can update the information automatically. For instance, assume the wiki page for John Doe includes the following information:
Sometime later, the system receives new information. The extraction process determines that John Doe started working for a different supervisor, represented by the RDF triple “JohnDoe hasBoss BobJones.” Without a semantic reference, the only way to update the wiki page would be to replace all occurrences of JaneSmith with BobJones. That approach would fail, because JohnDoe might associate with Jane Smith in other ways. Because the reference to Jane Smith includes a semantic reference, however, the BLS can identify which sections of the wiki page it should update.
Additionally, the system will include SPARQL searches across multiple pages, enabling a user to identify related pages even if they are not directly linked. It will also use the semantic information within the links to derive structures such as timelines, social networks, or hierarchies. For instance, a SPARQL query could retrieve all “hasBoss” relationships where the employee works at AcmeCorp. The system could then take this information and generate an organizational hierarchy of AcmeCorp, and represent the hierarchy with an organizational diagram. The wiki user could view these hierarchical trees or social networks in a graphical format.
For the alerting component, ACME will utilize a software system that we previously developed under another SBIR for the Air Force. This system, known as the Information Dissemination Platform (IDP), provides alerts to multiple devices with different capabilities, including laptops, cell phones, PDAs, and pagers. By interfacing with and exploiting this technology, the BLS can easily fulfill the alerting needs of the warfighter.
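The targeted update described here can be sketched in a few lines. The example assumes Semantic MediaWiki-style `[[property::value]]` annotations; the page text, the `hasFriend` property, and the `update_property` helper are hypothetical and serve only to show that a property-scoped replacement leaves other mentions of the same person untouched.

```python
import re

def update_property(wikitext, prop, new_value):
    """Replace the value of every [[prop::...]] annotation, leaving other text intact."""
    pattern = re.compile(r"\[\[" + re.escape(prop) + r"::[^\[\]|]+\]\]")
    # A function replacement avoids any backslash handling in new_value.
    return pattern.sub(lambda match: f"[[{prop}::{new_value}]]", wikitext)

# Hypothetical page text, invented for illustration only.
page = (
    "John Doe reports to [[hasBoss::Jane Smith]]. "
    "He also plays tennis with [[hasFriend::Jane Smith]]."
)

updated = update_property(page, "hasBoss", "Bob Jones")
print(updated)
```

A plain search-and-replace on the name would have rewritten both links; scoping the replacement to the `hasBoss` property changes only the supervisor.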
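To make the “hasBoss” example concrete, the following sketch builds an organizational hierarchy from a small set of triples. It is a plain-Python stand-in for a SPARQL engine, not the proposed implementation; the triples, the `worksAt` property, the `ex:` prefix in the comment, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical triples; every name here is invented for illustration.
TRIPLES = [
    ("JohnDoe", "worksAt", "AcmeCorp"),
    ("BobJones", "worksAt", "AcmeCorp"),
    ("AnnLee", "worksAt", "AcmeCorp"),
    ("EveRay", "worksAt", "OtherCorp"),
    ("JohnDoe", "hasBoss", "BobJones"),
    ("BobJones", "hasBoss", "AnnLee"),
    ("EveRay", "hasBoss", "AnnLee"),
]

def boss_pairs(triples, org):
    """Plain-Python equivalent of a query such as:
    SELECT ?emp ?boss WHERE { ?emp ex:hasBoss ?boss . ?emp ex:worksAt ex:AcmeCorp . }
    """
    staff = {s for s, p, o in triples if p == "worksAt" and o == org}
    return [(s, o) for s, p, o in triples if p == "hasBoss" and s in staff]

def build_hierarchy(pairs):
    """Map each boss to a sorted list of direct reports."""
    reports = defaultdict(list)
    for employee, boss in pairs:
        reports[boss].append(employee)
    return {boss: sorted(emps) for boss, emps in reports.items()}

def render(tree, root, depth=0):
    """Return indented text lines for the hierarchy below root."""
    lines = ["  " * depth + root]
    for employee in tree.get(root, []):
        lines.extend(render(tree, employee, depth + 1))
    return lines

tree = build_hierarchy(boss_pairs(TRIPLES, "AcmeCorp"))
chart = render(tree, "AnnLee")
print("\n".join(chart))
```

The indented text output is a placeholder for the graphical organizational diagram; the same `tree` structure could feed a drawing component.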
The design of the entire system will use a cohesive object-oriented design and modular components. The extraction components will incorporate adapters for each type of data. This separates the data parsing process from the actual extraction. The Rich Content Generator will use general RDF and the wiki pages themselves as content sources. Finally, the system can use different ontology sets, so the system could create content for different contextual wikis. This approach blends the concepts of object-oriented design with new semantic resource-driven architectures [4, 10].
The high-level architecture design, shown in Figure 1, incorporates the modular approach to software design. Each of the various modules will use interfaces to communicate, increasing the flexibility of the system.
2 Phase I Technical Objectives
Phase I SBIR projects are for determining system requirements and configuration, assessing design and development feasibility, and planning for the Phase II development. The technical objectives ACME has chosen for Phase I will provide the government with a complete assessment of the potential utility of the BLS.
ACME will first develop a detailed understanding of specific user needs, and identify any additional features that the BLS system should incorporate. ACME will also identify the characteristics of the data available to the BLS. Next, ACME will develop detailed reference ontologies and models, based on the team’s knowledge of semantic research, consultation with the OSD, and the characteristics of the available data.
Then, based on the data ontologies and available technology, ACME will implement extraction techniques to identify important entities from available data using links and context. Next, ACME will extend these extraction techniques to identify semantic concepts that match the ontology definitions.
From the derived semantic concepts, ACME will implement techniques that will update the pages of a semantic wiki. ACME will also implement techniques that derive visualizations, such as maps, social networks, hierarchy trees or timelines, from the semantic concepts.
Finally, ACME will augment the alerting capabilities of the semantic wiki by creating a series of rules for alerting. ACME will integrate the wiki alerting capabilities with the Information Dissemination Platform to provide alerts to warfighters in the field.
The most feasible and effective design will be used to specify a plan for Phase II development. The resulting well-vetted design and detailed Phase II development plan will provide the OSD the information it needs to determine the desirability and feasibility of the proposed Phase II product development. In addition to setting the stage for Phase II, Phase I will yield products that are independently valuable including prototypes for semantic extraction and automated annotation of wiki pages. The specific technical objectives for Phase I are listed in summary below:
Develop detailed, realistic requirements for the BLS
Model the available data within the GIG to create reference ontologies and models using OWL.
Implement extraction techniques for identifying important entities from available data using links and context. The extraction techniques should identify critical entities with high precision.
Develop the prototype techniques to derive or extract semantic content from available data. The system should identify a significant majority of the semantic concepts for each rule defined in the OWL ontologies.
Create techniques for automatically updating the semantic wiki pages. Update or augment available wiki page information by identifying semantic tags within the wiki page and updating them with current information.
ACME will augment the above technical requirements based upon specific information that is available following contract award.
3 Phase I Work Plan
ACME has developed the Phase I work plan to achieve the technical objectives listed in the previous section. The six-month work plan is designed to provide OSD with sufficient design and feasibility information to allow them to evaluate a Phase II effort. The plan also is designed to reduce schedule and technical risks. Figure 2 shows a six-month schedule for this work plan. ACME will reuse components and software already developed on earlier programs to enhance the chances for success. ACME researchers will perform all research and development at the ACME research lab in Ellicott City, Maryland. The details of each task are listed in sections below.
Figure 2: High-level schedule for Phase I research and development tasks
(*assumes that the work would start in Jan 2010). Because of its approach to identifying functional requirements, Team ACME can identify the challenges and develop solutions proactively.
3.1 Task 1: Develop Detailed System Requirements
A detailed requirements document, drafted with careful consideration of needs of the users, forms the basis of all effective system design efforts. This document describes the “vision” of what the system is expected to do for its users, identifies the high-level use cases and specifies the functions, characteristics, limitations, and operational environment of the new system. The requirements document forms an agreement between the system developers and the ultimate users of the new system. Based on prior experience, ACME will ensure the requirements document will evolve in an iterative fashion through different phases of the project, and as users or developers identify additional needs.
ACME has extensive experience in performing knowledge acquisition (KA) with users, and developing effective requirements documents as part of critical system designs for such clients as NSA and DIA. For the BLS, the ACME team will develop detailed functional requirements based on knowledge gained from users, use cases, and data requirements. The Principal Investigator, Mr. Runner, will analyze and document the user and data requirements. Team ACME will develop a preliminary Process Planning Road Map, and create project-specific planning spreadsheets to define user needs, product features, process features and process control features. Team ACME will use existing use cases, data models and KA information during the requirements and analysis phase to the maximum extent possible.
Team ACME will provide the requirements documents to the OSD COTR for review. We will solicit input for the requirements, and incorporate them before moving further with development. Team ACME will adopt any recommendations, and address any concerns, prior to proceeding to the next task.
3.2 Task 2: Model the Data to Create Reference Ontologies
The GIG data available to the BLS may include plaintext records. It may also include semi-structured text with metadata, such as HTML or XML documents. This data will likely vary in structure, content, and quality. Therefore, ACME will model the actual data and create reference ontologies based upon the data models. ACME will also incorporate user needs into the ontologies, so that the ontologies represent the user’s requirements.
During this stage, ACME will identify open data sources that resemble the GIG data, and use these data sources for testing the system. These data sources may include news feeds or similar sources. ACME will also create an initial set of wiki pages based upon the sample data. These pages will include semantically enriched links, which will incorporate the ontologies.
ACME will design the initial implementation and web pages using the wiki software from the Semantic MediaWiki project. Semantic MediaWiki extends the MediaWiki software used by Wikipedia and other public wikis with semantic constructs and reasoning. ACME will use the Web Ontology Language (OWL) for the ontology definitions and the Resource Description Framework (RDF) for representing facts about the data. This implementation should also address scalability concerns, since Wikipedia uses MediaWiki, contains millions of pages, and demonstrates acceptable performance. ACME will include actual performance evaluation as part of the feasibility assessment.
3.3 Task 3: Evaluate Extraction Techniques for Identifying Entities
ACME will identify techniques to extract entities of interest from the data. During this process, Mr. Runner will create metadata tags for the documents that associate the entities with the data. For structured data, the processes will identify particular fields within the data that might typically contain entities of interest. For semi-structured and unstructured records, Mr. Runner will examine the feasibility of applying entity extraction techniques to the unstructured portions of the data.
During this process, the proposed system would augment the information by tagging the important entities within the data. Entity extraction is a well-known process utilized by many systems. The performance of traditional entity extraction engines varies widely based upon their internal models and the textual input. Entity extraction processes also usually misidentify some entities. Some entity extraction engines may fail to meet real-time performance criteria. Therefore, Mr. Runner will evaluate both traditional entity extraction techniques and custom solutions that utilize the wiki data itself.
Synonyms also confound traditional entity extraction engines. However, the wiki pages can encode synonyms for important entities, typically as a redirected page. ACME will determine how to utilize this wiki feature within the BLS, and implement designs to exploit it.
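One way to picture this use of redirects is a simple alias lookup, sketched below. The redirect table, its entries, and the `canonical_title` helper are hypothetical; an actual implementation would read redirects from the wiki rather than from a hard-coded dictionary.

```python
# Hypothetical redirect table: alias page title -> canonical page title.
REDIRECTS = {
    "J. Doe": "John Doe",
    "Johnny Doe": "J. Doe",
    "Acme": "ACME Dynamite Company",
}

def canonical_title(name, redirects, max_hops=5):
    """Follow redirects until a non-redirect title is reached.

    The seen set and max_hops guard against redirect loops.
    """
    seen = set()
    while name in redirects and name not in seen and len(seen) < max_hops:
        seen.add(name)
        name = redirects[name]
    return name

print(canonical_title("Johnny Doe", REDIRECTS))
print(canonical_title("Jane Smith", REDIRECTS))
```

Resolving every extracted name through such a lookup before tagging would let several surface forms map to a single wiki page.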
For particular data sources, e.g., sensor data or overhead imagery, ACME will identify the appropriate tags for the aggregate data as a whole. For instance, if a particular mission uses an unmanned aircraft for overhead imagery, the system will incorporate the current imagery on an appropriate wiki page. Because of the extreme variance in such data sources, ACME will select a sampling of sensor data sources for the initial evaluation.
3.4 Task 4: Evaluate and Design the Semantic Extraction Process
During this phase of development, Mr. Runner will design the processes for dynamically creating or updating wiki pages based upon the semantic content of the available data. When new data records become available, the system will attempt to parse out information that matches the ontology. For structured or semi-structured records, ACME will develop parsers that will extract the RDF triples that match the ontologies. This process will generate RDF triples based upon straightforward rules.
More significantly, Mr. Runner will create a prototype lexical parser to scan unstructured information and glean semantic content. To facilitate this process, the ontology rules will include common phrases that map to each ontology rule. For instance, an ontology which includes the rule “Person hasFather MalePerson” could include several different common ways that plaintext can express the Father concept. The parser will use these phrases to parse RDF triples from the plaintext, but will only apply phrases for the established ontologies. By limiting the scope of the parser to specifically search for text that corresponds to a known ontology, the system will achieve greater accuracy and performance in its parsing process.
In addition, ACME will examine and evaluate processes for deriving additional semantic information through inference. For instance, if the system knows the RDF triples “JohnDoe hasFather JoeDoe” and “JohnDoe hasGender Male,” then it should derive the RDF triple “JoeDoe hasSon JohnDoe” from the ontology rules. Several reasoning utility packages, such as the open-source project Jena, already provide this capability. During this phase, ACME will identify possible candidates for the ontological reasoning process, and determine how to integrate their capabilities into the BLS.
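A minimal sketch of such a phrase-driven parser appears below. It assumes that each ontology property carries a short list of regular-expression phrases and that names are two capitalized words; both assumptions, along with the `PHRASES` table and the `extract` function, are simplifications invented for this example.

```python
import re

# Simplifying assumption: a name is two capitalized words.
NAME = r"([A-Z][a-z]+ [A-Z][a-z]+)"

# Hypothetical phrase patterns attached to ontology properties.
PHRASES = {
    "hasFather": [
        NAME + r" is the son of " + NAME,
        NAME + r" is the daughter of " + NAME,
    ],
    "hasBoss": [
        NAME + r" reports to " + NAME,
    ],
}

def extract(text, phrases=PHRASES):
    """Return the set of triples whose phrase patterns occur in text."""
    found = set()
    for prop, patterns in phrases.items():
        for pattern in patterns:
            for subj, obj in re.findall(pattern, text):
                found.add((subj.replace(" ", ""), prop, obj.replace(" ", "")))
    return found

text = "John Doe is the son of Joe Doe. John Doe reports to Jane Smith."
triples = extract(text)
for triple in sorted(triples):
    print(triple)
```

Because the parser only searches for phrases tied to known properties, text that matches no phrase simply yields no triples, which reflects the limited-scope strategy described above.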
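The hasSon derivation can be written out as a single hand-coded rule, as in the sketch below. A reasoner such as Jena would derive this from rules or ontology axioms rather than from bespoke code; the `infer_sons` function and the extra `AnnDoe` fact are hypothetical and exist only to show the rule firing for one subject and not for another.

```python
def infer_sons(triples):
    """Apply one rule: (x hasFather y) and (x hasGender Male) => (y hasSon x)."""
    males = {s for s, p, o in triples if p == "hasGender" and o == "Male"}
    derived = {
        (o, "hasSon", s)
        for s, p, o in triples
        if p == "hasFather" and s in males
    }
    return set(triples) | derived

facts = {
    ("JohnDoe", "hasFather", "JoeDoe"),
    ("JohnDoe", "hasGender", "Male"),
    ("AnnDoe", "hasFather", "JoeDoe"),  # no gender fact, so the rule does not fire
}

closed = infer_sons(facts)
print(sorted(closed - facts))
```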
Parsing and extracting all semantic content from unstructured text remains an impossible goal. However, ACME will explore techniques to glean semantic information successfully on a limited scope. By defining a set of mappings from syntactic patterns to semantic concepts, ACME can develop parsers to extract meaning for a specific ontology.
In addition, ACME will evaluate text summarization utilities during this stage. ACME will explore how the system might incorporate text summarization with RDF extraction to enhance the extracted content. The system could use any text summarization during the wiki update process.
3.5 Task 5: Design the Wiki Update Process
Wiki pages use well-known structures for organizing their information. Certain categories of pages share a common organization. For instance, pages about people might include sections on education, current occupation, and known activities. In the proposed system, each wiki page will also include a designated section for system-generated content. The system will use this section as an area to append new information when it cannot determine where to insert it. It may also append any summaries generated from text summarization. Since the system-generated information may contain inaccuracies, experts may periodically review this section, editing and correcting the information as appropriate.
Once the system parses new records and generates RDF triples from the new information, it will compare the generated RDF triples to the set of known RDF triples. The system will identify any new RDF triples, and use them to update the content of the appropriate wiki pages.
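The comparison step amounts to a set difference grouped by page, as the following sketch suggests. It assumes that triples are hashable tuples and that the subject of a triple names the page to update; the `plan_updates` function and the sample data are hypothetical.

```python
from collections import defaultdict

def plan_updates(extracted, known):
    """Group triples that are not yet in the known set by their subject page."""
    updates = defaultdict(set)
    for triple in set(extracted) - set(known):
        updates[triple[0]].add(triple)
    return dict(updates)

# Hypothetical data, invented for illustration only.
known = {
    ("JohnDoe", "hasBoss", "JaneSmith"),
    ("JohnDoe", "hasFather", "JoeDoe"),
}
extracted = {
    ("JohnDoe", "hasFather", "JoeDoe"),   # already known, ignored
    ("JohnDoe", "hasBoss", "BobJones"),   # new
    ("BobJones", "worksAt", "AcmeCorp"),  # new
}

updates = plan_updates(extracted, known)
for page, triples in sorted(updates.items()):
    print(page, sorted(triples))
```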
On Wikipedia, many wiki pages include an information box on the right side that summarizes some of the most relevant information about the page. These information boxes contain many nuggets of information that naturally correspond to RDF triples. In fact, the DBpedia project (www.dbpedia.org) parses the information boxes on Wikipedia, derives the RDF triples from the content of the information box, and identifies the category for the page. The BLS will perform the reverse function, creating the content from the RDF. ACME will investigate the DBpedia extraction process to identify any components or processes that the BLS could incorporate.
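The reverse direction, from RDF to an information box, might look roughly like the sketch below. It emits MediaWiki template syntax (`{{Name | field = value }}`); the template name, the property names used as field names, the fictional country, and the `render_infobox` function are all assumptions made for illustration.

```python
def render_infobox(subject, triples, template="Infobox"):
    """Render all triples about subject as a MediaWiki template call."""
    rows = sorted((p, o) for s, p, o in triples if s == subject)
    lines = ["{{" + template]
    lines += [f"| {prop} = {value}" for prop, value in rows]
    lines.append("}}")
    return "\n".join(lines)

# Hypothetical triples about a fictional country.
triples = [
    ("Freedonia", "hasCapital", "Fredville"),
    ("Freedonia", "hasPopulation", "1200000"),
    ("Sylvania", "hasCapital", "Sylvan"),
]

infobox = render_infobox("Freedonia", triples)
print(infobox)
```

Regenerating the box from the current triples each time a fact changes would keep entries such as population or leadership in step with the underlying data.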
Figure 3: Section of the Wikipedia page on Iran. The information box on the right side includes the map, current government leaders (circled), historical information, population (circled) and GDP (circled). The BLS would automatically update these entries if they change.
The system will update any existing semantic references to include the new information. For simple updates, such as a person’s current location or the status of a combat unit, the system might simply replace the old information with new information. For other new information, such as the discovery of a new liaison, the system will append the original prose to the wiki page. If the system cannot determine where to add this information, the system will append it to the system-generated content section.
3.6 Task 6: Explore Processes to Enrich Content
Wiki pages contain more information than text and hyperlinks. Many wiki pages include tables, images, diagrams, and other forms of visualizing information. During this stage, ACME will investigate various ways to generate such content. ACME will determine the feasibility of generating various types of enriched content.
One such enhanced content is a social network. Wiki pages that represent people may also include a social network diagram, generated from the cross-links to other people. When the system determines that the cross-links have changed, it may generate a new social network from the new RDF triples.
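A small sketch of this regeneration step follows. It assumes that a fixed set of properties counts as person-to-person links; that set, the sample triples, and the `social_network` function are hypothetical choices for this example.

```python
from collections import defaultdict

# Hypothetical set of properties that link one person to another.
SOCIAL_PROPS = {"hasBoss", "hasFather", "knows"}

def social_network(triples, props=SOCIAL_PROPS):
    """Build an undirected adjacency map from person-to-person triples."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p in props:
            graph[s].add(o)
            graph[o].add(s)
    return {person: sorted(links) for person, links in graph.items()}

old_triples = [("JohnDoe", "hasBoss", "JaneSmith")]
new_triples = old_triples + [
    ("JohnDoe", "knows", "BobJones"),
    ("JohnDoe", "hasGender", "Male"),  # not a social link, ignored
]

old_graph = social_network(old_triples)
new_graph = social_network(new_triples)

# Regenerate the diagram only when the cross-links actually changed.
needs_redraw = old_graph != new_graph
print(new_graph, needs_redraw)
```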
Another possible type of enhanced content might be a geographic representation. Wiki pages that represent a geographic entity, such as a city, typically include a map showing its location. Additionally, in a system used by the army, a page about a particular military target might include a map showing the current location of the target along with any hostile forces.
Other types of enhanced content might include timelines or organizational hierarchies. Timelines could show how several events are temporally related. Organizational hierarchies could represent structures like a military order-of-battle.
These various options derive from the current research efforts within the semantic community. With the advent of so many social networking websites such as Facebook or MySpace, many semantic specialists are exploring how to generate social network displays. Similarly, the semantic community is also investigating ways to incorporate geographic reasoning into semantic tools.
3.7 Task 7: Enhance Wiki Alerting Mechanism with Rules-based Alerts
Most implementations of wiki software include a feature that alerts users when a page changes. This feature of the wiki software enables the users to designate particular pages of interest, and the software will email an alert if any of those pages change. However, this feature is limited to notifying when any component of the page changes.
During this phase of development, ACME will investigate and incorporate a rules-based alerting feature within the wiki. The rules criteria can restrict the alerts for a particular user on a particular page. For instance, a warfighter may only want alerts if a particular entity is within his zone of operations, or an analyst may want to receive alerts only when the target’s social network changes.
For the initial prototype, ACME will restrict the rules-based alerts to a controllable subset of alert types. The experimental prototype will allow a user to select an alerting level for a page from the following levels:
Alert when specific derived content (e.g., Social Network or Geographic representation) changes
Alert when RDF triples within the page change
Alert whenever page content changes
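The three alerting levels listed above could be modeled as follows. This is only a sketch: the AlertLevel names and the shape of the change dictionary (boolean flags describing what differed between two page revisions) are hypothetical.

```python
from enum import Enum


class AlertLevel(Enum):
    DERIVED_CONTENT = "derived"  # e.g., social network or geographic representation
    RDF_TRIPLES = "triples"      # RDF triples within the page
    ANY_CONTENT = "any"          # any change to the page content


def should_alert(level, change):
    """Decide whether a page change warrants an alert at the user's chosen level.

    `change` is a hypothetical dict of booleans, e.g.
    {"text": True, "triples": False, "derived": False}.
    """
    if level is AlertLevel.DERIVED_CONTENT:
        return change.get("derived", False)
    if level is AlertLevel.RDF_TRIPLES:
        return change.get("triples", False)
    return any(change.values())
```

With this model, a user subscribed at the RDF-triple level would not be alerted by a spelling correction, while a user subscribed at the page-content level would.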
3.8 Task 8: Incorporate Alerting Software
During this phase of development, ACME will design and implement software to send the page-change alerts to the warfighter. In the field, the warfighter may have devices with limited capacity, such as a PDA or cell phone. Ideally, the alerts should reach the warfighter regardless of their device. Fortunately, ACME has already developed this technology on a previous SBIR.
ACME developed the Information Dissemination Platform (IDP) during Phase I and Phase II of the Air Force SBIR AF071-064, titled “Managed Information Delivery to Multiple Devices.” The IDP sends critical information alerts to cell phones, pagers, computers, and PDAs via text message, email, and instant message. This system can disseminate rich-media content, including video and images, to defined audiences, and it will work on any device, making it ideal for sending information and alerts to military personnel in the field.
The IDP system has already demonstrated a significant flexibility for providing messages and alerts to various devices with different characteristics. In addition, ACME has authored two white papers on the IDP and presented its research findings at the SPIE Defense and Security Conference two years in a row. The Air Force Research Lab customer and other demo recipients have remarked that this technology is unlike any other, and is desperately needed for the warfighter.
During this phase of development, ACME will integrate the BLS to utilize the IDP for disseminating alerts to the warfighter. By utilizing the IDP system, ACME will minimize risk and time needed for this phase.
3.9 Task 9: Finalize the Feasibility Study
Mr. Runner will evaluate several components during the feasibility analysis. This process will include the following assessments:
Expected system performance
Cost
Development Risk
User Acceptability
Extensibility
Scalability
Team ACME is confident that we can deliver a working prototype of the BLS, demonstrate its capabilities, and earn user acceptance in preparation for further work in Phase II.
Mr. Runner will investigate the feasibility of each design for entity and semantic extraction by evaluating their performance characteristics and accuracy. Because automated semantic reasoning attempts to model human cognition, the feasibility assessments of the entity and semantic extraction components must compare the accuracy of the extraction techniques to natural human cognition. This will determine the accuracy for each semantic concept within the ontology. Mr. Runner will also evaluate the performance characteristics when processing data, and assess how to maintain an acceptable performance when processing larger data sets or when using ontologies that are more extensive.
Mr. Runner will assess the feasibility of the automated process for updating the wiki pages by examining the resulting pages for readability. Mr. Runner will evaluate the machine-generated content to assess whether human analysts could understand the derived content. In addition, he will determine the accuracy of the updated links by creating a series of queries against the embedded semantic data. Finally, the assessment of the enriched content – social networks, maps, timelines, or hierarchies – will validate the results against the known ground truth.
To assess the feasibility of the alerting rules, Mr. Runner will create several sample profiles for users with different alerting characteristics. The evaluation will measure the accuracy of the alerts by comparing the system-generated alerts against the expected alerts.
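One way to make the comparison of system-generated and expected alerts concrete is to compute precision and recall. The proposal does not name specific metrics, so this choice is an assumption, as is the representation of an alert as a (user, page) tuple.

```python
def alert_accuracy(generated, expected):
    """Compare system-generated alerts against the expected alerts.

    Both arguments are iterables of hashable alert records; the
    (user, page) tuple format used in the tests is hypothetical.
    """
    generated, expected = set(generated), set(expected)
    correct = generated & expected
    return {
        # Share of generated alerts that were expected.
        "precision": len(correct) / len(generated) if generated else 1.0,
        # Share of expected alerts that were actually generated.
        "recall": len(correct) / len(expected) if expected else 1.0,
        "missed": sorted(expected - generated),
        "spurious": sorted(generated - expected),
    }
```

The missed and spurious lists point the evaluator to the specific user profiles and rules that need attention.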
Mr. Runner will determine the development risk by evaluating the maturity of the various technologies as well as the complexity of the system development. Mr. Runner will identify and evaluate the new software designs and components that ACME incorporates into the BLS, and assess their maintainability.
Mr. Runner will evaluate user acceptability by judging how well the prototype supports the needs of the user, and how well the system addresses all user requirements. We also will evaluate how easily users learn the system. We will integrate or address any comments we receive during user evaluation.
Mr. Runner will evaluate the extensibility based upon how well the design addresses Phase II requirements. We will also evaluate the difficulty of adapting or generalizing the system to other ontologies, bodies of data, and end users. This evaluation will identify the feasibility of the system design for eventual commercialization.
Each of the metrics will provide input into the overall feasibility score. The system design configuration that is determined to be the most feasible will be further developed, and will form the basis of the Phase II proposal. In case of equivalent scores, Mr. Runner will both solicit the COTR's input and consider our team's strongest capabilities in choosing the design configuration to recommend for Phase II development.
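The scoring method is not specified in this proposal. As one possible illustration, the sketch below assumes each candidate design receives a numeric rating on the six assessments and combines them with a weighted average; the criterion keys, rating scale, and weights are all hypothetical.

```python
# Hypothetical keys for the six assessments listed in this task.
CRITERIA = ("performance", "cost", "risk", "acceptability",
            "extensibility", "scalability")


def feasibility_score(ratings, weights=None):
    """Weighted average of the ratings; equal weights by default."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights[c] for c in CRITERIA)
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total


def leading_designs(designs, weights=None, tolerance=1e-9):
    """Return the names of the top-scoring designs.

    More than one name means equivalent scores, which would be
    resolved outside the calculation (e.g., with the COTR's input).
    """
    scores = {name: feasibility_score(r, weights) for name, r in designs.items()}
    best = max(scores.values())
    return sorted(n for n, s in scores.items() if abs(s - best) <= tolerance)
```

Returning every leader, rather than picking one arbitrarily, keeps the tie visible so that it can be settled by judgment rather than by the arithmetic.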
3.10 Task 10: Specify the Phase II Development Plan
During the exploration and assessment of the BLS, ACME also will identify future areas for enhancements during Phase II. ACME will solicit user feedback for recommended enhancements or additional requirements. In addition, ACME will explore and evaluate one or more of the potential capabilities listed below:
Improving the accuracy of the extraction processes
Enhancing the transformation of syntactic content into semantic content based upon new research
Enhancing the ontological reasoning process, enabling the system to derive additional information from incoming data
Applying filters to incoming data to reduce the volume of traffic the system must parse
Identifying other scalability challenges and preparing mitigation strategies
Evaluating how to apply microformats
3.11 Reporting
In addition to day-to-day informal contacts with the program monitor, ACME will submit interim technical progress reports in accordance with the negotiated delivery schedule.
Mr. Runner will prepare the comprehensive final report that will document the work performed and the results obtained. This report will include an estimate of the technical feasibility for completing Phase II. The report will present conclusions and recommendations for the BLS. Each section of the report will correspond to the tasks listed in the work plan. ACME will draft each section as each task is completed. At the end of the project, ACME will combine the sections with our conclusions and recommendations to form the final report. Mr. Runner will submit a draft of the final report to the COTR three weeks before the end of Phase I, and deliver the final version of the report at the six-month mark.
4 Relationship with Future Research or Research and Development
4.1 Anticipated Results
Phase I will develop the detailed requirements for the BLS based upon user experience and the results of validation testing. ACME will use the system requirements to develop candidate design configurations to analyze for feasibility. ACME will further define the most feasible and desirable design to facilitate a Phase II System Development Plan identifying resource levels and risk.
In addition, Phase I results in valuable products independent of Phase II. These include the design of the update process for the semantic wiki pages, the design of the semantic extraction process, and the customized page alerts. Phase I products also will include a commercialization plan comprised of customer and competitive intelligence analysis, a draft Phase III marketing plan, and a section that identifies all intellectual property that may be exploited. ACME’s commercialization consultant, Speedy Gonzalez, of Intellectual Property Support Services, LLC, anticipates filing a patent application for the envisioned system at the conclusion of Phase I.
4.2 Significance of a Phase I as a Foundation for Phase II
Phase I defines the scope and plan for Phase II, and positions ACME to augment the capabilities of the BLS prototype rapidly and efficiently. During Phase II, developers will augment the various software components within the BLS. Developers will extend the capabilities of the system using different ontologies. In addition, ACME developers will research and incorporate new technologies from the semantic fields.
Microformats and other semantic technologies may offer new unforeseen capabilities for the system. ACME will also identify potential partners for development and commercialization based upon our findings in Phase I.
Similarly, the Phase III will continue to extend these capabilities. The Phase III, funded by other sources, will consist of any modifications required for a commercial product, and the marketing of the system.
5 Commercialization Strategy
Through the National Defense Authorization Act, Congress challenged the Armed Services to speed up the introduction of SBIR-generated technologies to the warfighter. Moreover, the Commercialization Pilot Program (CPP), established under the Act, was designed to accelerate the transition of SBIR-funded technologies to Phase III, especially into systems being developed, acquired, and maintained for the warfighter.
The 2008 DoD Mentor-Protégé Program (MPP) brief identified technology focus areas for transfer in support of the warfighter. The combined MPP, SBIR, and CPP programs present an opportunity to meet the Pilot MPP’s stated technology transfer objectives through introduction of innovative SBIR technologies.
Our proposed commercialization strategy will meet the intent of Congress and the DoD by rapidly introducing the BLS into the Afghanistan Theater. Specifically, Mr. Gonzalez has identified the Logistics Civil Augmentation Program (LOGCAP IV) contract as a vehicle for introducing the BLS technology into the Afghanistan Theater. Mr. Gonzalez is currently tasked with identifying force protection innovations and transferring them into the hands of the warfighter.
In addition, ACME is a Mega Corporation (MeC) protégé firm under the DoD/National Geospatial-Intelligence Agency (NGA) Mentor-Protégé Program. Working with MeC, we will explore opportunities to introduce BLS to other services through the Starfleet Supply contract and other programs.
Finally, ACME has established a strategic relationship with Mr. Bugs Bunny, Co-Head of the Defense and Government Services Group, General Capital Markets. The scope of our relationship includes funding for SBIR Phase III commercialization through General Capital's Small Business Administration-sponsored Small Business Investment Corporation (SBIC) fund. Mr. Bugs Bunny can be reached at (703) 555-1234. Our relationship with Mr. Gonzalez, EDS, and General Capital provides the expertise, relationships, contract vehicles, and funding needed to commercialize the BLS successfully.
Figure 5: Schedule of Expected Commercialization Results from the BattleSpace Luminary System SBIR Project.
5.1 Potential Commercial Applications
Diverse application areas would benefit from software that extracts semantic information from various data sources and updates a semantic wiki. In addition to the obvious applications in intelligence or military operations, ACME has identified several potential areas for commercialization.
On LOGCAP, personnel who support the warfighter could use the BLS to solve supply chain challenges and provide timely information. Supply technicians could use the BLS to determine the availability, location, and timing of the delivery of war consumables. Commanders could monitor data generated from Automated Identification Technology (AIT), Radio Frequency Identification (RFID), and GPS devices through a wiki page and determine the optimal time for replenishment or redirection of supplies.
According to the 2008 National Drug Intelligence Center (NDIC) National Drug Threat Survey (NDTS) data, approximately 1 million gang members belonging to more than 20,000 gangs were criminally active within all 50 states and the District of Columbia, as of September 2008. The report indicates that 58 percent of state and local law enforcement agencies reported that criminal gangs were active in their jurisdictions in 2008 compared with 45 percent of state and local agencies in 2004.
Agencies reporting include the state and local police departments, Bureau of Alcohol, Tobacco, Firearms, and Explosives (ATF), Bureau of Prisons (BOP), Department of Justice (DOJ), Department of Homeland Security (DHS), Customs and Border Protection (CBP), Drug Enforcement Administration (DEA), Federal Bureau of Investigation (FBI), Immigration and Customs Enforcement (ICE), NDIC, and United States Marshals Service (USMS).
Many gangs actively use Internet-based methods such as social networking sites, encrypted e-mail, Internet telephony, and instant messaging to communicate with one another and with drug customers in the United States and foreign countries. Gang members use social networking Internet sites such as MySpace, YouTube, and Facebook as well as personal web pages to communicate and boast about their gang membership and related activities. The BLS will be marketed to the agencies above for the purpose of maintaining semantic wiki pages on gang-related activity, and sending alerts to agents in the field when information about their targets changes.
The BLS could easily extend to the medical community. Both military field medics and civil emergency medical technicians (EMTs) need critical medical information about their patients to diagnose and treat them properly. The medic rarely has any familiarity with their patient’s medical history, current medications or other factors. Since the patient may be unable to provide such information, EMTs frequently must make critical decisions without this information.
Doctors likewise may be unaware of certain aspects of their patient’s history. If a patient visits multiple specialists for different conditions, those specialists usually prescribe treatments without considering the patient’s complete health picture. The BLS could create a wiki page for each patient, and each specialist would update that page for the condition being treated. Both medics and doctors would benefit from a central repository of easily accessed medical information. In addition, since the wiki includes semantic information, the doctor could search for other patients with similar conditions, or research how one prescription may interact with another.
Similarly, stockbrokers and other financial analysts need timely information to determine the optimal time for stock transactions or other financial activity. Sometimes, stock price fluctuations arise from unexpected or unrelated causes. By generating a financial stock ontology, investors and brokers could monitor news about their portfolio or search for trends semantically through the wiki page.
Internet search technology currently drives a significant portion of the research into the semantic web. Large companies such as Google and Yahoo actively seek innovations that they could apply to their search engines. Semantic technologies will have a profound impact on web-based advertising and marketing, so even more commercial sites such as Amazon and Ebay may have an interest in this technology.
Similarly, numerous social websites provide blogs for users to discuss a variety of topics. While less structured than wiki pages, blogs share many of the same characteristics. Social networking websites like Facebook, LinkedIn, Plaxo Plus, and MySpace are increasingly popular among both teenagers and adults. By incorporating semantic extraction and alerting, social networking sites could send messages to their participants when a related topic arises in a different forum or blog.
5.2 Existing Customers
ACME is currently engaged on several prime contracts as well as subcontracts for the DoD and intelligence agencies in which customers have directly expressed challenges with intelligently analyzing human social networks from available intelligence data in operational systems. Based upon the technology developed in Phase I, ACME will be presenting our capabilities to existing customers and partner integrators to elicit feedback and identify further security, data, and integration requirements with existing systems. Additionally, ACME will work with interested customers to acquire letters of intent for the justification of additional SBIR funding in Phase II.
Existing programs such as [] could incorporate the technology developed for the BLS. Already, these programs have implemented simple wikis for collaborative purposes. Similarly, ACME could immediately apply the BLS to another ACME customer, ODNI’s A-Space program.
6 Key Personnel
7 Facilities/Equipment
ACME is a certified ISO 9001 Software Engineering firm that has established a state-of-the-art Research and Development (R&D) computer lab at the company headquarters in Vienna, Maryland. Our lab contains several types of Sun hardware, including SunFire V490s, an X4200, and a T2000. In addition, ACME has over two terabytes of on-line storage as well as several Dell servers. This hardware allows us to configure our lab to emulate multiple operational environments and run various operating systems as needed. All systems and software are completely documented and version controlled through a central configuration management system.
ACME has also established a strong partnership with Sun Microsystems which gives ACME access to Sun’s pool of loaner hardware at no cost should additional hardware be needed. Furthermore, ACME has access to Sun’s large benchmark centers to run large-scale benchmarks using hundreds of processors or terabytes of data. Using these systems, ACME will simulate the high volume of data within the GIG and explore any scalability concerns. ACME plans to use our internal computer lab to support all research and development activities for Phase I.
In addition, ACME has over 9,000 square feet of NSA certified SCIF space with secure communications. Over 90% of ACME employees maintain a TS/SCI security clearance. Should the effort expand to encompass classified materials, ACME can use this facility for the secure transfer of data, classified meetings, or system development, support and testing.
ACME facilities meet all environmental laws and regulations of federal, state (Maryland) and local governments for, but not limited to, airborne emissions, waterborne effluents, external radiation levels, outdoor noise, solid and bulk waste disposal practices, and handling and storage of toxic and hazardous materials.
8 Subcontractors/Consultants
Mr. Gonzalez will serve as a consultant during this Phase I SBIR. Mr. Gonzalez will provide guidance on developing the commercialization strategy. He also will provide advice on successfully developing systems for SBIR efforts. He will perform a maximum of 8% of the work for this SBIR.
During Phase II, we anticipate bringing additional consultants to the team. ACME has partnerships with various commercial companies and relationships with several local universities. In addition, the Principal Investigator is actively involved with the Semantic Web community, and has relationships with other specialists in the field of semantic reasoning.
9 Prior, Current or Pending Support of Similar Proposals or Awards
ACME has no prior, current or pending support for a similar proposal.
Companies involved in semantic research span the range from small startups to behemoths such as Google and Microsoft. In the United States and European Union, many agencies provide funding for companies to research the semantic web.
However, while many companies have developed various techniques for knowledge representation, few have developed any significant method to visualize this information so users could easily comprehend it. At the 2008 Semantic Technology Conference, few companies had implemented any form of interesting visualization of their semantic content. While projects might offer knowledge representation for academicians and researchers, most research fails to design a compelling user experience complete with semantically-rich visualizations.
10 References
[1] M. Buffa et al., “SweetWiki: A semantic wiki,” Web Semantics: Science, Services and Agents on the World Wide Web, Volume 6, Issue 1 (February 2008)