Imagine if computers could read and interpret documents. Humans could focus their efforts on understanding what analysis results mean to make better decisions. Our interest, at Dynamic Risk, is to improve the safety and reliability of energy pipeline networks by taking full advantage of the vast amounts of data locked in cumbersome formats, handwritten documents, drawings, photographs, and paper archives.
We want to fundamentally change how we ask questions and receive answers. Today, we ask questions based on the data we have available in structured databases. In the future, we want to ask questions without worrying about whether the data is readily available in a structured format.
To move toward this grand vision, we are sponsoring a challenge to solve the first part of this puzzle. We want a solution that can learn not only how and what data to extract from sources such as spreadsheets, word processor files, and computer-generated PDF files, but also how to map the data found in these documents to specified target fields in a database.
The user will be presented with a document to be "read". They will manually identify the locations in the document that contain the required data and teach the system how that data maps to the target fields in the database, including any necessary manipulations of the data (e.g., unit conversions). The software will need to learn from a series of documents that present the same target content in varying formats. For example, the learning process will need to handle reports that contain the same target information but come from different service providers, so their formats can be quite different.
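The mapping step described above can be illustrated with a minimal sketch. It assumes the extraction stage has already produced a dictionary of labelled raw strings; the `FieldMapping` class, the labels, and the target field names are all hypothetical and are not part of the challenge materials.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FieldMapping:
    """Links one label found in a document to one target database field."""
    source_label: str                             # label as it appears in the document (hypothetical)
    target_field: str                             # column name in the target database (hypothetical)
    transform: Callable[[str], object] = str.strip  # manipulation applied to the raw text


def feet_to_metres(raw: str) -> float:
    """Unit conversion: parse a length in feet and return metres."""
    return round(float(raw.replace(",", "").strip()) * 0.3048, 3)


# Assumed mappings that a user might teach the system for one report format.
MAPPINGS: List[FieldMapping] = [
    FieldMapping("Segment Length (ft)", "length_m", feet_to_metres),
    FieldMapping("Operator", "operator_name"),
]


def apply_mappings(extracted: Dict[str, str], mappings: List[FieldMapping]) -> Dict[str, object]:
    """Build a database row from raw extracted text, skipping labels that are absent."""
    row: Dict[str, object] = {}
    for mapping in mappings:
        if mapping.source_label in extracted:
            row[mapping.target_field] = mapping.transform(extracted[mapping.source_label])
    return row


row = apply_mappings(
    {"Segment Length (ft)": "1,250", "Operator": "  Acme Pipelines "},
    MAPPINGS,
)
```

A report from a different service provider would need its own list of mappings pointing at the same target fields, which is the part a learning system would be expected to infer rather than have written by hand.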
Ideally, the solution should be able to empirically rate its level of confidence in its results and to identify what it cannot process.
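One possible way to make a confidence rating empirical is to measure, on labelled validation documents, how often each field is extracted correctly, and then flag anything below a threshold. The sketch below assumes that approach; the function names, the field names, and the threshold of 0.8 are illustrative assumptions, not requirements of the challenge.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def field_confidence(predictions: List[Record], truths: List[Record]) -> Dict[str, float]:
    """Per-field fraction of validation documents that were extracted correctly."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for predicted, truth in zip(predictions, truths):
        for field, expected in truth.items():
            totals[field] += 1
            if predicted.get(field) == expected:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


def triage(extracted: Record, confidence: Dict[str, float],
           threshold: float = 0.8) -> Tuple[Record, List[str]]:
    """Accept trusted values; flag missing values and low-confidence fields for review."""
    accepted: Record = {}
    flagged: List[str] = []
    for field, value in extracted.items():
        if value is None or confidence.get(field, 0.0) < threshold:
            flagged.append(field)
        else:
            accepted[field] = value
    return accepted, flagged


# Hypothetical validation data: the "beds" field was wrong in one of two documents.
confidence = field_confidence(
    predictions=[{"price": "100", "beds": "3"}, {"price": "200", "beds": None}],
    truths=[{"price": "100", "beds": "3"}, {"price": "200", "beds": "4"}],
)
accepted, flagged = triage({"price": "150", "beds": "2"}, confidence)
```

Here the flagged list is the system's statement of what it could not process reliably, so a human only needs to look at those fields.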
The Challenge Breakthrough
If computers could read and analyse vast amounts of information, we could focus our efforts on understanding what the results mean and make better decisions. The Cognitive Computing Challenge will break the barriers that prevent us from accessing the majority of the world's information, which is locked in written documents that require humans to interpret them and extract the useful information. Our interest is to change how energy pipeline networks are managed, improve their safety, and ultimately save lives. However, this technology has broad application to multiple industries.
These types of tasks are currently done manually by humans. A solution to this challenge will automate the repetitive tasks required for complex analysis, eliminating most of the time they take and minimizing errors. Ideally, there should be as little human intervention as possible.
The winning solution for this challenge will be the one that is not only the most accurate, but also the most flexible, the easiest to use, and trainable with the fewest documents to extract any type of data that is required.
The Cognitive Computing Challenge has four stages:
1. Qualifying Challenge
This is a qualifying problem which requires you to process MLS (Multiple Listing Service) data from 300 MLS training records into a database. You will be provided with a document summarizing the correct format of each field to be extracted.
The files required for this Qualifying Challenge can be downloaded here:
The document format is identical for all of the training documents. You must train your system to process and load the data into a database with a structure of your choice. Teams must submit their solution for this Qualifying Challenge before the materials for the Cognitive Computing Challenge are released.
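As a rough illustration of the loading step, the sketch below writes already-extracted records into a SQLite database using Python's standard `sqlite3` module. The table name, the columns, and the sample values are assumptions made for this example; the actual fields are defined in the format document supplied with the challenge files.

```python
import sqlite3
from typing import Dict, Iterable

# Hypothetical schema: teams are free to choose their own structure.
SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    listing_id TEXT PRIMARY KEY,
    price      REAL,
    bedrooms   INTEGER
)
"""


def load_records(conn: sqlite3.Connection, records: Iterable[Dict[str, object]]) -> int:
    """Insert extracted records and return the number of rows now in the table."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO listings (listing_id, price, bedrooms) "
        "VALUES (:listing_id, :price, :bedrooms)",
        records,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
loaded = load_records(conn, [
    {"listing_id": "A1", "price": 350000.0, "bedrooms": 3},
    {"listing_id": "B2", "price": 512500.0, "bedrooms": 4},
])
```

Using `INSERT OR REPLACE` with a primary key means that reprocessing the same document updates its row instead of creating a duplicate.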
The submitted system will be tested by Dynamic Risk by processing a separate set of documents in the same format. The resulting data will be scored, and each team's total score and the components of that score will be posted on a leaderboard visible to the public. Only scores will be published. Each team's methodology and questionnaire responses will not be posted. Submissions from all teams will be tested with the same documents to derive their scores.
Feedback will be provided to all teams regarding the scoring of their submission with suggestions on where to focus their efforts to improve their scores. Teams whose approaches, in our opinion, are not suitable for the Cognitive Computing Challenge will be encouraged to resubmit an entry for the Qualifying Challenge with a different approach.
2. Cognitive Computing Challenge
As in the Qualifying Challenge, a set of training documents and a full description of the target attributes will be provided. This stage differs in the following ways:
- A smaller number of training documents (approx. 100)
- A larger variance in the responses for each target field, which will require greater emphasis on cleaning the extracted data
- Variances in document format and structure; unlike the Qualifying Challenge, there will not be one consistent format
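The extra cleaning effort implied by more varied responses might look something like the sketch below, which normalises a few spellings of yes/no answers and pulls a number out of loosely formatted text. The accepted spellings and the helper names are assumptions chosen for illustration; the real target fields and their allowed values come from the challenge materials.

```python
import re
from typing import Optional

# Hypothetical set of spellings that should collapse to one canonical value.
YES_NO = {
    "y": True, "yes": True, "true": True,
    "n": False, "no": False, "false": False,
}

# First run of digits, allowing thousands separators and a decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def clean_bool(raw: str) -> Optional[bool]:
    """Return True or False for a recognised spelling, or None when unrecognised."""
    return YES_NO.get(raw.strip().lower())


def clean_number(raw: str) -> Optional[float]:
    """Return the first number found in the text, or None when there is none."""
    match = NUMBER.search(raw)
    if match is None:
        return None
    return float(match.group().replace(",", ""))
```

Returning `None` instead of guessing keeps unrecognised values visible, so they can be reported as unprocessed rather than loaded as wrong data.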
Materials required to complete this stage of the challenge will be distributed directly to each competitor once they satisfactorily complete the Qualifying Challenge.
3. Submissions for the Cognitive Computing Challenge
4. Judging and announcement of the winner
Challenge Criteria and Challenge Prize Award
One prize will be awarded at the end of the Cognitive Computing Challenge. The team or individual with a submission that best meets the judging criteria will be the sole winner.
Who Can Participate?
The challenge is open to individuals, teams and organizations globally. To be eligible to compete, innovators must comply with all the terms of the challenge as defined in the Challenge Specific Agreement.