Textual Analysis and Data Extraction Toolset
Overview
The Abacus Textual Analysis and Data Extraction Toolset (TADET) is a collection of software tools for quickly and intelligently extracting relevant data from text sources such as Internet web pages, text documents, text data files, etc. It also lets the user measure the relevancy of extracted data and integrate the data accordingly before displaying it in custom-formatted reports. In addition to a flexible user interface, TADET consists of four major components: the Data Analysis Scanning and Extraction Language (DASEL), the Parallel HTML Downloader (PHD), the TADET Relevancy Analyzer (TRA), and the TADET Report Generator (TRG).
[Figure: TADET Tool Interaction]
TADET System Features Summary
* Utilizes a specialized high-level scripting language for text scanner specification
Overview of TADET Components
Data Analysis Scanning and Extraction Language (DASEL)
The Data Analysis Scanning and Extraction Language (DASEL) is a rule-based scripting language for creating text scanners that extract relevant data from text files, such as HTML retrieved from an Internet site. Scanner rules are written in a text file and submitted to DASEL along with the downloaded HTML to be scanned. In addition to the scanner results reports, DASEL produces three reports for debugging the rule set: the Parse Tree Report, the Audit Trail Report, and the Error Log Report. These reports help in creating effective rule sets. The DASEL system also includes configuration management facilities for URLs, source text files, scanner rule files, output results files, etc.
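The idea of submitting a rule set together with downloaded HTML can be sketched in Python. This is only an illustration of rule-based scanning in general: DASEL's actual syntax is not shown on this page, so the `Rule` class, the `scan` function, the use of regular expressions as rules, and the sample HTML are all assumptions. The audit list loosely mirrors the purpose of the Audit Trail Report, not its real format.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical scanner rule: a field name plus a regex with one capture group."""
    name: str
    pattern: str


def scan(text, rules):
    """Apply each rule to the text; return (extracted items, audit trail lines)."""
    results = []
    audit = []
    for rule in rules:
        matches = list(re.finditer(rule.pattern, text, flags=re.IGNORECASE | re.DOTALL))
        # Record how many times each rule fired, to help debug the rule set.
        audit.append(f"rule {rule.name!r}: {len(matches)} match(es)")
        for m in matches:
            results.append({"field": rule.name, "value": m.group(1).strip()})
    return results, audit


# Sample input and rules (made up for illustration).
html = (
    "<html><head><title>Widget Catalog</title></head><body>"
    "<span class='price'>$19.99</span> <span class='price'>$5.00</span>"
    "</body></html>"
)
rules = [
    Rule("title", r"<title>(.*?)</title>"),
    Rule("price", r"class='price'>\s*\$([0-9]+\.[0-9]{2})"),
]
items, audit_trail = scan(html, rules)
```

Keeping the rules as data, separate from the scanning loop, is what allows a rule file to be edited and resubmitted without touching the scanner itself.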
Parallel HTML Downloader (PHD)
The Parallel HTML Downloader (PHD) is a software tool that requests multiple URLs from the Internet in parallel and captures the corresponding web page HTML source for later analysis. The Downloader is integrated into the TADET system and operates in the background without requiring user maintenance.
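A parallel download step of this kind might look like the following Python sketch, which uses a thread pool from the standard library. It is not the PHD implementation; the function names `fetch_html` and `download_all`, the worker count, the UTF-8 decoding, and the choice to record failed downloads as `None` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Download one page and return its source as text (decoding simplified to UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_all(urls, fetch=fetch_html, max_workers=8):
    """Fetch all URLs concurrently; return {url: html}, with None for failed downloads."""
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            # One bad URL should not abort the whole batch (a simplification).
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so pages line up with urls.
        pages = list(pool.map(safe_fetch, urls))
    return dict(zip(urls, pages))
```

Passing the `fetch` function as a parameter keeps the concurrency logic separate from the network call, so the batch behavior can be exercised with a stand-in function before pointing it at real sites.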
TADET Relevancy Analyzer (TRA)
The TADET Relevancy Analyzer performs post-processing on data items extracted from source text files. Some of the main features are:
* Removal of duplicate results
The user supplies a query or a series of matching words, which the Analyzer uses as input to its metric algorithms. Relevancy metrics are based on text analysis features such as the number of word matches, word order, word position, etc.
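To make the three named features concrete, here is a small Python sketch that scores a text against a query by word matches, word order, and word position, and removes duplicate results before ranking. The actual TRA algorithms are not described on this page, so the formulas, the equal weighting of the three scores, and the names `tokenize`, `relevancy`, and `rank` are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy(query, text):
    """Score text against query in [0, 1] from match count, word order, and word position."""
    q = tokenize(query)
    t = tokenize(text)
    if not q or not t:
        return 0.0
    # Position of the first occurrence of each query word that appears in the text.
    first_pos = [t.index(w) for w in q if w in t]
    if not first_pos:
        return 0.0
    # Word matches: fraction of query words found.
    match_score = len(first_pos) / len(q)
    # Word order: fraction of adjacent matched words that keep the query's order.
    pairs = list(zip(first_pos, first_pos[1:]))
    order_score = sum(a < b for a, b in pairs) / len(pairs) if pairs else 1.0
    # Word position: matches near the start of the text score higher.
    position_score = 1.0 - (sum(first_pos) / len(first_pos)) / len(t)
    return (match_score + order_score + position_score) / 3.0


def rank(query, items):
    """Drop duplicates (ignoring case and spacing), then sort by descending relevancy."""
    seen = set()
    unique = []
    for item in items:
        key = " ".join(tokenize(item))
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return sorted(unique, key=lambda item: relevancy(query, item), reverse=True)
```

For example, `rank("cheap flights", [...])` would place an item that begins with "Cheap flights" ahead of one that mentions only "cheap" later in the text, and an item with no matching words would sort last.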
TADET Report Generator (TRG)
The TADET Report Generator (TRG) provides features for formatting the data items and text extracted by the scanners, such as:
* Data item row and column positioning
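Row and column positioning of data items can be illustrated with a short Python sketch that places text at explicit coordinates on a character grid. The TRG's real layout model is not described here, so the `render_report` function, the `(row, col, text)` item format, and the fixed grid size are assumptions.

```python
def render_report(items, width=40, height=4):
    """Place (row, col, text) items on a blank character grid and return it as a string."""
    grid = [[" "] * width for _ in range(height)]
    for row, col, text in items:
        if not 0 <= row < height:
            continue  # Skip items whose row falls outside the page.
        for offset, ch in enumerate(text):
            if 0 <= col + offset < width:
                grid[row][col + offset] = ch  # Text past the right edge is clipped.
    # Trailing spaces are trimmed from each line of the finished report.
    return "\n".join("".join(line).rstrip() for line in grid)


report = render_report(
    [(0, 0, "Item"), (0, 10, "Price"), (1, 0, "Widget"), (1, 10, "19.99")],
    width=20,
    height=2,
)
```

Here `report` holds two lines, with the item names starting in column 0 and the prices starting in column 10.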