U.S. Department of Energy
IN-SPIRE™ Visual Document Analysis

FAQ: What types of documents can it process?

IN-SPIRE™ organizes and visualizes the topical content of ASCII or XML text files. These files may come from web pages, databases, OCR output, message traffic, or other sources, but they must be available to IN-SPIRE™ as ASCII (plain) text. IN-SPIRE™ currently cannot read documents in special formats such as MS Word or PDF. It can ingest XML-encoded documents and can read HTML. HTML documents that IN-SPIRE™ retrieves directly from the web are cleaned of markup, but local HTML files do not have their tags removed.
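Because tags in local HTML files are left in place, it may help to strip the markup yourself before loading those files. The following is a minimal sketch of one way to do that with the Python standard library; it is not part of IN-SPIRE™, and the function name `html_to_ascii_text` and the choice to drop non-ASCII characters are assumptions made for illustration.

```python
from html.parser import HTMLParser
import unicodedata


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent block elements do not run
        # together, then collapse all runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_ascii_text(html):
    """Hypothetical helper: return the visible text of `html` as ASCII."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Decompose accented characters, then drop anything outside ASCII.
    normalized = unicodedata.normalize("NFKD", parser.text())
    return normalized.encode("ascii", "ignore").decode("ascii")
```

For example, `html_to_ascii_text("<p>Caf\u00e9 &amp; <b>bar</b></p>")` returns `"Cafe & bar"`. The result can then be written to a plain text file for loading. Dropping non-ASCII characters loses information, so a transliteration step may be preferable for documents that rely heavily on them.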
