Text Mining

Introduction

Text mining in Funnelback extracts entities and their definitions from textual data. Named entities include person names, organisations, products and geographic locations, as well as acronyms. When a query matches one of these entities, its definition is displayed on the search results page, with a link back to the source document.

Configuration

User Interface

[Screenshot: Text-miner-john-doe.png]

The screenshot above shows a named entity (a person's name) and its associated definition. The entity is a hyperlink to the document from which the definition was taken. The FreeMarker tags that produce this output are:

    <#if response.entityDefinition?exists>
        <div class="textminer"><@fb.TextMiner></@fb.TextMiner></div>
    </#if>

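This template first checks whether the response contains an EntityDefinition object (response.entityDefinition). If it does, a div with the class "textminer" is written out, and the "TextMiner" tag inside it outputs the entity, the link and the definition. If no entity definition is present, nothing is displayed.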

Logging

Text mining log messages are written to the file crawler.inline_filter.log. Messages showing which entities are being stored contain the label "Entity:", for example:

    Entity: [Rss] JSON: {"nounPhrase":"Rss","sourceURL":"http://sample.com/","definition":"is a format for delivering regularly changing content via the web..."}
    Entity: [Ors] JSON: {"nounPhrase":"Ors","sourceURL":"http://sample.com/","definition":"Order Routing System..."}

Here the entities "Rss" and "Ors" are being inserted into the database. Searching for these entities will cause their definitions to be displayed.
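
To check which entities were stored during a crawl, the "Entity:" lines can be pulled out of the log with a short script. The following is a minimal Python sketch, not part of Funnelback. It assumes every entity line follows the layout of the two samples above (the label "Entity:", the entity in square brackets, then "JSON:" followed by a single-line JSON object). The function names parse_entity_line and read_entities are hypothetical helpers introduced for illustration.

```python
import json
import re

# Matches lines such as: Entity: [Rss] JSON: {...}
# Assumption: the layout is the one shown in the sample log lines above.
ENTITY_LINE = re.compile(
    r"Entity: \[(?P<name>[^\]]*)\] JSON: (?P<payload>\{.*\})\s*$"
)


def parse_entity_line(line):
    """Return the decoded JSON payload of an Entity log line, or None."""
    match = ENTITY_LINE.search(line)
    if match is None:
        return None  # not an entity line
    try:
        return json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None  # label present but payload is not valid JSON


def read_entities(lines):
    """Collect the payloads of all Entity lines from an iterable of log lines."""
    entities = []
    for line in lines:
        record = parse_entity_line(line)
        if record is not None:
            entities.append(record)
    return entities


if __name__ == "__main__":
    sample = [
        'Entity: [Ors] JSON: {"nounPhrase":"Ors",'
        '"sourceURL":"http://sample.com/",'
        '"definition":"Order Routing System..."}',
        "some unrelated log message",
    ]
    for entity in read_entities(sample):
        print(entity["nounPhrase"], entity["sourceURL"], entity["definition"])
```

To run this against a real log, pass an open file handle for crawler.inline_filter.log to read_entities in place of the sample list. Lines that do not match the assumed layout are skipped rather than reported as errors.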

See Also