Interface NoContentDocument

  • All Known Subinterfaces:
    BytesDocument, FilterableDocument, StringDocument

    public interface NoContentDocument
    A document which may have its non-content parts inspected.

    This document does not provide access to the document content and does not provide methods for cloning the document. It only provides a way to inspect the non-content parts of the document.

    • Method Detail

      • getURI

        URI getURI()
        The URI of the document.
        Returns:
        the URI of the document.
      • getMetadata

        com.google.common.collect.ImmutableListMultimap<String,String> getMetadata()
        Metadata of the document.

        Metadata is accessible to the indexer with metamap configuration. Values may contain any printable UTF-8 characters except line feeds (new lines) and carriage returns. Metadata keys must consist only of printable, non-whitespace ASCII characters.

        The Content-Type metadata, if set, should be the content type of the original document. It must have no more than one value. Unlike getMimeType(), this value will not change during filtering unless a filter is correcting the original content type.

        Metadata with keys that start with X-Fun should not be added, edited, or removed.

        For web crawls, this will initially contain the HTTP headers returned by the server, so keys that might conflict with standard HTTP headers should be avoided.

        Returns:
        an immutable multimap of the document's metadata.
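
        The key and value rules above can be checked before metadata is written. The following is a minimal sketch using only the Java standard library; the class MetadataRules and its methods are hypothetical and not part of this API. It checks only the characters the rules name explicitly (line feed and carriage return in values) and, as an assumption, treats an empty key as invalid.

```java
/** Hypothetical helper for illustration only; not part of the documented API. */
final class MetadataRules {

    private MetadataRules() {}

    /**
     * Returns true if every character of the key is printable,
     * non-whitespace ASCII (0x21 to 0x7E inclusive).
     * Assumption: an empty or null key is treated as invalid.
     */
    static boolean isValidKey(String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c < 0x21 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns true if the value contains no line feed or carriage return.
     * Simplification: other non-printable characters are not checked here.
     */
    static boolean isValidValue(String value) {
        return value != null
                && value.indexOf('\n') < 0
                && value.indexOf('\r') < 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("Content-Type"));
        System.out.println(isValidKey("bad key"));
        System.out.println(isValidValue("line one\nline two"));
    }
}
```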
      • getCopyOfMetadata

        com.google.common.collect.ListMultimap<String,String> getCopyOfMetadata()
        Gets a mutable copy of the document's metadata.
        Returns:
        a mutable copy of the document's metadata.
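
        The point of a mutable copy is that changes to it do not affect the original metadata. The sketch below illustrates that pattern with plain java.util collections standing in for the Guava multimap types, so it runs without extra dependencies; MetadataCopyExample and mutableCopy are hypothetical names. With this API, the copy would instead come from getCopyOfMetadata().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration; the real API returns a Guava ListMultimap. */
final class MetadataCopyExample {

    private MetadataCopyExample() {}

    /** Copies the map and each value list so nothing is shared with the original. */
    static Map<String, List<String>> mutableCopy(Map<String, List<String>> original) {
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : original.entrySet()) {
            copy.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        // Immutable original, standing in for the result of getMetadata().
        Map<String, List<String>> original = Map.of("author", List.of("Ada"));

        // Mutable copy, standing in for the result of getCopyOfMetadata().
        Map<String, List<String>> copy = mutableCopy(original);
        copy.get("author").add("Grace");
        copy.put("reviewed", new ArrayList<>(List.of("true")));

        // The original is unchanged by edits to the copy.
        System.out.println(original);
        System.out.println(copy);
    }
}
```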
      • getCharset

        Optional<Charset> getCharset()
        Returns the possible charset of the document.

        If a filter changes the charset of the bytes, it must also ensure that the returned document reports the new charset from this method.

        Returns:
        the charset of the current content of the document or empty if it is unknown.
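
        Because the charset may be unknown, callers need to decide what to do with an empty result. The sketch below shows one way to handle it, assuming the content bytes are obtained elsewhere (this interface does not expose content) and assuming UTF-8 is an acceptable fallback. CharsetFallback and decode are hypothetical names, not part of this API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the documented API. */
final class CharsetFallback {

    private CharsetFallback() {}

    /**
     * Decodes bytes with the reported charset, such as the value returned
     * by getCharset(). Assumption: UTF-8 is used when the charset is unknown.
     */
    static String decode(byte[] content, Optional<Charset> reported) {
        Charset charset = reported.orElse(StandardCharsets.UTF_8);
        return new String(content, charset);
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Known charset: decoded with the reported value.
        System.out.println(decode(latin1, Optional.of(StandardCharsets.ISO_8859_1)));

        // Unknown charset: falls back to UTF-8.
        byte[] utf8 = "plain text".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, Optional.empty()));
    }
}
```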
      • getDocumentType

        DocumentType getDocumentType()
        Returns the document type of this filterable document, which may not be the original document type.

        If a filter changes the type of a document, e.g. from CSV to JSON, the document type of the filterable document should be changed rather than the Content-Type in the headers.

        If the implementation describes a charset, that charset should be ignored in favor of the charset returned by getCharset().

        Returns:
        the document type of the current document content (not the original document type, which may have changed during filtering).