Support OpenDocument text files in document extraction #202

Description

@Zenine

Summary

Graphon currently recognizes common Office and text formats, but OpenDocument text files (.odt, application/vnd.oasis.opendocument.text) are neither classified as documents nor routed to a document extractor. As a result, integrations that rely on Graphon's file type standardization or document extraction cannot handle ODT files consistently.

Some WPS Office document MIME types should also be classified as document files for file-type validation.

Expected behavior

  • .odt files are classified as document files.
  • application/vnd.oasis.opendocument.text is classified as a document MIME type.
  • Document extractor dispatch routes .odt and application/vnd.oasis.opendocument.text to an ODT extractor.
  • WPS Office document MIME types are classified as document files.
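The expected behavior above could be sketched roughly as follows. This is a standalone illustration, not Graphon's actual code: the names `is_document`, `select_extractor`, `DOCUMENT_EXTENSIONS`, and `DOCUMENT_MIME_TYPES` are hypothetical, and the WPS Office MIME type shown is only an illustrative placeholder, since this issue does not list the exact values.

```python
from typing import Optional

ODT_EXTENSION = ".odt"
ODT_MIME_TYPE = "application/vnd.oasis.opendocument.text"

# Assumption: placeholder value for illustration only; the issue does not
# enumerate which WPS Office MIME types should be accepted.
WPS_DOCUMENT_MIME_TYPES = {"application/wps-office.docx"}

# Hypothetical classification tables (a real table would hold many more entries).
DOCUMENT_EXTENSIONS = {ODT_EXTENSION}
DOCUMENT_MIME_TYPES = {ODT_MIME_TYPE} | WPS_DOCUMENT_MIME_TYPES


def _normalize_extension(extension: Optional[str]) -> str:
    """Lower-case the extension and make sure it starts with a dot."""
    ext = (extension or "").strip().lower()
    if ext and not ext.startswith("."):
        ext = "." + ext
    return ext


def _normalize_mime(mime_type: Optional[str]) -> str:
    """Lower-case the MIME type and drop parameters such as '; charset=...'."""
    return (mime_type or "").split(";")[0].strip().lower()


def is_document(extension: Optional[str] = None, mime_type: Optional[str] = None) -> bool:
    """Return True if either the extension or the MIME type is a document type."""
    return (
        _normalize_extension(extension) in DOCUMENT_EXTENSIONS
        or _normalize_mime(mime_type) in DOCUMENT_MIME_TYPES
    )


def select_extractor(extension: Optional[str] = None, mime_type: Optional[str] = None) -> Optional[str]:
    """Return the name of the extractor to dispatch to, or None if there is none."""
    if _normalize_extension(extension) == ODT_EXTENSION or _normalize_mime(mime_type) == ODT_MIME_TYPE:
        return "odt"
    return None
```

In this sketch the WPS Office MIME types are only classified as documents; the issue does not ask for them to be routed to the ODT extractor, so `select_extractor` returns `None` for them.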

Notes

unstructured.partition.odt.partition_odt is available through the existing unstructured dependency, so ODT extraction can reuse the existing unstructured extractor path.
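A minimal sketch of what such an extractor might look like, assuming the extractor receives the file content as bytes and returns plain text. The function name `extract_odt_text` and the injectable `partition` parameter are hypothetical and exist only to keep the example self-contained; `partition_odt` itself is the function named above and is imported lazily.

```python
import io
from typing import Any, Callable, Iterable, Optional


def extract_odt_text(
    file_content: bytes,
    partition: Optional[Callable[..., Iterable[Any]]] = None,
) -> str:
    """Extract plain text from ODT bytes.

    `partition` is a hypothetical injection point for testing; by default the
    function uses unstructured's partition_odt.
    """
    if partition is None:
        # Imported lazily so this module loads even when the optional
        # dependency is not installed.
        from unstructured.partition.odt import partition_odt

        partition = partition_odt

    # partition_odt accepts a file-like object through the `file` keyword.
    elements = partition(file=io.BytesIO(file_content))

    # Each returned element exposes its content as `.text`; skip empty ones.
    texts = []
    for element in elements:
        text = getattr(element, "text", "")
        if text:
            texts.append(text)
    return "\n".join(texts)
```

Joining element texts with newlines is one reasonable choice here; the existing unstructured extractor path in Graphon may format its output differently.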
