mlclient.jobs package

The ML Jobs package.

This package contains a Python API for multi-threaded jobs, such as writing documents into a MarkLogic database. It contains the following modules:

  • documents_jobs

    The ML Documents Jobs module.

This package exports the following classes:
  • WriteDocumentsJob

    A multi-thread job writing documents into a MarkLogic database.

  • DocumentsLoader

    A class parsing files into Documents.

Examples

>>> from mlclient.jobs import WriteDocumentsJob

class mlclient.jobs.DocumentsLoader

Bases: object

A class parsing files into Documents.

classmethod load(path: str, uri_prefix: str = '', raw: bool = True) → Generator[Document]

Load documents from files under a path.

When the path points to a file, yields a single Document with its URI set to the file name. Otherwise, yields documents whose URIs are the file paths with the input path removed from the beginning. In both cases, the URIs can be prefixed using the uri_prefix parameter. When the raw flag is true, all documents are parsed to RawDocument instances with bytes content and metadata. Metadata for a file is identified by a file at the same level with a .metadata.json or .metadata.xml suffix.

Parameters:
  • path (str) – A path to a directory or a single file.

  • uri_prefix (str, default "") – A URI prefix to apply

  • raw (bool, default True) – A flag indicating whether files should be parsed to a RawDocument

Returns:

A generator of Document instances

Return type:

Generator[Document]
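The URI rules above can be illustrated with a small standard-library sketch. `sketch_uris` is a hypothetical helper, not part of mlclient, and it assumes two details the description leaves open: that directory-relative URIs keep the leading "/" left after stripping the input path, and that .metadata.json/.metadata.xml files are skipped rather than loaded as documents of their own.

```python
from pathlib import Path
from typing import Iterator

METADATA_SUFFIXES = (".metadata.json", ".metadata.xml")


def sketch_uris(path: str, uri_prefix: str = "") -> Iterator[tuple[Path, str]]:
    """Yield (file, uri) pairs following the URI rules described for load().

    Hypothetical helper mirroring the documented rules; not mlclient code.
    """
    root = Path(path)
    if root.is_file():
        # A single file: the URI is the file name, optionally prefixed.
        yield root, uri_prefix + root.name
        return
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        # Assumption: metadata files accompany documents and are not loaded alone.
        if file.name.endswith(METADATA_SUFFIXES):
            continue
        # Directory input: drop the input path from the front of the URI.
        # Assumption: the leading "/" that remains after stripping is kept.
        yield file, uri_prefix + "/" + file.relative_to(root).as_posix()
```

For a directory containing a/b.json and c.xml, a prefix of "/docs" would give the URIs /docs/a/b.json and /docs/c.xml under these assumptions.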

classmethod load_document(path: str, uri: str | None = None, raw: bool = True) → Document

Load a document from a file.

By default, returns a Document without a URI. The URI can be set with the uri parameter. When the raw flag is true, the document is parsed to a RawDocument with bytes content and metadata. Metadata for the file is identified by a file at the same level with a .metadata.json or .metadata.xml suffix.

Parameters:
  • path (str) – A file path

  • uri (str | None, default None) – A URI to set for the document.

  • raw (bool, default True) – A flag indicating whether the file should be parsed to a RawDocument

Returns:

A Document instance

Return type:

Document
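A sketch of how a metadata file at the same level could be located. `find_metadata_file` is hypothetical, and it assumes the metadata file name is the document file name with the suffix appended (doc.xml.metadata.json) and that the JSON variant is checked first; the naming and precedence actually used by DocumentsLoader may differ.

```python
from pathlib import Path
from typing import Optional

METADATA_SUFFIXES = (".metadata.json", ".metadata.xml")


def find_metadata_file(path: str) -> Optional[Path]:
    """Return a sibling metadata file for path, or None if there is none.

    Hypothetical helper; the naming convention below is an assumption.
    """
    document = Path(path)
    for suffix in METADATA_SUFFIXES:
        # Assumption: the suffix is appended to the full file name.
        candidate = document.with_name(document.name + suffix)
        if candidate.is_file():
            return candidate
    return None
```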

class mlclient.jobs.WriteDocumentsJob(thread_count: int | None = None, batch_size: int = 50)

Bases: object

A multi-thread job writing documents into a MarkLogic database.

await_completion()

Await a job’s completion.

property batch_size: int

The number of documents in a single batch.
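To illustrate what the batch size controls, the sketch below splits an input into lists of at most batch_size items, using the default of 50. `batched` is a hypothetical helper; it assumes documents are grouped in input order, which the description does not state.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 50) -> Iterator[list[T]]:
    """Split an iterable into lists of at most batch_size elements (hypothetical)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls the next batch lazily, so the input may be a generator.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

With the default size, 120 documents would be split into batches of 50, 50 and 20.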

property completed: list[str]

A list of processed documents.

property completed_count: int

The number of processed documents.

property failed: list[str]

A list of processed documents that failed to be written.

start()

Start a job’s execution.
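The split between start() and await_completion() suggests a non-blocking start followed by an explicit wait. The hypothetical `BackgroundJob` below sketches that pattern with a single `threading.Thread`; it is not WriteDocumentsJob's implementation, which uses multiple threads.

```python
import threading
from typing import Callable, Optional


class BackgroundJob:
    """Hypothetical job: start() returns immediately, await_completion() blocks."""

    def __init__(self, work: Callable[[], None]) -> None:
        self._work = work
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        # Run the work in a background thread so the caller is not blocked.
        self._thread = threading.Thread(target=self._work)
        self._thread.start()

    def await_completion(self) -> None:
        # Block until the background work has finished.
        if self._thread is None:
            raise RuntimeError("start() must be called first")
        self._thread.join()
```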

property successful: list[str]

A list of successfully processed documents.

property thread_count: int

The number of threads.
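The relationship between thread_count and the completed, successful and failed properties can be sketched with `concurrent.futures.ThreadPoolExecutor`. Here `write_in_threads` and its `write_batch` callable are hypothetical stand-ins for the job's real writes to MarkLogic; the sketch assumes every document in a failed batch is recorded as failed and that completed would equal successful plus failed. Passing thread_count=None falls back to ThreadPoolExecutor's own default worker count, which may not match the job's default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional


def write_in_threads(
    batches: Iterable[list[str]],
    write_batch: Callable[[list[str]], None],
    thread_count: Optional[int] = None,
) -> tuple[list[str], list[str]]:
    """Write batches of URIs concurrently and return (successful, failed).

    Hypothetical sketch: write_batch signals a failure by raising.
    """
    successful: list[str] = []
    failed: list[str] = []
    # thread_count=None uses ThreadPoolExecutor's default worker count.
    with ThreadPoolExecutor(max_workers=thread_count) as executor:
        futures = {executor.submit(write_batch, batch): batch for batch in batches}
        for future, batch in futures.items():
            # Assumption: every URI in a batch shares that batch's outcome.
            if future.exception() is None:
                successful.extend(batch)
            else:
                failed.extend(batch)
    return successful, failed
```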

with_client_config(**config)

Set the DocumentsClient configuration.

Parameters:

config – Keyword arguments to be passed to a DocumentsClient instance.
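Forwarding **config to a client constructor can be sketched as follows. `ClientConfigHolder` and `make_client` are hypothetical; the sketch assumes the stored keyword arguments are unpacked into the client class when a client is created, and that a later call replaces, rather than merges with, an earlier configuration. Which keyword arguments DocumentsClient accepts is not covered here.

```python
from typing import Any


class ClientConfigHolder:
    """Hypothetical stand-in showing how **config could reach a client."""

    def __init__(self) -> None:
        self._client_config: dict[str, Any] = {}

    def with_client_config(self, **config: Any) -> "ClientConfigHolder":
        # Assumption: keep a copy of the kwargs until a client is needed,
        # replacing any configuration set earlier.
        self._client_config = dict(config)
        return self

    def make_client(self, client_cls: type) -> Any:
        # Unpack the stored keyword arguments into the client's constructor.
        return client_cls(**self._client_config)
```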

with_database(database: str)

Set a database name.

Parameters:

database (str) – A database name

with_documents_input(documents: Iterable[Document])

Add Documents to the job’s input.

Parameters:

documents (Iterable[Document]) – Documents to be written into a MarkLogic database
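Because with_documents_input adds documents to the job's input, several sources may be combined. The hypothetical `JobInput` class below sketches one way to do this with `itertools.chain`, keeping each iterable lazy until the job consumes it; plain URI strings stand in for Document instances, and returning self for chaining is an assumption, not documented behavior.

```python
from itertools import chain
from typing import Iterable, Iterator


class JobInput:
    """Hypothetical input collector: each call adds to the input, never replaces it."""

    def __init__(self) -> None:
        self._sources: list[Iterable[str]] = []

    def add(self, documents: Iterable[str]) -> "JobInput":
        # Store the iterable itself; nothing is consumed yet.
        self._sources.append(documents)
        return self

    def __iter__(self) -> Iterator[str]:
        # Consume all sources lazily, in the order they were added.
        return chain.from_iterable(self._sources)
```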

with_filesystem_input(path: str, uri_prefix: str = '')

Load files and add parsed Documents to the job’s input.

Parameters:
  • path (str) – An input path with file(s) to be written into a MarkLogic database

  • uri_prefix (str, default "") – A URI prefix to be put before the files’ relative paths