mlclient.jobs package
The ML Jobs package.
This package contains a Python API for running jobs against a MarkLogic database. It contains the following modules:
- documents_jobs
The ML Documents Jobs module.
This package exports the following classes:
- WriteDocumentsJob
A multi-thread job writing documents into a MarkLogic database.
- DocumentsLoader
A class parsing files into Documents.
Examples
>>> from mlclient.jobs import WriteDocumentsJob
- class mlclient.jobs.DocumentsLoader
Bases: object
A class parsing files into Documents.
- classmethod load(path: str, uri_prefix: str = '', raw: bool = True) → Generator[Document]
Load documents from files under a path.
When the path points to a file, yields a single Document with its URI set to the file name. Otherwise, yields documents whose URIs omit the leading input path. In both cases, the URIs can be customized with the uri_prefix parameter. When the raw flag is true, all documents are parsed to RawDocument with bytes content and metadata. Metadata is read from a file at the same level with a .metadata.json or .metadata.xml suffix.
- Parameters:
path (str) – A path to a directory or a single file.
uri_prefix (str, default "") – A prefix to apply to document URIs
raw (bool, default True) – A flag indicating whether files should be parsed to a RawDocument
- Returns:
A generator of Document instances
- Return type:
Generator[Document]
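The URI rules described for load() can be illustrated with a small standalone sketch. It does not use or reproduce mlclient's code: sketch_uris is a hypothetical helper, and the leading slash in the URIs and the skipping of metadata files are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple

# Suffixes named in the description of load(); files carrying them are
# treated as metadata rather than documents (an assumption of this sketch).
METADATA_SUFFIXES = (".metadata.json", ".metadata.xml")


def sketch_uris(path: str, uri_prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (file path, URI) pairs following the rules described for load().

    Hypothetical helper, not part of mlclient. Whether real URIs start
    with a slash is an assumption.
    """
    root = Path(path)
    if root.is_file():
        # A single file: the URI is the file name, optionally prefixed.
        yield str(root), f"{uri_prefix}/{root.name}"
        return
    for file_path in sorted(p for p in root.rglob("*") if p.is_file()):
        if file_path.name.endswith(METADATA_SUFFIXES):
            continue
        # A directory: drop the input path from the beginning of each URI.
        relative = file_path.relative_to(root).as_posix()
        yield str(file_path), f"{uri_prefix}/{relative}"


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "doc.xml").write_text("<root/>")
    (Path(tmp) / "sub" / "doc.metadata.json").write_text("{}")
    for _, uri in sketch_uris(tmp, uri_prefix="/imported"):
        print(uri)
```

With the real class, the equivalent call would presumably be DocumentsLoader.load(path, uri_prefix="/imported"); the exact URIs it produces should be checked against the library itself.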
- classmethod load_document(path: str, uri: str | None = None, raw: bool = True) → Document
Load a document from a file.
By default, returns a Document without a URI; one can be set with the uri parameter. When the raw flag is true, the document is parsed to a RawDocument with bytes content and metadata. Metadata is read from a file at the same level with a .metadata.json or .metadata.xml suffix.
- Parameters:
path (str) – A file path
uri (str | None, default None) – A URI to set for the document
raw (bool, default True) – A flag indicating whether the file should be parsed to a RawDocument
- Returns:
A Document instance
- Return type:
Document
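The metadata lookup described above can be pictured as a search for a sibling file. The following is a minimal sketch, not mlclient's implementation: find_metadata_file is a hypothetical helper, and it assumes the metadata suffix replaces the document's own extension (doc.xml pairs with doc.metadata.json), which the description does not state explicitly.

```python
from pathlib import Path
from typing import Optional

# Suffixes named in the description of load_document().
METADATA_SUFFIXES = (".metadata.json", ".metadata.xml")


def find_metadata_file(document_path: str) -> Optional[Path]:
    """Return the metadata file sitting next to a document, or None.

    Hypothetical helper, not part of mlclient. Assumption: the metadata
    suffix replaces the document's extension (doc.xml -> doc.metadata.json).
    """
    doc = Path(document_path)
    for suffix in METADATA_SUFFIXES:
        # with_suffix keeps the directory and stem, so the candidate
        # is always at the same level as the document.
        candidate = doc.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    return None
```

The JSON suffix is tried first here only because it is listed first in the description; the library's actual precedence is not documented in this section.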
- class mlclient.jobs.WriteDocumentsJob(thread_count: int | None = None, batch_size: int = 50)
Bases: object
A multi-thread job writing documents into a MarkLogic database.
- property batch_size: int
The number of documents in a single batch.
- property completed: list[str]
A list of processed documents.
- property completed_count: int
The number of processed documents.
- property failed: list[str]
A list of processed documents that failed to be written.
- property successful: list[str]
A list of successfully processed documents.
- property thread_count: int
The number of threads.
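How the properties above could relate to each other can be shown with a standalone sketch of batched, multi-threaded writing. SketchWriteJob, batches and write_batch are hypothetical names, not part of mlclient, and the sketch assumes that completed is the union of successful and failed and that a failing batch marks all of its documents as failed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple


def batches(uris: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Split URIs into lists of at most batch_size items."""
    batch: List[str] = []
    for uri in uris:
        batch.append(uri)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


class SketchWriteJob:
    """Hypothetical stand-in for illustration, not mlclient's WriteDocumentsJob."""

    def __init__(self, thread_count: int = 4, batch_size: int = 50) -> None:
        self.thread_count = thread_count
        self.batch_size = batch_size
        self.successful: List[str] = []
        self.failed: List[str] = []

    @property
    def completed(self) -> List[str]:
        # Assumption: every processed document is either successful or failed.
        return self.successful + self.failed

    @property
    def completed_count(self) -> int:
        return len(self.completed)

    def run(self, uris: Iterable[str], write_batch: Callable[[List[str]], None]) -> None:
        """Write batches on a thread pool and record the outcome per document."""

        def process(batch: List[str]) -> Tuple[List[str], bool]:
            try:
                write_batch(batch)
            except Exception:
                return batch, False
            return batch, True

        with ThreadPoolExecutor(max_workers=self.thread_count) as pool:
            # Results are collected in the calling thread, so no lock is needed.
            for batch, ok in pool.map(process, batches(uris, self.batch_size)):
                (self.successful if ok else self.failed).extend(batch)
```

In this sketch write_batch stands for whatever sends one batch to the database; the real job's input and start methods are not covered in this section, so they are not shown.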
- with_client_config(**config)
Set DocumentsClient configuration.
- Parameters:
config – Keyword arguments to be passed to a DocumentsClient instance.
- with_database(database: str)
Set a database name.
- Parameters:
database (str) – A database name