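Advanced model plugins

The model plugin tutorial covers the basics of developing a plugin that adds support for a new model. This document covers more advanced topics.

Features to consider for your model plugin include:

Accepting API keys using the standard mechanism, which combines llm keys set, environment variables and support for passing an explicit key to the model.
Including support for async models that can be used with Python's asyncio library.
Support for structured output using JSON schemas.
Handling attachments (images, audio and more) for multi-modal models.
Tracking token usage for models that charge by the token.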

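Tip: lazily load expensive dependencies

If your plugin depends on an expensive library such as PyTorch, avoid importing that dependency (or anything that in turn imports it) at the top level of your module. Expensive top-level imports in a plugin mean that even simple commands like llm --help can take a long time to run.

Instead, move those imports inside the functions that need them. A change along these lines to llm-sentence-transformers shaved 1.8 seconds off the time it took to run llm --help!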
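
A minimal sketch of the deferred-import pattern follows. The function name mean_length is hypothetical, and the standard-library statistics module is only a stand-in for a genuinely expensive dependency such as PyTorch.

```python
def mean_length(texts):
    """Return the mean length of the given strings."""
    # Deferred import: `statistics` stands in for an expensive dependency
    # such as torch. It is loaded the first time this function runs, not
    # when the plugin module is imported, so `llm --help` stays fast.
    import statistics

    return statistics.mean(len(text) for text in texts)


result = mean_length(["ab", "abcd"])
```

Python caches imported modules, so repeated calls to the function only pay the import cost once.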

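Models that accept API keys

Models that call out to API providers such as OpenAI, Anthropic or Google Gemini usually require an API key.

LLM's API key management mechanism is documented separately.

If your plugin requires an API key you should subclass llm.KeyModel instead of llm.Model. Start your model definition like this: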

import llm

class HostedModel(llm.KeyModel):
    needs_key = "hosted" # Required
    key_env_var = "HOSTED_API_KEY" # Optional

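This tells LLM that your model requires an API key, which may be saved in the key registry under the key name hosted or provided as the HOSTED_API_KEY environment variable.

Then when you define your execute() method it should accept an extra key= parameter like this: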

    def execute(self, prompt, stream, response, conversation, key=None):
        # key= here will be the API key to use
        ...

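LLM will pass in the key from the environment variable, from the key registry, or from a key passed to LLM as the --key command-line option or the model.prompt(..., key=) parameter.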
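
The lookup described above can be pictured with a small stand-alone sketch. The resolve_key function is hypothetical and is not part of LLM's API, and the precedence shown (explicit key, then stored key, then environment variable) is an assumption made for illustration rather than a statement about LLM's internals.

```python
import os


def resolve_key(explicit_key=None, stored_keys=None, alias=None, env_var=None):
    """Pick an API key: explicit value first, then stored key, then env var."""
    stored_keys = stored_keys or {}
    if explicit_key:
        # Assumption: an explicit value may itself be the name of a stored key
        return stored_keys.get(explicit_key, explicit_key)
    if alias and alias in stored_keys:
        return stored_keys[alias]
    if env_var:
        return os.environ.get(env_var)
    return None


# Hypothetical registry contents and environment, for demonstration only
registry = {"hosted": "key-from-registry"}
os.environ["HOSTED_API_KEY"] = "key-from-env"

explicit = resolve_key("sk-explicit", registry, "hosted", "HOSTED_API_KEY")
stored = resolve_key(None, registry, "hosted", "HOSTED_API_KEY")
from_env = resolve_key(None, {}, "hosted", "HOSTED_API_KEY")
```

Whichever source supplies the key, the plugin only ever sees the final value through the key= argument of execute().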

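Async models

Plugins can optionally provide an asynchronous version of their model, suitable for use with Python's asyncio. This is particularly useful for remote models accessed through an HTTP API.

The async version of a model subclasses llm.AsyncModel instead of llm.Model. It must implement an async def execute() async generator method instead of def execute().

This example shows a subset of the OpenAI default plugin to illustrate how this method might work: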

from typing import AsyncGenerator
import llm

class MyAsyncModel(llm.AsyncModel):
    # This can duplicate the model_id of the sync model:
    model_id = "my-model-id"

    async def execute(
        self, prompt, stream, response, conversation=None
    ) -> AsyncGenerator[str, None]:
        # `client` and `messages` are set up elsewhere in the plugin (omitted here)
        if stream:
            completion = await client.chat.completions.create(
                model=self.model_id,
                messages=messages,
                stream=True,
            )
            async for chunk in completion:
                # Streamed chunks can arrive without text, so skip empty deltas
                content = chunk.choices[0].delta.content
                if content:
                    yield content
        else:
            completion = await client.chat.completions.create(
                model=self.model_id,
                messages=messages,
                stream=False,
            )
            yield completion.choices[0].message.content
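
The async generator contract can be tried out without any network access. In the sketch below, fake_stream() is a hypothetical stand-in for a streaming API client, and execute() is written as a free function rather than a method on llm.AsyncModel so that the example runs with the standard library alone.

```python
import asyncio
from typing import AsyncGenerator, Optional


async def fake_stream() -> AsyncGenerator[Optional[str], None]:
    """Hypothetical stand-in for a streaming API: yields text, None and ''."""
    for piece in ["Hel", None, "lo", ""]:
        await asyncio.sleep(0)  # give control back to the event loop
        yield piece


async def execute(stream: bool) -> AsyncGenerator[str, None]:
    if stream:
        # Yield each non-empty piece as soon as it arrives
        async for piece in fake_stream():
            if piece:
                yield piece
    else:
        # Gather everything first, then yield the full text exactly once
        parts = [piece async for piece in fake_stream() if piece]
        yield "".join(parts)


async def collect(stream: bool) -> list:
    return [chunk async for chunk in execute(stream)]


streamed = asyncio.run(collect(True))
whole = asyncio.run(collect(False))
```

Both branches use yield, so callers can iterate over the result in the same way whether or not streaming was requested.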

If your model accepts API keys you should subclass llm.AsyncKeyModel and add a key= parameter to your .execute() method:

class MyAsyncModel(llm.AsyncKeyModel):
    ...
    async def execute(
        self, prompt, stream, response, conversation=None, key=None
    ) -> AsyncGenerator[str, None]:
        ...

This async model instance should then be passed to the register() method in the register_models() plugin hook:

@llm.hookimpl
def register_models(register):
    register(
        MyModel(), MyAsyncModel(), aliases=("my-model-aliases",)
    )

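Supporting schemas

If your model supports structured output that conforms to a defined JSON schema, you can implement support by first adding supports_schema = True to the class: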

class MyModel(llm.KeyModel):
    ...
    supports_schema = True

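Then add code to your .execute() method that checks for prompt.schema and, if it is present, uses it to prompt the model.

prompt.schema will always be a Python dictionary representing a JSON schema, even if the user passed in a Pydantic model class.

See the llm-gemini and llm-anthropic plugins for examples of this pattern in action.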
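
As a rough sketch of the check described above, the helper below adds a schema to a request body only when one is present. The build_request_body function and the response_format layout are hypothetical: each provider has its own request format, so treat the dictionary shape as a placeholder. A SimpleNamespace stands in for LLM's prompt object.

```python
from types import SimpleNamespace


def build_request_body(prompt, model_id):
    """Build a request body, attaching the JSON schema if the prompt has one."""
    body = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt.prompt}],
    }
    if prompt.schema:
        # Hypothetical layout; consult your provider's API for the real one
        body["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "output", "schema": prompt.schema},
        }
    return body


dog_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
with_schema = build_request_body(
    SimpleNamespace(prompt="Invent a dog", schema=dog_schema), "my-model-id"
)
without_schema = build_request_body(
    SimpleNamespace(prompt="Say hi", schema=None), "my-model-id"
)
```

Because prompt.schema is already a plain dictionary, it can be passed straight into the JSON payload without any conversion.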

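Attachments for multi-modal models

Models such as GPT-4o, Claude 3.5 Sonnet and Google's Gemini 1.5 are multi-modal: they accept input in the form of images and maybe even audio, video and other formats.

LLM calls these attachments. Models can specify the types of attachments they accept and then implement special code in the .execute() method to handle them.

See the Python attachments documentation for details on using attachments in the Python API.

Specifying attachment types

A Model subclass can list the types of attachments it accepts by defining an attachment_types class attribute: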

class NewModel(llm.Model):
    model_id = "new-model"
    attachment_types = {
        "image/png",
        "image/jpeg",
        "image/webp",
        "image/gif",
    }

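These content types are detected when an attachment is passed to LLM using llm -a filename, or can be specified by the user using the --attachment-type filename image/png option.

Note: MP3 files will have their attachment type detected as audio/mpeg, not audio/mp3.

LLM will use the attachment_types attribute to validate that provided attachments should be accepted before passing them to the model.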
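
The validation step amounts to a set-membership check. The sketch below is an illustration only: check_attachment_types is a hypothetical helper, not LLM's own code. It also shows why a model that accepts MP3 files should list audio/mpeg rather than audio/mp3.

```python
ACCEPTED_TYPES = {"image/png", "image/jpeg", "audio/mpeg"}


def check_attachment_types(resolved_types, accepted):
    """Raise ValueError if any resolved content type is not accepted."""
    unsupported = sorted(t for t in resolved_types if t not in accepted)
    if unsupported:
        raise ValueError(
            "Unsupported attachment types: " + ", ".join(unsupported)
        )


# An MP3 file resolves to audio/mpeg, so this passes silently
check_attachment_types(["image/png", "audio/mpeg"], ACCEPTED_TYPES)

# audio/mp3 is not in the accepted set, so this is rejected
try:
    check_attachment_types(["audio/mp3"], ACCEPTED_TYPES)
    rejected = False
except ValueError as ex:
    rejected = True
    message = str(ex)
```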

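Handling attachments

The prompt object passed to the execute() method will have an attachments attribute containing a list of Attachment objects provided by the user.

An Attachment instance has the following properties:

url (str): the URL of the attachment, if it was provided as a URL
path (str): the resolved file path of the attachment, if it was provided as a file
type (str): the content type of the attachment, if it was provided
content (bytes): the binary content of the attachment, if it was provided

Generally only one of url, path or content will be set.

You should usually access the type and the content through one of these methods:

attachment.resolve_type() -> str: returns the type if it is available, otherwise attempts to guess the type by looking at the first few bytes of content
attachment.content_bytes() -> bytes: returns the binary content, which may need to be read from a file or fetched from a URL
attachment.base64_content() -> str: returns that content as a base64-encoded string

An id() method returns a database ID for this content, which is either a SHA256 hash of the binary content or, in the case of attachments hosted at an external URL, a hash of {"url": url} instead. This is an implementation detail which you should not need to access directly.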
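
To make the base64 and ID behaviour concrete, here is a simplified stand-in class. SketchAttachment is hypothetical and covers only in-memory content and URLs. The way the {"url": url} value is serialized before hashing (json.dumps) is an assumption, so the IDs it produces should not be expected to match those stored by LLM.

```python
import base64
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchAttachment:
    """Simplified stand-in for llm's Attachment (illustration only)."""

    url: Optional[str] = None
    content: Optional[bytes] = None

    def base64_content(self) -> str:
        # Encode the raw bytes as ASCII-safe base64 text
        return base64.b64encode(self.content).decode("utf-8")

    def id(self) -> str:
        if self.content is not None:
            # Content-addressed: identical bytes always give the same ID
            return hashlib.sha256(self.content).hexdigest()
        # Assumed serialization of {"url": url} for URL-only attachments
        payload = json.dumps({"url": self.url}).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


inline = SketchAttachment(content=b"hi")
remote = SketchAttachment(url="https://example.com/cat.png")
encoded = inline.base64_content()
```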

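Note that a prompt with an attachment may not include a text prompt at all, in which case prompt.prompt will be None.

Here's how the OpenAI plugin handles attachments, including the case where no prompt.prompt was provided: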

if not prompt.attachments:
    messages.append({"role": "user", "content": prompt.prompt})
else:
    attachment_message = []
    if prompt.prompt:
        attachment_message.append({"type": "text", "text": prompt.prompt})
    for attachment in prompt.attachments:
        attachment_message.append(_attachment(attachment))
    messages.append({"role": "user", "content": attachment_message})


# And the code for creating the attachment message
def _attachment(attachment):
    url = attachment.url
    base64_content = ""
    if not url or attachment.resolve_type().startswith("audio/"):
        base64_content = attachment.base64_content()
        url = f"data:{attachment.resolve_type()};base64,{base64_content}"
    if attachment.resolve_type().startswith("image/"):
        return {"type": "image_url", "image_url": {"url": url}}
    else:
        format_ = "wav" if attachment.resolve_type() == "audio/wav" else "mp3"
        return {
            "type": "input_audio",
            "input_audio": {
                "data": base64_content,
                "format": format_,
            },
        }

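As you can see, it uses attachment.url if that is available and otherwise falls back to using the base64_content() method to embed the image directly in the JSON sent to the API. For the OpenAI API audio attachments are always included as base64-encoded strings.

Attachments from previous conversations

Models that implement the ability to continue a conversation can reconstruct the previous message JSON using the response.attachments attribute.

Here's how the OpenAI plugin does that: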

for prev_response in conversation.responses:
    if prev_response.attachments:
        attachment_message = []
        if prev_response.prompt.prompt:
            attachment_message.append(
                {"type": "text", "text": prev_response.prompt.prompt}
            )
        for attachment in prev_response.attachments:
            attachment_message.append(_attachment(attachment))
        messages.append({"role": "user", "content": attachment_message})
    else:
        messages.append(
            {"role": "user", "content": prev_response.prompt.prompt}
        )
    messages.append({"role": "assistant", "content": prev_response.text_or_raise()})

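The response.text_or_raise() method used there will return the text from the response or raise a ValueError exception if the response is an AsyncResponse instance that has not yet been fully resolved.

This is a slightly weird hack to work around the common need to share logic for building up the messages list across both sync and async models.

Tracking token usage

Models that charge by the token should track the number of tokens used by each prompt. The response.set_usage() method can be used to record the number of tokens used by a response. These will then be made available through the Python API and logged to the SQLite database for command-line users.

response here is the response object that is passed to .execute() as an argument.

Call response.set_usage() at the end of your .execute() method. It accepts the keyword arguments input=, output= and details=, all three of which are optional. input and output should be integers, and details should be a dictionary providing additional information beyond the input and output token counts.

This example logs 15 input tokens, 340 output tokens and notes that 37 tokens were cached: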

response.set_usage(input=15, output=340, details={"cached": 37})
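
In practice the numbers usually come from the provider's API response. The sketch below maps a hypothetical usage payload onto the set_usage() keyword arguments. The field names prompt_tokens, completion_tokens and cached_tokens are assumptions, as are the usage_kwargs helper and the RecordingResponse stand-in for the response object.

```python
def usage_kwargs(raw_usage):
    """Translate a provider usage dict into set_usage() keyword arguments."""
    kwargs = {
        "input": raw_usage.get("prompt_tokens"),
        "output": raw_usage.get("completion_tokens"),
    }
    # Anything beyond the two headline counts goes into details,
    # dropping zero or missing values to keep the log compact
    details = {
        key: value
        for key, value in raw_usage.items()
        if key not in ("prompt_tokens", "completion_tokens") and value
    }
    if details:
        kwargs["details"] = details
    return kwargs


class RecordingResponse:
    """Stand-in for the response object: just records what it was given."""

    def set_usage(self, *, input=None, output=None, details=None):
        self.usage = {"input": input, "output": output, "details": details}


response = RecordingResponse()
raw = {"prompt_tokens": 15, "completion_tokens": 340, "cached_tokens": 37}
response.set_usage(**usage_kwargs(raw))
```

Keeping the translation in a small helper lets the sync and async versions of a model share it.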