promptfoo · Sep 11, 2024 · Sep 12, 2024
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -3,6 +3,7 @@
   "editor.formatOnSave": true,
   "cSpell.words": [
     "apidevtools",
+    "chatcompletion",
     "cybercrime",
     "Dedup",
     "deduped",
@@ -21,7 +22,8 @@
     "redteam",
     "TEMPLATING",
     "TESTCASE",
-    "unindented"
+    "unindented",
+    "uuidv"
   ],
   "jest.jestCommandLine": "npm run test --"
 }
diff --git a/package-lock.json b/package-lock.json
diff --git a/package.json b/package.json
@@ -2,7 +2,7 @@
   "name": "promptfoo",
   "description": "LLM eval & testing toolkit",
   "author": "Ian Webster",
-  "version": "0.86.1",
+  "version": "0.87.0",
   "license": "MIT",
   "type": "commonjs",
   "repository": {

diff --git a/site/docs/integrations/google-sheets.md b/site/docs/integrations/google-sheets.md
@@ -48,6 +48,29 @@ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
 tests: https://docs.google.com/spreadsheets/d/1eqFnv1vzkPvS7zG-mYsqNDwOzvSaiIAsKB3zKg9H18c/edit?usp=sharing
 ```
 
+## Using Custom Providers for Model-Graded Metrics
+
+When using Google Sheets for test cases, you can still use custom providers for model-graded metrics like `llm-rubric` or `similar`. To do this, override the default LLM grader by adding a `defaultTest` property to your configuration:
+
+```yaml
+prompts:
+  - prompt1.txt
+  - prompt2.txt
+providers:
+  - anthropic:messages:claude-3-5-sonnet-20240620
+  - openai:chat:gpt-4o-mini
+tests: https://docs.google.com/spreadsheets/d/1eqFnv1vzkPvS7zG-mYsqNDwOzvSaiIAsKB3zKg9H18c/edit?usp=sharing
+defaultTest:
+  options:
+    provider:
+      text:
+        id: ollama:llama3.1:70b
+      embedding:
+        id: ollama:embeddings:mxbai-embed-large
+```
+
+For more information on overriding the LLM grader, see the [model-graded metrics documentation](/docs/configuration/expected-outputs/model-graded/#overriding-the-llm-grader).
+
 ## Writing outputs to a Google Sheet
 
 The `outputPath` parameter (`--output` or `-o` on the command line) supports Google Sheets URLs. In order to write, Default Application Credentials must be configured with a service account that has write access.

diff --git a/site/docs/intro.md b/site/docs/intro.md
@@ -17,6 +17,15 @@ With promptfoo, you can:
 
 The goal: **test-driven LLM development**, not trial-and-error.
 
+<hr/>
+
+**Get Started:**
+
+- [Red teaming](/docs/red-team/quickstart) - LLM security scans
+- [Evaluations](/docs/getting-started) - LLM quality benchmarks
+
+<hr/>
+
 promptfoo produces matrix views that let you quickly evaluate outputs across many prompts.
 
 Here's an example of a side-by-side comparison of multiple prompts and inputs:

diff --git a/site/docs/red-team/index.md b/site/docs/red-team/index.md
@@ -17,7 +17,7 @@ The purpose of red teaming is to identify and address these vulnerabilities befo
 In order to do this, we need to systematically generate a wide range of adversarial inputs and evaluate the LLM's responses.
 
 :::tip
-Ready to run a red team? Jump to **[Quickstart](#quickstart)**.
+Ready to run a red team? Jump to **[Quickstart](/docs/red-team/quickstart/)**.
 :::
 
 ## How generative AI red teaming works

diff --git a/site/docs/red-team/quickstart.md b/site/docs/red-team/quickstart.md
@@ -111,29 +111,7 @@ The `init` step will do this for you automatically, but in case you'd like to ma
 
 This will generate several hundred adversarial inputs across many categories of potential harm and save them in `redteam.yaml`.
 
-You can reduce the number of test cases by setting the specific [plugins](/docs/guides/llm-redteaming#step-3-generate-adversarial-test-cases) you want to run. For example, to only generate harmful inputs:
-
-<Tabs groupId="installation-method">
-  <TabItem value="npx" label="npx" default>
-    <CodeBlock language="bash">
-      npx promptfoo@latest redteam generate --plugins harmful
-    </CodeBlock>
-  </TabItem>
-  <TabItem value="npm" label="npm">
-    <CodeBlock language="bash">
-      promptfoo redteam generate --plugins harmful
-    </CodeBlock>
-  </TabItem>
-  <TabItem value="brew" label="brew">
-    <CodeBlock language="bash">
-      promptfoo redteam generate --plugins harmful
-    </CodeBlock>
-  </TabItem>
-</Tabs>
-
-Run `npx promptfoo@latest redteam generate --help` to see all available plugins.
-
-### Changing the provider
+## Changing the provider (optional)
 
 By default we use OpenAI's `gpt-4o` model to generate the adversarial inputs, but we support hundreds of other models. Learn more about [setting the provider](/docs/red-team/configuration/#providers).
 

diff --git a/site/docs/usage/troubleshooting.md b/site/docs/usage/troubleshooting.md
@@ -0,0 +1,38 @@
+---
+sidebar_position: 60
+---
+
+# Troubleshooting
+
+## Out of memory error
+
+To increase the amount of memory available to promptfoo, increase the Node.js heap size using the `--max-old-space-size` flag. For example:
+
+```bash
+# 8192 MB is 8 GB. Set this to an appropriate value for your machine.
+NODE_OPTIONS="--max-old-space-size=8192" promptfoo eval
+```
+
+## OpenAI API key is not set
+
+If you're using OpenAI, set the `OPENAI_API_KEY` environment variable or add `apiKey` to the provider config.
+
+If you're not using OpenAI but still see this message, you are probably using a [model-graded metric](/docs/configuration/expected-outputs/model-graded/) such as `llm-rubric` or `similar`. These metrics use an OpenAI grader by default, so you need to [override the grader](/docs/configuration/expected-outputs/model-graded/#overriding-the-llm-grader).
+
+Follow the instructions to override the grader, e.g. using the `defaultTest` property.
+
+In this example, we're overriding the text and embedding providers to use Azure OpenAI (gpt-4o for text, and ada-002 for embedding).
+
+```yaml
+defaultTest:
+  options:
+    provider:
+      text:
+        id: azureopenai:chat:gpt-4o-deployment
+        config:
+          apiHost: xxx.openai.azure.com
+      embedding:
+        id: azureopenai:embeddings:text-embedding-ada-002-deployment
+        config:
+          apiHost: xxx.openai.azure.com
+```