Comparing changes

base repository: promptfoo/promptfoo
base: 0.86.1
head repository: promptfoo/promptfoo
compare: 0.87.0
  • 17 commits
  • 41 files changed
  • 4 contributors

Commits on Sep 11, 2024

  1. docs: fix quickstart link

    typpo committed Sep 11, 2024
    818cb74
  2. docs: minor quickstart update

    typpo committed Sep 11, 2024
    7fce80f
  3. site: intro and image updates (#1636)

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    typpo and github-actions[bot] authored Sep 11, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    6247cb8

Commits on Sep 12, 2024

  1. fix: handle when table has no data

    typpo committed Sep 12, 2024
    09401b5
  2. docs: add troubleshooting page

    typpo committed Sep 12, 2024
    deb1b4c
  3. fix: run db migrations first thing in cli (#1638)

    typpo authored Sep 12, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    666b942
  4. feat: remote strategy execution (#1592)

    Co-authored-by: Michael D'Angelo <michael.l.dangelo@gmail.com>
    typpo and mldangelo authored Sep 12, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    c865d7e
  5. chore: add --remote to eval (#1639)

    typpo authored Sep 12, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    2dab7a5
  6. fix(redteam): adjust attacker model size in strategies

    mldangelo committed Sep 12, 2024
    ef39b21
  7. site: unset dark theme image

    typpo committed Sep 12, 2024
    b4fc27d
  8. fix: adjust model size in iterativeImage too

    typpo committed Sep 12, 2024
    8495b29
  9. docs: add section to google-sheets on using custom providers for model-graded metrics

    mldangelo committed Sep 12, 2024
    6470ec6
  10. chore(deps-dev): bump @aws-sdk/client-bedrock-runtime from 3.649.0 to 3.650.0 (#1640)

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Sep 12, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    af65d87
  11. chore(deps): bump openai from 4.58.2 to 4.59.0 (#1641)

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Sep 12, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    4b1845b
  12. chore: clean up telemetry notice

    typpo committed Sep 12, 2024
    face5b5
  13. chore: ability to record when feature is used (#1643)

    typpo authored Sep 12, 2024

    Verified

    This commit was created on GitHub.com and signed with GitHub’s verified signature.
    451601d
  14. 0.87.0

    typpo committed Sep 12, 2024
    10e4920
Showing with 549 additions and 178 deletions.
  1. +3 −1 .vscode/settings.json
  2. +38 −38 package-lock.json
  3. +1 −1 package.json
  4. +23 −0 site/docs/integrations/google-sheets.md
  5. +9 −0 site/docs/intro.md
  6. +1 −1 site/docs/red-team/index.md
  7. +1 −23 site/docs/red-team/quickstart.md
  8. +38 −0 site/docs/usage/troubleshooting.md
  9. +20 −11 site/src/pages/index.tsx
  10. +27 −1 site/src/pages/llm-vulnerability-scanner.module.css
  11. +23 −0 site/src/pages/llm-vulnerability-scanner.tsx
  12. BIN site/static/img/continuous-monitoring.png
  13. BIN site/static/img/continuous-monitoring@2x.png
  14. BIN site/static/img/riskreport-1.png
  15. BIN site/static/img/riskreport-1@2x.png
  16. +0 −1 src/commands/cache.ts
  17. +0 −2 src/commands/delete.ts
  18. +6 −2 src/commands/eval.ts
  19. +0 −1 src/commands/init.ts
  20. +0 −3 src/commands/list.ts
  21. +0 −11 src/commands/redteam/generate.ts
  22. +0 −1 src/commands/redteam/init.ts
  23. +0 −1 src/commands/share.ts
  24. +0 −4 src/commands/show.ts
  25. +0 −1 src/commands/view.ts
  26. +4 −0 src/config.ts
  27. +3 −0 src/evaluator.ts
  28. +0 −1 src/index.ts
  29. +2 −0 src/main.ts
  30. +1 −1 src/providers/openai.ts
  31. +71 −2 src/providers/promptfoo.ts
  32. +1 −1 src/redteam/plugins/harmful.ts
  33. +48 −13 src/redteam/providers/crescendo/index.ts
  34. +33 −4 src/redteam/providers/iterative.ts
  35. +1 −1 src/redteam/providers/iterativeImage.ts
  36. +40 −7 src/redteam/providers/iterativeTree.ts
  37. +77 −34 src/redteam/strategies/multilingual.ts
  38. +3 −0 src/table.ts
  39. +32 −1 src/telemetry.ts
  40. +13 −0 src/testCases.ts
  41. +30 −10 test/redteam/providers/iterativeTree.test.ts
4 changes: 3 additions & 1 deletion .vscode/settings.json
@@ -3,6 +3,7 @@
 "editor.formatOnSave": true,
 "cSpell.words": [
 "apidevtools",
+"chatcompletion",
 "cybercrime",
 "Dedup",
 "deduped",
@@ -21,7 +22,8 @@
 "redteam",
 "TEMPLATING",
 "TESTCASE",
-"unindented"
+"unindented",
+"uuidv"
 ],
 "jest.jestCommandLine": "npm run test --"
 }
76 changes: 38 additions & 38 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -2,7 +2,7 @@
 "name": "promptfoo",
 "description": "LLM eval & testing toolkit",
 "author": "Ian Webster",
-"version": "0.86.1",
+"version": "0.87.0",
 "license": "MIT",
 "type": "commonjs",
 "repository": {
23 changes: 23 additions & 0 deletions site/docs/integrations/google-sheets.md
@@ -48,6 +48,29 @@ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
 tests: https://docs.google.com/spreadsheets/d/1eqFnv1vzkPvS7zG-mYsqNDwOzvSaiIAsKB3zKg9H18c/edit?usp=sharing
 ```
 
+## Using Custom Providers for Model-Graded Metrics
+
+When using Google Sheets for test cases, you can still use custom providers for model-graded metrics like `llm-rubric` or `similar`. To do this, override the default LLM grader by adding a `defaultTest` property to your configuration:
+
+```yaml
+prompts:
+  - prompt1.txt
+  - prompt2.txt
+providers:
+  - anthropic:messages:claude-3-5-sonnet-20240620
+  - openai:chat:gpt-4o-mini
+tests: https://docs.google.com/spreadsheets/d/1eqFnv1vzkPvS7zG-mYsqNDwOzvSaiIAsKB3zKg9H18c/edit?usp=sharing
+defaultTest:
+  options:
+    provider:
+      text:
+        id: ollama:llama3.1:70b
+      embedding:
+        id: ollama:embeddings:mxbai-embed-large
+```
+
+For more information on overriding the LLM grader, see the [model-graded metrics documentation](/docs/configuration/expected-outputs/model-graded/#overriding-the-llm-grader).
+
 ## Writing outputs to a Google Sheet
 
 The `outputPath` parameter (`--output` or `-o` on the command line) supports Google Sheets URLs. To write to a sheet, Application Default Credentials must be configured with a service account that has write access.
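The grader override in the `defaultTest` example only works if `text` and `embedding` sit under `defaultTest.options.provider`. As a minimal sketch, the TypeScript below models that same configuration as a typed object so the nesting is explicit. The interfaces (`ProviderRef`, `GraderOverride`, `SheetsEvalConfig`) and the `graderFor` helper are hypothetical illustrations, not promptfoo's exported types or API.

```typescript
// Hypothetical types mirroring the YAML example; NOT promptfoo's exported types.
interface ProviderRef {
  id: string; // provider id, e.g. "ollama:llama3.1:70b"
  config?: Record<string, unknown>; // optional provider-specific settings
}

// Grader slots that live under defaultTest.options.provider
interface GraderOverride {
  text?: ProviderRef;
  embedding?: ProviderRef;
}

interface SheetsEvalConfig {
  prompts: string[];
  providers: string[];
  tests: string; // Google Sheets URL holding the test cases
  defaultTest?: {
    options?: {
      provider?: GraderOverride;
    };
  };
}

const config: SheetsEvalConfig = {
  prompts: ["prompt1.txt", "prompt2.txt"],
  providers: ["anthropic:messages:claude-3-5-sonnet-20240620", "openai:chat:gpt-4o-mini"],
  tests:
    "https://docs.google.com/spreadsheets/d/1eqFnv1vzkPvS7zG-mYsqNDwOzvSaiIAsKB3zKg9H18c/edit?usp=sharing",
  defaultTest: {
    options: {
      provider: {
        text: { id: "ollama:llama3.1:70b" },
        embedding: { id: "ollama:embeddings:mxbai-embed-large" },
      },
    },
  },
};

// Hypothetical helper: returns the provider id set for a grader slot,
// or undefined when no override is configured for that slot.
function graderFor(cfg: SheetsEvalConfig, slot: keyof GraderOverride): string | undefined {
  return cfg.defaultTest?.options?.provider?.[slot]?.id;
}

console.log(graderFor(config, "text")); // ollama:llama3.1:70b
console.log(graderFor(config, "embedding")); // ollama:embeddings:mxbai-embed-large
```

If `options` were written at the same level as `defaultTest` (as happens when YAML indentation is lost), the lookup path above would no longer reach the override.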
9 changes: 9 additions & 0 deletions site/docs/intro.md
@@ -17,6 +17,15 @@ With promptfoo, you can:
 
 The goal: **test-driven LLM development**, not trial-and-error.
 
+<hr/>
+
+**Get Started:**
+
+- [Red teaming](/docs/red-team/quickstart) - LLM security scans
+- [Evaluations](/docs/getting-started) - LLM quality benchmarks
+
+<hr/>
+
 promptfoo produces matrix views that let you quickly evaluate outputs across many prompts.
 
 Here's an example of a side-by-side comparison of multiple prompts and inputs:
2 changes: 1 addition & 1 deletion site/docs/red-team/index.md
@@ -17,7 +17,7 @@ The purpose of red teaming is to identify and address these vulnerabilities befo
 In order to do this, we need to systematically generate a wide range of adversarial inputs and evaluate the LLM's responses.
 
 :::tip
-Ready to run a red team? Jump to **[Quickstart](#quickstart)**.
+Ready to run a red team? Jump to **[Quickstart](/docs/red-team/quickstart/)**.
 :::
 
 ## How generative AI red teaming works
24 changes: 1 addition & 23 deletions site/docs/red-team/quickstart.md
@@ -111,29 +111,7 @@ The `init` step will do this for you automatically, but in case you'd like to ma
 
 This will generate several hundred adversarial inputs across many categories of potential harm and save them in `redteam.yaml`.
 
-You can reduce the number of test cases by setting the specific [plugins](/docs/guides/llm-redteaming#step-3-generate-adversarial-test-cases) you want to run. For example, to only generate harmful inputs:
-
-<Tabs groupId="installation-method">
-<TabItem value="npx" label="npx" default>
-<CodeBlock language="bash">
-npx promptfoo@latest redteam generate --plugins harmful
-</CodeBlock>
-</TabItem>
-<TabItem value="npm" label="npm">
-<CodeBlock language="bash">
-promptfoo redteam generate --plugins harmful
-</CodeBlock>
-</TabItem>
-<TabItem value="brew" label="brew">
-<CodeBlock language="bash">
-promptfoo redteam generate --plugins harmful
-</CodeBlock>
-</TabItem>
-</Tabs>
-
-Run `npx promptfoo@latest redteam generate --help` to see all available plugins.
-
-### Changing the provider
+## Changing the provider (optional)
 
 By default we use OpenAI's `gpt-4o` model to generate the adversarial inputs, but we support hundreds of other models. Learn more about [setting the provider](/docs/red-team/configuration/#providers).
 
38 changes: 38 additions & 0 deletions site/docs/usage/troubleshooting.md
@@ -0,0 +1,38 @@
+---
+sidebar_position: 60
+---
+
+# Troubleshooting
+
+## Out of memory error
+
+To increase the amount of memory available to Promptfoo, increase the Node.js heap size using the `--max-old-space-size` flag. For example:
+
+```bash
+# 8192 MB is 8 GB. Set this to an appropriate value for your machine.
+NODE_OPTIONS="--max-old-space-size=8192" promptfoo eval
+```
+
+## OpenAI API key is not set
+
+If you're using OpenAI, you need to set the `OPENAI_API_KEY` environment variable or add `apiKey` to the provider config.
+
+If you're not using OpenAI but still receiving this message, you probably have a [model-graded metric](/docs/configuration/expected-outputs/model-graded/) such as `llm-rubric` or `similar` that requires you to [override the grader](/docs/configuration/expected-outputs/model-graded/#overriding-the-llm-grader).
+
+Follow the instructions to override the grader, for example by using the `defaultTest` property.
+
+In this example, we're overriding the text and embedding providers to use Azure OpenAI (gpt-4o for text, and ada-002 for embedding).
+
+```yaml
+defaultTest:
+  options:
+    provider:
+      text:
+        id: azureopenai:chat:gpt-4o-deployment
+        config:
+          apiHost: xxx.openai.azure.com
+      embedding:
+        id: azureopenai:embeddings:text-embedding-ada-002-deployment
+        config:
+          apiHost: xxx.openai.azure.com
+```
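Both troubleshooting entries in the new page depend on the environment the Node.js process actually sees. As a minimal diagnostic sketch (not part of promptfoo), the TypeScript below reads the V8 heap limit through Node's built-in `v8.getHeapStatistics()` and checks whether `OPENAI_API_KEY` holds a non-blank value. The helper names `heapLimitMb` and `hasOpenAiKey` are hypothetical.

```typescript
import { getHeapStatistics } from "node:v8";

const BYTES_PER_MB = 1024 * 1024;

// Hypothetical helper: V8's total heap size limit for this process, in MB.
// With NODE_OPTIONS="--max-old-space-size=8192" this is typically a little
// above 8192, because the limit also counts the young generation.
function heapLimitMb(): number {
  return Math.round(getHeapStatistics().heap_size_limit / BYTES_PER_MB);
}

// Hypothetical helper: true when OPENAI_API_KEY is set to a non-blank value.
function hasOpenAiKey(env: NodeJS.ProcessEnv = process.env): boolean {
  const key = env.OPENAI_API_KEY;
  return typeof key === "string" && key.trim().length > 0;
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`);
console.log(`OPENAI_API_KEY set: ${hasOpenAiKey()}`);
```

Run it with a TypeScript runner such as `tsx`, under the same `NODE_OPTIONS` you pass to `promptfoo eval`, to confirm the heap flag and API key are being picked up before re-running the evaluation.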