
Hallucination

This example shows how to use Mastra's Hallucination metric to evaluate whether a response contradicts information provided in the context.

Overview

In this example, you will learn how to:

  1. Set up the Hallucination metric
  2. Evaluate responses that contradict the facts
  3. Analyze hallucination scores
  4. Handle different accuracy levels

Setup

Environment Setup

Make sure to set up your environment variables:

.env
OPENAI_API_KEY=your_api_key_here
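The example code reads this key via the OpenAI provider, so the `.env` file must be loaded before the metric runs. A minimal sketch, assuming the `dotenv` package is installed (on Node 20+ you can pass `--env-file=.env` instead):

src/index.ts
// Assumes "dotenv" is a project dependency; loads OPENAI_API_KEY from .env
import "dotenv/config";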

Dependencies

Import the necessary dependencies:

src/index.ts
import { openai } from "@ai-sdk/openai";
import { HallucinationMetric } from "@mastra/evals/llm";
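If these packages are not yet installed, add `@mastra/evals` and `@ai-sdk/openai` (plus `dotenv`, if you use the loading sketch above) with your package manager before running the examples.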

Example Usage

No Hallucination Example

Evaluate a response that fully matches the context:

src/index.ts
const context1 = [
  "The iPhone was first released in 2007.",
  "Steve Jobs unveiled it at Macworld.",
  "The original model had a 3.5-inch screen.",
];

const metric1 = new HallucinationMetric(openai("gpt-4o-mini"), {
  context: context1,
});

const query1 = "When was the first iPhone released?";
const response1 =
  "The iPhone was first released in 2007, when Steve Jobs unveiled it at Macworld. The original iPhone featured a 3.5-inch screen.";

console.log("Example 1 - No Hallucination:");
console.log("Context:", context1);
console.log("Query:", query1);
console.log("Response:", response1);

const result1 = await metric1.measure(query1, response1);

console.log("Metric Result:", {
  score: result1.score,
  reason: result1.info.reason,
});

// Example Output:
// Metric Result: { score: 0, reason: 'The response matches the context exactly.' }

Mixed Hallucination Example

Evaluate a response that contradicts some of the facts:

src/index.ts
const context2 = [
  "The first Star Wars movie was released in 1977.",
  "It was directed by George Lucas.",
  "The film earned $775 million worldwide.",
  "The movie was filmed in Tunisia and England.",
];

const metric2 = new HallucinationMetric(openai("gpt-4o-mini"), {
  context: context2,
});

const query2 = "Tell me about the first Star Wars movie.";
const response2 =
  "The first Star Wars movie came out in 1977 and was directed by George Lucas. It made over $1 billion at the box office and was filmed entirely in California.";

console.log("Example 2 - Mixed Hallucination:");
console.log("Context:", context2);
console.log("Query:", query2);
console.log("Response:", response2);

const result2 = await metric2.measure(query2, response2);

console.log("Metric Result:", {
  score: result2.score,
  reason: result2.info.reason,
});

// Example Output:
// Metric Result: { score: 0.5, reason: 'The response contradicts some facts in the context.' }
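Here the response contradicts two of the four context statements (the box-office figure and the filming locations). Assuming the score reflects the fraction of context claims that are contradicted, that is consistent with the 0.5 in the example output.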

Complete Hallucination Example

Evaluate a response that contradicts all of the facts:

src/index.ts
const context3 = [
  "The Wright brothers made their first flight in 1903.",
  "The flight lasted 12 seconds.",
  "It covered a distance of 120 feet.",
];

const metric3 = new HallucinationMetric(openai("gpt-4o-mini"), {
  context: context3,
});

const query3 = "When did the Wright brothers first fly?";
const response3 =
  "The Wright brothers achieved their historic first flight in 1908. The flight lasted about 2 minutes and covered nearly a mile.";

console.log("Example 3 - Complete Hallucination:");
console.log("Context:", context3);
console.log("Query:", query3);
console.log("Response:", response3);

const result3 = await metric3.measure(query3, response3);

console.log("Metric Result:", {
  score: result3.score,
  reason: result3.info.reason,
});

// Example Output:
// Metric Result: { score: 1, reason: 'The response completely contradicts the context.' }

Understanding the Results

The metric provides:

  1. A hallucination score between 0 and 1 (see the sketch after this list):

    • 0.0: No hallucination - no contradictions with the context
    • 0.3-0.4: Low hallucination - minor contradictions
    • 0.5-0.6: Mixed hallucination - some contradictions
    • 0.7-0.8: High hallucination - many contradictions
    • 0.9-1.0: Complete hallucination - contradicts all of the context
  2. A detailed reason for the score, including analysis of:

    • Claim verification
    • Contradictions found
    • Factual accuracy
    • Overall hallucination level
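If you need to act on these levels programmatically, one option is to bucket the score using the ranges above. The cutoffs below are a judgment call, and `hallucinationLevel` is a hypothetical helper, not part of `@mastra/evals`:

src/index.ts
// Hypothetical helper: buckets a hallucination score into the levels listed above.
function hallucinationLevel(score: number): string {
  if (score === 0) return "no hallucination";
  if (score <= 0.4) return "low hallucination";
  if (score <= 0.6) return "mixed hallucination";
  if (score <= 0.8) return "high hallucination";
  return "complete hallucination";
}

console.log(hallucinationLevel(result2.score)); // "mixed hallucination" for a score of 0.5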

View Example on GitHub