Faithfulness

この例では、MastraのFaithfulnessメトリクスを使用して、回答が提供されたコンテキストと比較してどれだけ事実に忠実であるかを評価する方法を示します。

概要

この例では、以下の方法を示します。

Faithfulnessメトリクスの設定
事実の正確性の評価
Faithfulnessスコアの分析
異なる正確性レベルへの対応

セットアップ

環境設定

環境変数を必ず設定してください：

.env


OPENAI_API_KEY=your_api_key_here

依存関係

必要な依存関係をインポートします：

src/index.ts


import { openai } from "@ai-sdk/openai";
import { FaithfulnessMetric } from "@mastra/evals/llm";

使用例

高い忠実度の例

すべての主張がコンテキストによって裏付けられている応答を評価します：

src/index.ts


const context1 = [
  "The Tesla Model 3 was launched in 2017.",
  "It has a range of up to 358 miles.",
  "The base model accelerates 0-60 mph in 5.8 seconds.",
];
 
const metric1 = new FaithfulnessMetric(openai("gpt-4o-mini"), {
  context: context1,
});
 
const query1 = "Tell me about the Tesla Model 3.";
const response1 =
  "The Tesla Model 3 was introduced in 2017. It can travel up to 358 miles on a single charge and the base version goes from 0 to 60 mph in 5.8 seconds.";
 
console.log("Example 1 - High Faithfulness:");
console.log("Context:", context1);
console.log("Query:", query1);
console.log("Response:", response1);
 
const result1 = await metric1.measure(query1, response1);
console.log("Metric Result:", {
  score: result1.score,
  reason: result1.info.reason,
});
// Example Output:
// Metric Result: { score: 1, reason: 'All claims are supported by the context.' }

忠実度が混在している例

一部の主張がコンテキストで裏付けられていない応答を評価します：

src/index.ts


const context2 = [
  "Python was created by Guido van Rossum.",
  "The first version was released in 1991.",
  "Python emphasizes code readability.",
];
 
const metric2 = new FaithfulnessMetric(openai("gpt-4o-mini"), {
  context: context2,
});
 
const query2 = "What can you tell me about Python?";
const response2 =
  "Python was created by Guido van Rossum and released in 1991. It is the most popular programming language today and is used by millions of developers worldwide.";
 
console.log("Example 2 - Mixed Faithfulness:");
console.log("Context:", context2);
console.log("Query:", query2);
console.log("Response:", response2);
 
const result2 = await metric2.measure(query2, response2);
console.log("Metric Result:", {
  score: result2.score,
  reason: result2.info.reason,
});
// Example Output:
// Metric Result: { score: 0.5, reason: 'Only half of the claims are supported by the context.' }

低い忠実度の例

コンテキストと矛盾する応答を評価します：

src/index.ts


const context3 = [
  "Mars is the fourth planet from the Sun.",
  "It has a thin atmosphere of mostly carbon dioxide.",
  "Two small moons orbit Mars: Phobos and Deimos.",
];
 
const metric3 = new FaithfulnessMetric(openai("gpt-4o-mini"), {
  context: context3,
});
 
const query3 = "What do we know about Mars?";
const response3 =
  "Mars is the third planet from the Sun. It has a thick atmosphere rich in oxygen and nitrogen, and is orbited by three large moons.";
 
console.log("Example 3 - Low Faithfulness:");
console.log("Context:", context3);
console.log("Query:", query3);
console.log("Response:", response3);
 
const result3 = await metric3.measure(query3, response3);
console.log("Metric Result:", {
  score: result3.score,
  reason: result3.info.reason,
});
// Example Output:
// Metric Result: { score: 0, reason: 'The response contradicts the context.' }

結果の理解

この指標は以下を提供します：

0から1の間のfaithfulnessスコア：
- 1.0: 完全なfaithfulness - すべての主張が文脈によって裏付けられている
- 0.7-0.9: 高いfaithfulness - ほとんどの主張が裏付けられている
- 0.4-0.6: 混合的なfaithfulness - 一部の主張が裏付けられていない
- 0.1-0.3: 低いfaithfulness - ほとんどの主張が裏付けられていない
- 0.0: faithfulnessなし - 主張が文脈と矛盾している
スコアの詳細な理由、以下の分析を含む：
- 主張の検証
- 事実の正確性
- 矛盾
- 全体的なfaithfulness

GitHubで例を見る