完全性評価

この例では、Mastra の Completeness メトリクスを使用して、回答が入力の主要な要素をどれだけ網羅しているかを評価する方法を示します。

概要

この例では、以下の方法を示します。

Completenessメトリクスを設定する
要素の網羅性について回答を評価する
カバレッジスコアを分析する
異なるカバレッジシナリオに対応する

セットアップ

依存関係

必要な依存関係をインポートします：

src/index.ts


import { CompletenessMetric } from "@mastra/evals/nlp";

メトリックの設定

Completenessメトリックを設定します：

src/index.ts


const metric = new CompletenessMetric();

使用例

完全カバレッジの例

すべての要素をカバーしている応答を評価します：

src/index.ts


const text1 = "The primary colors are red, blue, and yellow.";
const reference1 = "The primary colors are red, blue, and yellow.";
 
console.log("Example 1 - Complete Coverage:");
console.log("Text:", text1);
console.log("Reference:", reference1);
 
const result1 = await metric.measure(reference1, text1);
console.log("Metric Result:", {
  score: result1.score,
  info: {
    missingElements: result1.info.missingElements,
    elementCounts: result1.info.elementCounts,
  },
});
// Example Output:
// Metric Result: { score: 1, info: { missingElements: [], elementCounts: { input: 8, output: 8 } } }

部分的カバレッジの例

一部の要素のみカバーしている応答を評価します：

src/index.ts


const text2 = "The primary colors are red and blue.";
const reference2 = "The primary colors are red, blue, and yellow.";
 
console.log("Example 2 - Partial Coverage:");
console.log("Text:", text2);
console.log("Reference:", reference2);
 
const result2 = await metric.measure(reference2, text2);
console.log("Metric Result:", {
  score: result2.score,
  info: {
    missingElements: result2.info.missingElements,
    elementCounts: result2.info.elementCounts,
  },
});
// Example Output:
// Metric Result: { score: 0.875, info: { missingElements: ['yellow'], elementCounts: { input: 8, output: 7 } } }

最小限カバレッジの例

ごく少数の要素しかカバーしていない応答を評価します：

src/index.ts


const text3 = "The seasons include summer.";
const reference3 = "The four seasons are spring, summer, fall, and winter.";
 
console.log("Example 3 - Minimal Coverage:");
console.log("Text:", text3);
console.log("Reference:", reference3);
 
const result3 = await metric.measure(reference3, text3);
console.log("Metric Result:", {
  score: result3.score,
  info: {
    missingElements: result3.info.missingElements,
    elementCounts: result3.info.elementCounts,
  },
});
// Example Output:
// Metric Result: {
//   score: 0.3333333333333333,
//   info: {
//     missingElements: [ 'four', 'spring', 'winter', 'be', 'fall', 'and' ],
//     elementCounts: { input: 9, output: 4 }
//   }
// }

結果の理解

この指標は以下を提供します：

0から1の間のスコア：
- 1.0: 完全なカバレッジ - すべての入力要素を含む
- 0.7-0.9: 高いカバレッジ - 主要な要素のほとんどを含む
- 0.4-0.6: 部分的なカバレッジ - 一部の主要な要素を含む
- 0.1-0.3: 低いカバレッジ - 主要な要素のほとんどが欠落
- 0.0: カバレッジなし - 出力に入力要素が全く含まれていない
詳細な分析内容：
- 見つかった入力要素のリスト
- 一致した出力要素のリスト
- 入力から欠落している要素
- 要素数の比較

GitHubで例を見る