SummarizationMetric

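SummarizationMetric evaluates how accurately an LLM-generated summary captures the content of the original text and whether it stays factually grounded. It combines two aspects, alignment (factual accuracy) and coverage (inclusion of key information), and takes the minimum of the two scores so that a good summary must do well on both.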

Basic Usage


import { openai } from "@ai-sdk/openai";
import { SummarizationMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
 
const metric = new SummarizationMetric(model);
 
const result = await metric.measure(
  "The company was founded in 1995 by John Smith. It started with 10 employees and grew to 500 by 2020. The company is based in Seattle.",
  "Founded in 1995 by John Smith, the company grew from 10 to 500 employees by 2020.",
);
 
console.log(result.score); // Score from 0-1
console.log(result.info); // Object containing detailed metrics about the summary

Constructor Parameters

model: LanguageModel
Configuration of the model used to evaluate summaries

options?: SummarizationMetricOptions = { scale: 1 }
Configuration options for the metric

SummarizationMetricOptions

scale?: number = 1
Maximum score value

measure() Parameters

input: string
The original text to be summarized

output: string
The generated summary to evaluate

Returns

score: number
Summarization score (0 to scale, default 0-1)

info: object
Object containing detailed metrics about the summary

  reason: string
  Detailed explanation of the score, covering both alignment and coverage

  alignmentScore: number
  Alignment score (0 to 1)

  coverageScore: number
  Coverage score (0 to 1)
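Because the final score is the minimum of the two components, it can be useful to check which component held the score down. The sketch below assumes the result has exactly the fields documented above; the `SummarizationResult` interface and the `limitingFactor` helper are hypothetical names for illustration, not part of Mastra's API.

```typescript
// Hypothetical shape of the value returned by measure(), mirroring the
// documented fields. The interface name is an assumption.
interface SummarizationResult {
  score: number;
  info: {
    reason: string;
    alignmentScore: number;
    coverageScore: number;
  };
}

// Hypothetical helper: reports which component limited the final score.
// Since the final score is the minimum of the two, the lower one is the bottleneck.
function limitingFactor(result: SummarizationResult): "alignment" | "coverage" | "tie" {
  const { alignmentScore, coverageScore } = result.info;
  if (alignmentScore < coverageScore) return "alignment";
  if (coverageScore < alignmentScore) return "coverage";
  return "tie";
}

// Values taken from the example output later on this page.
const example: SummarizationResult = {
  score: 0.5,
  info: { reason: "Example reason", alignmentScore: 0.5, coverageScore: 0.75 },
};

console.log(limitingFactor(example)); // alignment
```

In this case the summary covers most key points, so improving factual accuracy (alignment) is what would raise the score.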

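Scoring Details

The metric evaluates summaries on two essential components:

Alignment Score: measures factual correctness
- Extracts claims from the summary
- Verifies each claim against the original text
- Assigns "yes", "no", or "unsure" verdicts
Coverage Score: measures inclusion of essential information
- Generates key questions from the original text
- Checks whether the summary answers these questions
- Evaluates information inclusion and comprehensiveness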
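The two components can be sketched as simple ratios over the judge's verdicts. This is a minimal illustration, not Mastra's implementation: it assumes the LLM judge has already produced one verdict per claim and one answered/unanswered flag per question, it assumes only "yes" counts as a supported claim (how "unsure" is treated is not stated here), and it treats an empty list as a score of 0. The names `Verdict`, `computeAlignment`, and `computeCoverage` are hypothetical.

```typescript
// Verdict labels assigned to each claim extracted from the summary.
type Verdict = "yes" | "no" | "unsure";

// Alignment: supported_claims / total_claims.
// Assumption: only "yes" counts as supported; empty input scores 0.
function computeAlignment(verdicts: Verdict[]): number {
  if (verdicts.length === 0) return 0;
  const supported = verdicts.filter((v) => v === "yes").length;
  return supported / verdicts.length;
}

// Coverage: answerable_questions / total_questions.
// Each entry says whether the summary answers one generated question.
function computeCoverage(answered: boolean[]): number {
  if (answered.length === 0) return 0;
  const answerable = answered.filter(Boolean).length;
  return answerable / answered.length;
}

console.log(computeAlignment(["yes", "no", "unsure", "yes"])); // 0.5
console.log(computeCoverage([true, true, true, false])); // 0.75
```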

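Scoring Process

Calculates the alignment score:
- Extracts claims from the summary
- Verifies them against the source text
- Computes: supported_claims / total_claims
Determines the coverage score:
- Generates questions from the source text
- Checks whether the summary answers them
- Evaluates completeness
- Computes: answerable_questions / total_questions

Final score: min(alignment_score, coverage_score) * scale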
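The final-score formula is small enough to express directly. This sketch assumes the component scores are already in the 0-1 range; `combineScores` is a hypothetical name, not a Mastra function.

```typescript
// final = min(alignment, coverage) * scale
function combineScores(alignment: number, coverage: number, scale = 1): number {
  return Math.min(alignment, coverage) * scale;
}

// Component scores from the example output on this page.
console.log(combineScores(0.5, 0.75)); // 0.5
console.log(combineScores(0.5, 0.75, 10)); // 5
```

Taking the minimum rather than an average means strong coverage cannot mask factual errors: averaging the same inputs would give 0.625 instead of 0.5.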

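Score Interpretation

(0 to scale, default 0-1)

1.0: Perfect summary - fully factual and covers all key information
0.7-0.9: Strong summary with minor omissions or slight inaccuracies
0.4-0.6: Moderate quality with significant gaps or inaccuracies
0.1-0.3: Poor summary with major omissions or factual errors
0.0: Invalid summary - completely inaccurate or missing critical information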
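To apply these bands to a result produced with a custom `scale`, normalize the score first. The sketch below is a hypothetical helper (`interpret` is not part of Mastra); because the listed bands leave gaps (for example 0.95 or 0.65), it assigns such values to the lower adjacent band, which is an assumption rather than documented behavior.

```typescript
type Band = "perfect" | "strong" | "moderate" | "poor" | "invalid";

// Hypothetical helper: normalizes by scale, then maps to the bands above.
// Values between documented bands fall into the lower band (assumption).
function interpret(score: number, scale = 1): Band {
  const s = score / scale;
  if (s >= 1) return "perfect";
  if (s >= 0.7) return "strong";
  if (s >= 0.4) return "moderate";
  if (s >= 0.1) return "poor";
  return "invalid";
}

console.log(interpret(0.5)); // moderate
console.log(interpret(8, 10)); // strong
```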

Example with Analysis


import { openai } from "@ai-sdk/openai";
import { SummarizationMetric } from "@mastra/evals/llm";
 
// Configure the model for evaluation
const model = openai("gpt-4o-mini");
 
const metric = new SummarizationMetric(model);
 
const result = await metric.measure(
  "The electric car company Tesla was founded in 2003 by Martin Eberhard and Marc Tarpenning. Elon Musk joined in 2004 as the largest investor and became CEO in 2008. The company's first car, the Roadster, was launched in 2008.",
  "Tesla, founded by Elon Musk in 2003, revolutionized the electric car industry starting with the Roadster in 2008.",
);
 
// Example output:
// {
//   score: 0.5,
//   info: {
//     reason: "The score is 0.5 because while the coverage is good (0.75) - mentioning the founding year,
//           first car model, and launch date - the alignment score is lower (0.5) due to incorrectly
//           attributing the company's founding to Elon Musk instead of Martin Eberhard and Marc Tarpenning.
//           The final score takes the minimum of these two scores to ensure both factual accuracy and
//           coverage are necessary for a good summary.",
//     alignmentScore: 0.5,
//     coverageScore: 0.75,
//   }
// }
