Qlean Dataset Begins Offering the Dialogue Speech Dataset "Japanese 2-Speaker LR-Separated Private Dialogue Speech with Transcripts"

Visual Bank Inc.


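A commercially usable corpus for foundation model development, containing approximately 500 hours of LR-separated Japanese two-speaker dialogue audio from 87 pairs, with transcripts. Produced by Visual Bank, a GENIAC-selected company.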

[Image 1: https://prcdn.freetls.fastly.net/release_image/108024/172/108024-172-640a1a1bc02b39c58a8ece9531989925-1200x630.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

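Visual Bank Inc. (Minato-ku, Tokyo; Representative Director and CEO: Saneyuki Nagai) will begin offering "Japanese 2-Speaker LR-Separated Private Dialogue Speech with Transcripts" as part of Qlean Dataset, the AI training data solution operated through its subsidiary amanaimages Inc. The dataset is a corpus of approximately 500 hours of Japanese two-speaker dialogue audio from 87 pairs, recorded in a web-conference format with stereo left/right (LR) channel separation and accompanied by transcripts. It can be used to develop and fine-tune speech foundation models for tasks such as speaker diarization, speech separation, and automatic speech recognition (ASR).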

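■ What Is a 2-Speaker Dialogue Speech Dataset?
A 2-speaker dialogue speech dataset is a dialogue speech corpus in which each of the two speakers is recorded on an independent channel. It is used for training and evaluating speaker diarization models, for adapting ASR models to the conversational domain, and as pre-training or fine-tuning data for speech foundation models and LLMs.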

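■ Overview of the Newly Released "Japanese 2-Speaker LR-Separated Private Dialogue Speech with Transcripts"
This dataset contains private conversations held in a web-conference environment by 87 pairs of Japanese speakers, selected for diversity in gender and age, and recorded in stereo LR-separated format. Because each speaker's voice is already isolated on the left or right channel, the audio is delivered in a state where each speaker can be extracted individually. The recordings capture natural conversations on topics such as hobbies, special skills, and personal values, so their acoustic characteristics are close to spontaneous speech rather than scripted reading.
[Table 1: https://prtimes.jp/data/corp/108024/table/172_1_0c6a4783606f88f89f2c48a011898761.jpg?v=202606241215 ]
Sample data is available here: https://qleandataset.visual-bank.co.jp/lineup/ds-049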
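Because each speaker occupies one channel, per-speaker audio can be recovered simply by de-interleaving the stereo frames. The following is a minimal sketch using only Python's standard library. It assumes the files are 16-bit PCM stereo WAV, which the release does not state, and the function names and file paths are hypothetical.

```python
import sys
import wave
from array import array


def deinterleave(frames: bytes):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right) arrays."""
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Stereo PCM stores samples as L, R, L, R, ...
    return samples[0::2], samples[1::2]


def split_stereo_wav(stereo_path, left_path, right_path):
    """Write each channel of a 16-bit stereo WAV to its own mono WAV file."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit stereo PCM")
        rate = src.getframerate()
        left, right = deinterleave(src.readframes(src.getnframes()))
    for path, channel in ((left_path, left), (right_path, right)):
        if sys.byteorder == "big":
            channel.byteswap()  # convert back to little-endian before writing
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

A call such as `split_stereo_wav("pair_001.wav", "pair_001_L.wav", "pair_001_R.wav")` (hypothetical file names) would yield one mono file per speaker. The sketch reads the whole file into memory, so long recordings would be better processed in chunks.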

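■ Frequently Asked Questions (FAQ)
Q. Can this dataset be used for speaker diarization development?
A. Yes. Because the audio is LR-separated, exactly one speaker is assigned to each of the left and right channels. The data can be used directly to fine-tune speaker diarization models such as pyannote.audio and NeMo, and as evaluation data for measuring performance by DER (Diarization Error Rate). It is also useful as baseline data for verifying separation accuracy on mixed audio.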
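DER is the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below computes a simplified frame-level DER in plain Python. It assumes reference segments have already been derived for each channel (for example from transcript timestamps or per-channel voice activity, neither of which the release specifies), applies no forgiveness collar, and uses hypothetical function names. For real evaluation an established scorer such as pyannote.metrics would be the safer choice.

```python
from itertools import permutations


def to_frames(segments, n_frames, frame_sec):
    """Turn (speaker, start_sec, end_sec) segments into per-frame sets of active speakers."""
    frames = [set() for _ in range(n_frames)]
    for speaker, start, end in segments:
        first = max(0, round(start / frame_sec))
        last = min(n_frames, round(end / frame_sec))
        for i in range(first, last):
            frames[i].add(speaker)
    return frames


def frame_der(ref_segments, hyp_segments, duration_sec, frame_sec=0.01):
    """Simplified DER: (miss + false alarm + confusion) / reference speech, no collar."""
    n_frames = round(duration_sec / frame_sec)
    ref = to_frames(ref_segments, n_frames, frame_sec)
    hyp = to_frames(hyp_segments, n_frames, frame_sec)
    total_ref = sum(len(r) for r in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    # Miss and false alarm depend only on speaker counts, not on label names.
    miss = sum(max(0, len(r) - len(h)) for r, h in zip(ref, hyp))
    false_alarm = sum(max(0, len(h) - len(r)) for r, h in zip(ref, hyp))

    # Hypothesis labels are arbitrary, so try every one-to-one mapping onto
    # reference labels and keep the one with the most correctly attributed frames.
    ref_speakers = sorted({s for s, _, _ in ref_segments})
    hyp_speakers = sorted({s for s, _, _ in hyp_segments})
    padded = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))
    best_correct = 0
    for perm in permutations(padded, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(len({mapping[s] for s in h} & r) for r, h in zip(ref, hyp))
        best_correct = max(best_correct, correct)

    confusion = sum(min(len(r), len(h)) for r, h in zip(ref, hyp)) - best_correct
    return (miss + false_alarm + confusion) / total_ref
```

With a reference of one second per speaker, a hypothesis that assigns both seconds to a single speaker is half confused, so `frame_der` returns 0.5. Exhaustive permutation search is only practical for a handful of speakers, which suits the two-speaker setting here.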

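Q. Can it be used to adapt ASR models to the conversational domain?
A. Yes. Because transcripts are included, the data can be used for LoRA or full fine-tuning on conversational, spontaneous speech of ASR models such as Whisper or ESPnet-based models that were trained mainly on standard-Japanese or read speech. It can also be used to quantify the domain gap by measuring WER.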
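Error rates of this kind are an edit distance between the reference transcript and the ASR output, normalized by the reference length. The following is a minimal sketch in plain Python with hypothetical function names. Since Japanese text has no spaces, CER is computed over characters, while WER here assumes the text has already been split into words by a tokenizer of the user's choice, which the release does not prescribe.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref_text, hyp_text):
    """Character error rate: character-level edit distance over reference length."""
    if not ref_text:
        raise ValueError("reference text is empty")
    return edit_distance(list(ref_text), list(hyp_text)) / len(ref_text)


def wer(ref_words, hyp_words):
    """Word error rate over pre-tokenized word lists."""
    if not ref_words:
        raise ValueError("reference is empty")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Comparing the same model's score on read-speech test data and on this conversational data would give one rough measure of the domain gap. Text normalization (punctuation, fillers, number formats) strongly affects both metrics and should be applied identically to reference and hypothesis.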

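Q. How can it be used in LLM development?
A. The transcripts of the conversations (on hobbies, values, and similar topics) can be used as a dialogue corpus for SFT (Supervised Fine-Tuning). Natural dialogue text on the scale of 87 pairs and 500 hours serves as training data for dialogue style and natural conversational expressions.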

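Q. Can it be used for TTS (text-to-speech)?
A. Yes. Because the audio is LR-separated and each speaker's voice can be extracted independently, it can be used as single-speaker audio data for fine-tuning models such as VITS and StyleTTS2. Since it covers diverse speaker attributes, it also supports building multi-speaker TTS models.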

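Q. Are additional recordings with more speakers or customized scenarios available?
A. Yes. Custom recordings that specify particular age groups, gender compositions, or conversation topics are supported, as is additional collection of dialogue data for specific domains such as healthcare or finance.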

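■ Example Use Cases for "Japanese 2-Speaker LR-Separated Private Dialogue Speech with Transcripts"
- Training and evaluating speaker diarization models: LR-separated two-speaker audio serves as ground truth. It can be used as DER evaluation data at the utterance-segment level for diarization models such as pyannote.audio, NeMo, and SpeakerBeam. It can also be applied to simulation experiments that generate mixed audio and then measure separation accuracy.
- Fine-tuning ASR for the conversational domain: Unlike read-speech corpora, the data contains linguistic phenomena specific to spontaneous speech and dialogue (hesitations, self-corrections, overlapping speech), which makes it effective as few-shot or LoRA fine-tuning data for adapting ASR models such as Whisper and ESPnet to the conversational domain. It can also be used for CER and WER evaluation through alignment with the transcripts.
- Evaluating speech separation models: The stereo LR-separated audio can be mixed to generate pseudo-mixed audio, providing a benchmark dataset for evaluating speech separation models such as Conv-TasNet, DPTNet, and SepFormer with metrics such as SI-SDR and PESQ.
- Pre-training and continual pre-training of speech foundation models (Speech LLMs): Training speech foundation models that handle audio and text jointly requires large-scale data in which audio and transcripts are paired. At roughly 500 hours and with speakers already separated, the dataset can be used for pre-training and continual pre-training of speech language models such as SpeechGPT and Qwen-Audio, and for multi-modal alignment (learning the correspondence between audio and text).
- Developing custom STT engines for contact centers: The recording condition, two-speaker dialogue in a web-conference format, reproduces an environment close to real customer support, interview, and counseling audio. The data can be used to build custom language models for Google STT and Amazon Transcribe, or to develop business-specific STT engines through domain-adaptation fine-tuning of Whisper.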
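The speech separation use case above can be sketched as follows: sum the two clean channels to form a pseudo-mixture, run a separation model on it, and score each estimate against its clean channel with SI-SDR. This is a minimal plain-Python illustration with hypothetical function names that operates on lists of float samples. The separation model itself is out of scope, and PESQ is not shown because it requires a dedicated implementation.

```python
import math


def mix(left, right, right_gain=1.0):
    """Create a pseudo-mixture by summing two time-aligned single-speaker signals."""
    return [l + right_gain * r for l, r in zip(left, right)]


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)
    # Remove the mean of each signal, as is conventional for SI-SDR.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [x - ref_mean for x in reference]
    est = [x - est_mean for x in estimate]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0:
        raise ValueError("reference is silent")
    # Project the estimate onto the reference to obtain the scaled target.
    scale = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    if residual_energy == 0:
        return math.inf
    if target_energy == 0:
        return -math.inf
    return 10 * math.log10(target_energy / residual_energy)
```

Scoring the unprocessed mixture against each clean channel gives the baseline that a separation model should improve on, and the difference between the two scores is usually reported as SI-SDR improvement. Varying `right_gain` simulates different loudness ratios between the two speakers.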

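About Qlean Dataset
Qlean Dataset is a rights-cleared, commercially usable AI training data solution provided by amanaimages Inc., a subsidiary of Visual Bank.
It covers a wide range of formats, including audio, images, video, 3D, and text, and gives AI developers, foundation model developers among them, an environment in which they can procure and use high-quality data without legal risk.
Through collaboration with data holders in Japan and abroad and with media such as radio broadcasters, newspapers, and news agencies, it continually adds to "AI Data Recipe," its lineup of industry-specific, trend-driven data. Existing data is delivered in as little as 2 business days, and custom recording and collection are also supported.
Qlean Dataset site: https://qleandataset.visual-bank.co.jp/
AI Data Recipe: https://qleandataset.visual-bank.co.jp/lineup
Contact
[Image 2: https://prcdn.freetls.fastly.net/release_image/108024/172/108024-172-63d9a6e2d4e7e751487b7c743818722a-1813x1116.png?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]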

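Visual Bank Inc.
Visual Bank is a startup that builds and provides next-generation data infrastructure to maximize AI development capability, operating under the mission of "unlocking the potential of all data." The company offers THE PEN, an AI assistance tool that supports manga artists' wish to "draw more!", and wholly owns amanaimages Inc., which provides the AI training dataset development service Qlean Dataset.
Visual Bank has also been selected for GENIAC, a national research and development program, and is accelerating its efforts toward real-world deployment.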

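Representative Director and CEO: Saneyuki Nagai
Address: C-Cube Minami-Aoyama Bldg. 6F, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062, Japan
Visual Bank corporate URL: https://visual-bank.co.jp/
amanaimages corporate URL: https://amanaimages.com/about/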

Qlean Dataset Launches "Japanese 2-Speaker LR-Separated Private Dialogue Speech with Transcripts"
A commercially licensed corpus featuring approximately 500 hours of LR-separated Japanese 2-speaker dialogue audio from 87 pairs, complete with transcripts - purpose-built for foundation model development. Produced by Visual Bank, a GENIAC-selected company.
[Image 3: https://prcdn.freetls.fastly.net/release_image/108024/172/108024-172-0b7e7d74787e247cd2719746f813659c-1200x630.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai) announces, through its subsidiary amanaimages Inc., the launch of a new dataset under Qlean Dataset: "Japanese 2-Speaker LR-Separated Private Dialogue Speech with Transcripts."

This corpus features approximately 500 hours of stereo LR-separated Japanese dialogue audio from 87 speaker pairs, with full transcripts. Recorded in a web conferencing format covering topics such as hobbies, skills, and personal values, the data reflects spontaneous natural speech. It is licensed for commercial, research, and generative AI training use.

■ What Is a 2-Speaker Dialogue Speech Dataset?
A 2-speaker dialogue speech dataset is a corpus in which two speakers are recorded on independent channels. It is used for training and evaluating speaker diarization models, adapting ASR models to conversational domains, and as pre-training or fine-tuning data for speech foundation models and large language models (LLMs).

■ Dataset Overview: "Japanese 2-Speaker LR-Separated Private Dialogue Speech with Transcripts"
This dataset captures natural, private conversations conducted over web conferencing by 87 pairs of Japanese speakers with diverse gender and age profiles. Audio is provided in stereo LR-separated format, with each speaker's voice pre-isolated to a dedicated left or right channel. Conversations center on topics such as hobbies, skills, and personal values - reflecting spontaneous, naturalistic speech rather than scripted read-aloud recordings.
[Table 2: https://prtimes.jp/data/corp/108024/table/172_2_055ad7178df3fdf6ed2e419f4b06d4a2.jpg?v=202606241215 ]
Sample data available at: https://qleandataset.visual-bank.co.jp/en/lineup/ds-049

■ Key Use Cases
- Speaker Diarization - Ground-truth LR audio for DER evaluation and fine-tuning with pyannote.audio, NeMo, and SpeakerBeam.
- Conversational ASR Fine-Tuning - Spontaneous speech supports LoRA/full fine-tuning of Whisper and ESPnet, with CER/WER evaluation.
- Speech Separation Benchmarking - Generate pseudo-mixed audio to benchmark Conv-TasNet, DPTNet, and SepFormer via SI-SDR and PESQ.
- Speech LLM Pre-Training - 500-hour paired audio-transcript data for models such as SpeechGPT and Qwen-Audio.
- Contact Center STT - Web conferencing format suits customer support and counseling environments; applicable to Google STT, Amazon Transcribe, and Whisper fine-tuning.

About Qlean Dataset
Qlean Dataset is a commercially licensed AI training data solution provided by amanaimages Inc., a wholly owned subsidiary of Visual Bank. All datasets are rights-cleared for commercial use, giving AI developers a legally secure environment to source and deploy high-quality training data.
The platform covers audio, image, video, 3D, and text modalities - serving foundation model developers and applied AI teams alike. Through partnerships with domestic and international data holders, broadcasters, newspapers, and newswire agencies, Qlean Dataset continuously expands its AI Data Recipe lineup of industry-specific, trend-driven datasets. Existing datasets ship within 2 business days; custom recording and data collection are also available on request.
URL: https://qleandataset.visual-bank.co.jp/en
URL: https://qleandataset.visual-bank.co.jp/en/products/japanese-language-corpora
Contact
[Image 4: https://prcdn.freetls.fastly.net/release_image/108024/172/108024-172-524abae48ef5e8a592efe47ff6a2e96d-1813x1116.png?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

About Visual Bank Inc.
Visual Bank Group is a technology company developing data infrastructure and AI solutions that support advanced AI development. The company operates THE PEN, an AI tool for manga creators, and its subsidiary, amanaimages Inc., provides commercial digital content and AI training data solutions, including Qlean Dataset. Visual Bank is also a selected participant in GENIAC, a Japanese government initiative supporting the advancement of next-generation AI technologies.
CEO: Saneyuki Nagai
Website: https://visual-bank.co.jp/en

Press release provided by PR TIMES
