Qlean Dataset Launches "Japanese Single-Speaker Horror-Themed Read-Aloud Speech Corpus with Transcripts"

Visual Bank Inc.


GENIAC-selected Visual Bank supports ASR and LLM research with Japanese read-aloud data for speech and language AI development

[Image 1: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-e79a31b0c7c660a946ebf04820969c3d-1200x630.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

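Visual Bank Inc. (Minato-ku, Tokyo; Representative Director and CEO: Saneyuki Nagai) has begun offering the "Japanese Single-Speaker Horror-Themed Read-Aloud Speech Corpus with Transcripts" through Qlean Dataset, the AI training data solution it operates via its subsidiary amanaimages Inc. The dataset is intended for the development and research of speech and language AI, including ASR (automatic speech recognition), speech understanding, and LLMs (large language models).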

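The dataset consists of recordings of a single Japanese speaker reading narrative Japanese texts, such as ghost stories and scary tales, aloud, paired with transcripts that faithfully capture what was spoken. As each story progresses, unease and tension emerge naturally in the narration, so the recordings contain emotionally expressive continuous speech in addition to standard read-aloud speech.
Because of the nature of horror stories, intonation, pauses, and shifts in vocal tone are closely tied to the narrative context. This makes the corpus useful not only for sentence-level speech recognition but also for speech understanding and language model training that assume long-form context. Because every recording has a single reader, the corpus is also suited to validating models that do not rely on speaker separation and to analyzing speech and language behavior with speaker conditions held fixed.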

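Qlean Dataset supplies production-ready training data, with rights and usage terms cleared in advance, for use cases ranging from research to AI development aimed at commercial deployment. This dataset was likewise designed for the validation, evaluation, and training phases of speech and language AI development, and is offered as part of "AI Data Recipe," Qlean Dataset's lineup of original data.
Overview of the "Japanese Single-Speaker Horror-Themed Read-Aloud Speech Corpus with Transcripts"
[Table 1: https://prtimes.jp/data/corp/108024/table/140_1_ba0d3859964a55edf59656de18acc3c2.jpg?v=202602180415 ]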

Example use cases for the "Japanese Single-Speaker Horror-Themed Read-Aloud Speech Corpus with Transcripts"

【Research Applications】

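- Evaluating ASR and speech understanding models on long-form audio input: the continuous narration in the horror readings can be used to study ASR recognition accuracy on long utterances and how recognition errors behave as context carries over.
- Validating context understanding in language models that take speech as input: the data can be used to evaluate how LLMs and speech understanding models that receive ASR output retain story context and comprehend content.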
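The release does not name an evaluation metric, so the following is only a minimal sketch of one common way to score ASR output on Japanese text: character error rate (CER), the character-level edit distance between a reference transcript and an ASR hypothesis divided by the reference length. The normalization rule (dropping whitespace and a few Japanese punctuation marks before scoring) and the helper names `normalize`, `edit_distance`, and `cer` are illustrative assumptions, not part of Qlean Dataset.

```python
# Minimal CER sketch. normalize(), edit_distance() and cer() are hypothetical
# helpers for illustration; they are not part of any Qlean Dataset tooling.

# Assumed normalization: ignore whitespace (including the ideographic space
# U+3000) and a few common Japanese punctuation marks before scoring.
_IGNORED = set(" \u3000\t\n、。「」！？…")


def normalize(text: str) -> str:
    """Remove characters that should not count as recognition errors."""
    return "".join(ch for ch in text if ch not in _IGNORED)


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference character deleted
                curr[j - 1] + 1,         # extra character inserted
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of an ASR hypothesis against a reference transcript."""
    ref_n, hyp_n = normalize(ref), normalize(hyp)
    if not ref_n:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_n, hyp_n) / len(ref_n)


if __name__ == "__main__":
    # A punctuation-only difference is ignored by the assumed normalization.
    print(cer("夜道を歩いていた。", "夜道を歩いていた"))  # 0.0
    # One substituted kanji out of four reference characters.
    print(cer("古い井戸", "古い江戸"))  # 0.25
```

Japanese is usually scored at the character level rather than the word level because written Japanese has no spaces between words; a word-level metric would first require a morphological analyzer to segment both texts.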

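【Industrial Applications】

- Validation data for voice dialogue AI and narration-generation AI: the expressive delivery, including intonation and pauses, can be used to validate input understanding and output quality in voice dialogue AI and speech generation AI.
- Pre-deployment testing of speech processing models for call centers and voice UIs: continuous speech with emotional content can be used to check recognition stability and the risk of malfunction in voice UIs and speech processing platforms.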

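About "Qlean Dataset"
Qlean Dataset is a commercially usable AI training data solution provided by amanaimages Inc., a subsidiary of Visual Bank.
It handles a wide range of data formats, including images, video, audio, 3D, and text, and provides an environment in which data can be used safely for both research and commercial purposes.
Through collaboration with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., it is also steadily expanding "AI Data Recipe," a data lineup tailored to specific industries and current trends.
Qlean Dataset reduces the burden of data collection and preparation in AI development and helps build rights-cleared AI development environments free of legal risk.
▶ Qlean Dataset site: https://qleandataset.visual-bank.co.jp/
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/lineup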
[Image 2: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-436e162881070793f26ffee3eb1a9eb9-1813x1116.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

[Image 3: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-f6bd4eefa058525c8104a52e20cfe7b0-960x540.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

[Image 4: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-cd197cc34bdf207dd9c9cccda630f70b-960x540.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

[Image 5: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-75db92b0d10739e9779e2567a330858d-960x540.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

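Features of "AI Data Recipe," the datasets provided by Qlean Dataset
- Consent obtained from all subjects
- Existing data can be delivered in as little as one day
- Custom photography, recording, and collection available for building original data
Contact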

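Visual Bank Inc.
A startup building and providing next-generation data infrastructure that maximizes AI development capability, Visual Bank operates under the mission "Unleash the potential of all data." In addition to THE PEN, an AI assistance tool that supports manga artists who want to "draw more!", it wholly owns amanaimages Inc., which provides the AI training dataset development service Qlean Dataset.
Visual Bank has also been selected for GENIAC, a national research and development program, and is accelerating its efforts toward real-world deployment.
Representative Director and CEO: Saneyuki Nagai
Address: 6F C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Visual Bank corporate site: https://visual-bank.co.jp/
amanaimages corporate site: https://amanaimages.com/about/
[Image 6: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-caf794202c687aa609de81d0f33941da-1200x630.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]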

Qlean Dataset Launches Japanese Single-Speaker Horror Story Read-Aloud Speech Corpus with Transcripts
Emotional Narrative Speech Data for ASR, Speech Understanding, and Long-Context LLM Evaluation

Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai) has launched a new dataset under its AI training data solution, Qlean Dataset, through its subsidiary amanaimages Inc.: a Japanese single-speaker horror-themed read-aloud speech corpus with transcripts designed for speech- and language-based AI development and research, including Automatic Speech Recognition (ASR), speech understanding, and Large Language Models (LLMs).
This dataset consists of Japanese audio recordings in which a native Japanese speaker reads horror and ghost-story texts aloud, paired with transcripts that faithfully reflect the spoken content. As the narrative progresses, the speaker naturally conveys tension and unease, so the recordings contain continuous, emotionally expressive speech in addition to conventional read speech.
Because horror storytelling relies heavily on prosody, pauses, and tonal shifts closely tied to narrative context, the dataset supports not only sentence-level speech recognition but also long-context speech understanding and language model training. As the corpus is recorded in a single-speaker format, it is well suited for model evaluation without speaker separation, as well as controlled analysis of speech and language behavior under fixed speaker conditions.
Qlean Dataset provides training data structured for both research and commercial AI development, with rights and usage conditions clearly organized to support real-world deployment. This corpus is designed for validation, evaluation, and training phases in speech and language AI development and is offered as part of Qlean Dataset’s original data lineup, AI Data Recipe.

Dataset Overview: Japanese Single-Speaker Horror-Themed Read-Aloud Speech Corpus with Transcripts
[Table 2: https://prtimes.jp/data/corp/108024/table/140_2_ecf992a37ef7dd831c37078f87c07a7f.jpg?v=202602180415 ]
Use Case Examples

Research Applications

- Evaluation of ASR and Speech Understanding Models for Long-Form Audio Input: The continuous narrative structure of horror storytelling enables evaluation of long-utterance recognition accuracy and analysis of error patterns across extended contextual speech in ASR systems.
- Context Retention Assessment for Language Models Using Speech Input: The corpus can be used to evaluate how LLMs or speech understanding models handle narrative context retention and semantic comprehension when processing ASR outputs derived from extended storytelling audio.
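To look at how recognition errors behave over an extended narrative, one option is to score each segment separately in story order and then compare pooled character error rate (CER) early versus late in the story. The sketch below assumes the ASR output has already been aligned to reference segments (for example, sentence by sentence); the release does not describe the corpus's segmentation or file layout, so the `(reference, hypothesis)` input format and the helper names `pooled_cer`, `segment_cers`, and `early_vs_late` are hypothetical, and the kana strings are toy data.

```python
from typing import List, Sequence, Tuple

# Assumed input format: (reference segment, ASR hypothesis for that segment),
# listed in narrative order. Not a documented Qlean Dataset format.
Segment = Tuple[str, str]


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            # deletion, insertion, match/substitution
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def pooled_cer(segments: Sequence[Segment]) -> float:
    """Total character errors divided by total reference characters."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in segments)
    chars = sum(len(ref) for ref, _ in segments)
    if chars == 0:
        raise ValueError("no reference characters to score")
    return errors / chars


def segment_cers(segments: Sequence[Segment]) -> List[float]:
    """Per-segment CER, in the same (narrative) order as the input."""
    return [pooled_cer([seg]) for seg in segments]


def early_vs_late(segments: Sequence[Segment]) -> Tuple[float, float]:
    """Pooled CER for the first half of the story versus the second half."""
    if len(segments) < 2:
        raise ValueError("need at least two segments")
    mid = len(segments) // 2
    return pooled_cer(segments[:mid]), pooled_cer(segments[mid:])


if __name__ == "__main__":
    story = [
        ("あいう", "あいう"),
        ("かきく", "かきく"),
        ("さしす", "さしそ"),  # one substitution
        ("たちつ", "たつ"),    # one deletion
    ]
    print(segment_cers(story))
    print(early_vs_late(story))
```

Pooling errors over characters, rather than averaging per-segment rates, keeps short segments from dominating the comparison. A higher late-story CER would only suggest context-related degradation; later passages may simply contain harder vocabulary, so such a result would need further analysis.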

Industrial Applications

- Validation Data for Conversational AI and Voice Generation Systems: The emotionally expressive speech, including prosodic variation and pauses, can be used to evaluate input comprehension and output quality in conversational AI and speech synthesis systems.
- Pre-Deployment Testing for Call Center and Voice UI Processing Models: Continuous speech containing emotional nuance supports validation of recognition stability and operational risk assessment for voice UI systems and speech processing infrastructure.
About Qlean Dataset
Qlean Dataset is a commercial-use-ready AI training data solution provided by amanaimages Inc., a subsidiary of Visual Bank Inc.
It supports a wide range of data types, including images, videos, audio, 3D assets, and text, enabling both research and commercial AI development in a legally safe environment.
Through collaborations with data partners such as Chiba Lotte Marines Co., Ltd. and Toyo Keizai Inc., Qlean Dataset continues to expand its specialized, industry-focused lineup known as the “AI Data Recipe.”
By reducing the operational burden of data collection and preparation, Qlean Dataset helps organizations build rights-cleared AI development environments free of legal risk.
▶ Qlean Dataset: https://qleandataset.visual-bank.co.jp/en
▶ AI Data Recipe: https://qleandataset.visual-bank.co.jp/en/lineup
[Image 7: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-8c63c7e591b8960d5392a2e051bf4be8-1813x1116.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

[Image 8: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-738d2d371bb35ca70ea6ea4e8c3ec377-960x540.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

[Image 9: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-34d5c4756c8f6721c2d8ab8edfd1dcba-960x540.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

[Image 10: https://prcdn.freetls.fastly.net/release_image/108024/140/108024-140-a1a7f1a7556bd81afdd534aeac57edf3-960x540.jpg?width=536&quality=85%2C75&format=jpeg&auto=webp&fit=bounds&bg-color=fff ]

Key Features of Qlean Dataset
- Consent obtained from all subjects
- Existing datasets deliverable in as little as one business day
- Custom data collection and recording services available
▶ Contact: https://qleandataset.visual-bank.co.jp/en/contact

About Visual Bank Inc.
Visual Bank Inc. is a Tokyo-based startup building next-generation data infrastructure to enhance AI development capabilities under the mission "Unlocking Data Accessibility."
The company operates THE PEN, an AI-assisted creative tool for manga artists, and the Qlean Dataset service.
Its subsidiaries include amanaimages Inc., one of Japan's largest photostock providers; Qlean Dataset, which leads research and development in AI data; and THE PEN Inc., an AI-assisted creative tool for manga artists.
CEO: Saneyuki Nagai
Address: 6F, C-Cube Minami Aoyama Building, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Corporate Site: https://visual-bank.co.jp/en
Amana Images: https://qleandataset.visual-bank.co.jp/en/company-overview

Press release provided by: PR TIMES
