NSLinguisticTagger – AppleScriptの穴

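This AppleScript takes a piece of natural-language text, guesses which language it is written in, narrows down the Text to Speech voices by that language code (such as ja), by gender, and by whether the voice is Premium (high quality; true/false), and then reads the text aloud with the say command.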

# Updated the URL link attached to the script listing

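The language codes guessed from natural-language text and the language codes assigned to the TTS voices do not follow the same convention, so handling Chinese automatically requires (a little) extra processing.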
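
To see the voice side of this mismatch on a given machine, a minimal sketch like the following lists the distinct locale identifiers carried by the installed voices. It relies on the same NSSpeechSynthesizer calls and the same "VoiceLocaleIdentifier" key as the main script; the variable names are only illustrative, and the result depends on which voices are installed.

```applescript
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "AppKit"

-- Collect the distinct locale identifiers of the installed TTS voices
set localeSet to current application's NSMutableSet's alloc()'s init()
set voiceList to (current application's NSSpeechSynthesizer's availableVoices()) as list

repeat with aVoice in voiceList
	set aInfo to (current application's NSSpeechSynthesizer's attributesForVoice:(contents of aVoice))
	set aLocale to (aInfo's objectForKey:"VoiceLocaleIdentifier")
	-- Skip voices that do not report a locale identifier
	if aLocale is not missing value then (localeSet's addObject:aLocale)
end repeat

-- Sort for readability and convert to an AppleScript list of text
set sortedLocales to ((localeSet's allObjects())'s sortedArrayUsingSelector:"compare:") as list
sortedLocales
```

Comparing this list with the value returned by guessLanguageCodeOf for a Chinese sentence shows which codes need to be reconciled.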

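What you get from the natural-language text is a code for "Simplified" or "Traditional" Chinese, whereas the TTS voices carry country codes (China, HongKong, Taiwan). Possible ways to bridge the two: add a mapping table; lump everything under "zh" and pick a voice at random; decide from the latitude and longitude of the machine running the script; or keep the table editable and apply whatever rules it contains as fixed rules.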
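
The mapping-table option could look roughly like the sketch below. Everything in it is an assumption for illustration: the handler name voiceLocalePrefixesFor, the property zhTable, the tagger codes "zh-Hans" and "zh-Hant", the voice locale prefixes "zh_CN", "zh_TW" and "zh_HK", and the choice to send Traditional Chinese to both Taiwan and Hong Kong voices. It is not part of the v1 script, which leaves Chinese unsupported.

```applescript
-- Assumed mapping: guessed script code -> locale prefixes used by the TTS voices
property zhTable : {{tagCode:"zh-Hans", voiceLocales:{"zh_CN"}}, {tagCode:"zh-Hant", voiceLocales:{"zh_TW", "zh_HK"}}}

-- Return the list of voice locale prefixes to try for a guessed language code
on voiceLocalePrefixesFor(aTagCode)
	repeat with anEntry in zhTable
		if (tagCode of anEntry) is aTagCode then return voiceLocales of anEntry
	end repeat
	-- Codes that are not in the table (ja, en, ...) are used unchanged
	return {aTagCode}
end voiceLocalePrefixesFor

voiceLocalePrefixesFor("zh-Hant")
```

A caller would loop over the returned prefixes and pass each one to the voice lookup in place of the raw guessed code. Editing zhTable changes the rule without touching the handler.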

▲ In System Preferences > Accessibility > Spoken Content, the "System Voice" pop-up menu has a "Customize" item at the very bottom, where Text To Speech voices can be added

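▲ Voice data for an added TTS voice is downloaded automatically. Siri voices cannot be selected for TTS, but they can be selected for speech output in Shortcuts. Is this because of contracts with the external companies that supply the TTS voice data, or because a different program manages them?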

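AppleScript name: Guess the language of the given natural-language text and automatically select a TTS voice of the specified gender to read it aloud v1 (Simplified and Traditional Chinese not supported).scptd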

--
--  Created by: Takaaki Naganoya
--  Created on: 2022/02/05
--
--  Copyright © 2022 Piyomaru Software, All Rights Reserved
--
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "AppKit"

property NSSpeechSynthesizer : a reference to current application's NSSpeechSynthesizer

set str1 to "こんにちは"

-- Guess which language the given string is written in and get its (short) language code
set a1Res to guessLanguageCodeOf(str1) of me

-- Look up TTS voice attributes using the (short) language code as the key
set vList to retAvailableTTSbyShortLangCodeAndSexAndPremium(a1Res, "Female", true) of me
if vList = {} then return

-- Pick an item from the returned TTS info list (here, simply the first one)
set fV to contents of first item of vList

set vName to VoiceName of fV
say str1 using vName

--
-- Return the attribute records of the installed voices that match the language code, gender, and Premium flag
on retAvailableTTSbyShortLangCodeAndSexAndPremium(aLangShortCode as string, aSex as string, premiumFlag as boolean)
	set outList to {}
	
	if aSex is not in {"Male", "Female"} then error "Sex code is wrong"
	
	set aList to NSSpeechSynthesizer's availableVoices()
	set bList to aList as list
	
	repeat with i in bList
		set j to contents of i
		set aInfo to (NSSpeechSynthesizer's attributesForVoice:j)
		set aInfoRec to aInfo as record
		
		-- The spoken-character data is very large, so clear it
		set VoiceIndividuallySpokenCharacters of aInfoRec to {}
		set VoiceSupportedCharacters of aInfoRec to {}
		
		set aName to VoiceName of aInfoRec
		set aLangCode to VoiceLocaleIdentifier of aInfoRec
		
		set aGender to VoiceGender of aInfoRec
		set aVID to VoiceIdentifier of aInfoRec
		
		if (aLangCode starts with aLangShortCode) and (aGender = "VoiceGender" & aSex) then
			if premiumFlag = true then
				if aVID ends with "premium" then
					set the end of outList to aInfoRec
				end if
			else
				set the end of outList to aInfoRec
			end if
		end if
	end repeat
	
	return outList
end retAvailableTTSbyShortLangCodeAndSexAndPremium

-- Guess the language of a string and return the language name (in English)
on guessLanguageOf(theString)
	set theTagger to current application's NSLinguisticTagger's alloc()'s initWithTagSchemes:{current application's NSLinguisticTagSchemeLanguage} options:0
	theTagger's setString:theString
	set languageID to theTagger's tagAtIndex:0 |scheme|:(current application's NSLinguisticTagSchemeLanguage) tokenRange:(missing value) sentenceRange:(missing value)
	return ((current application's NSLocale's localeWithLocaleIdentifier:"en")'s localizedStringForLanguageCode:languageID) as text
end guessLanguageOf

-- Guess the language of a string and return the language code
on guessLanguageCodeOf(theString)
	set theTagger to current application's NSLinguisticTagger's alloc()'s initWithTagSchemes:{current application's NSLinguisticTagSchemeLanguage} options:0
	theTagger's setString:theString
	set languageID to theTagger's tagAtIndex:0 |scheme|:(current application's NSLinguisticTagSchemeLanguage) tokenRange:(missing value) sentenceRange:(missing value)
	return languageID as text
end guessLanguageCodeOf
