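This is an upgraded version of the simple Japanese parser "easyJParse". It has been revised so that it no longer uses the AppleScript library "BridgePlus".
A "simple Japanese parser" here means a program that splits a Japanese sentence into words. It does not provide part-of-speech or dependency information; it only splits the text into words. It is a pared-down, pseudo morphological analyzer whose use is limited to tasks such as interpreting Japanese-language commands. For words that must not be split, such as particular personal names, enclosing them in brackets such as 「」 (e.g. 「ぴよまるソフトウェア」) makes the parser output them as a single word.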
→ easyJParse v3
→ easyJParse v4
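Versions up to the previous one used the BridgePlus Script Library, but because that library contains a Framework, setting it up so that it runs reliably takes some skill (understanding and familiarity). It runs fine in my own environment, but there have been cases where users could not use BridgePlus on Macs running macOS 10.15 or later (probably due to operating mistakes), and I have come to feel that the dependency on BridgePlus is becoming a drawback.
This script used two methods from BridgePlus. Both are simple enough to reproduce by combining existing AppleScript routines, so I replaced them all with combinations of existing routines. When rewriting the BridgePlus functionality, I optimized on the assumption that the data handled is not very large. The result is not suited to very large data, but it processes small data quickly.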
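To make the idea concrete, here is a minimal sketch of what the two replaced calls compute: finding the positions of bracket characters in a token list, then grouping those positions into pairs. The handler names `indexesOf` and `groupBy` are hypothetical, simplified stand-ins written for this illustration; they are not the handlers used in the script itself.

```applescript
-- Simplified, hypothetical stand-ins for the two BridgePlus calls.

-- Return the positions (counted from baseNum) of every item of sourceList
-- that also appears in targetList.
on indexesOf(targetList, sourceList, baseNum)
    set hitList to {}
    set pos to baseNum
    repeat with anItem in sourceList
        if (contents of anItem) is in targetList then set end of hitList to pos
        set pos to pos + 1
    end repeat
    return hitList
end indexesOf

-- Split aList into consecutive sublists of groupSize items each.
-- Returns false when the list cannot be divided evenly.
on groupBy(aList, groupSize)
    if groupSize < 1 then return false
    if ((length of aList) mod groupSize) is not 0 then return false
    set outList to {}
    repeat with startIdx from 1 to (length of aList) by groupSize
        set end of outList to items startIdx thru (startIdx + groupSize - 1) of aList
    end repeat
    return outList
end groupBy

set tokenList to {"私", "の", "名前", "は", "「", "長野谷", "」", "です", "。"}
set hitList to indexesOf({"「", "」"}, tokenList, 0)
--> {4, 6}
set pairList to groupBy(hitList, 2)
--> {{4, 6}}
```

Both handlers walk the list once with plain `repeat` loops, which is adequate for the short token lists a single sentence produces.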
MacBookPro10,1, macOS Version 10.14.6 (Build 18G8005), 100 iterations

        First Run   Total Time   Average   Median   Maximum   Minimum   Std.Dev.
First   0.6685      0.6236       0.0062    0.0059   0.0083    0.0054    0.0008
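Frankly, with data this tiny there is little benefit in using Cocoa functionality, so rewriting the script to avoid Cocoa would make it faster. That kind of speed-up is something to do only when it is needed.
Since the script no longer depends on external libraries, you can now do things such as calling it from a script in CotEditor's menu, or stuffing the entire Japanese parser into a FileMaker Pro script.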
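As one illustration of what such a Cocoa-free rewrite might look like, the overlap test that the script's checkCrossRange handler performs with NSMakeRange and NSIntersectionRange can be approximated with plain integer comparisons. This is only a sketch: the handler name `checkCrossRangePlain` is hypothetical, it has not been benchmarked, and it compares pairs by position in the list rather than by value, so it is not guaranteed to match the original in every edge case.

```applescript
-- Hypothetical Cocoa-free counterpart of checkCrossRange.
-- aList is a list of {start, end} index pairs (both ends inclusive).
-- Returns true when no two pairs overlap, false otherwise.
on checkCrossRangePlain(aList)
    set pairCount to length of aList
    repeat with m from 1 to pairCount
        set {s1, e1} to item m of aList
        repeat with n from (m + 1) to pairCount
            set {s2, e2} to item n of aList
            -- Two inclusive ranges share at least one index when each one
            -- starts no later than the other one ends.
            if (s1 <= e2) and (s2 <= e1) then return false
        end repeat
    end repeat
    return true
end checkCrossRangePlain

checkCrossRangePlain({{9, 12}, {15, 18}, {22, 25}, {27, 29}})
--> true
```

Nested pairs such as {{1, 8}, {3, 5}} also count as overlapping here, which matches the script's stated policy of treating nesting as an error.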
AppleScript name: easyJParse v5.scptd
-- Created 2018-09-26 by Takaaki Naganoya
-- Modified 2020-12-31 by Takaaki Naganoya
-- 2020 Piyomaru Software
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

property NSArray : a reference to current application's NSArray
property NSSortDescriptor : a reference to current application's NSSortDescriptor

set aTargName to "Finderで選択中のAI書類上の「製品名」レイヤーから抜き出したコードをもとにスペック情報をGoogle Spreadsheet「製品コード表」から展開して保存。"
set aList to parseJ(aTargName, true) of me
--> {"Finder", "で", "選択", "中", "の", "AI", "書類", "上", "の", "「", "製品名", "」", "レイヤー", "から", "抜き出し", "た", "コード", "を", "もと", "に", "スペック", "情報", "を", "Google", " ", "Spreadsheet", "「", "製品コード表", "」", "から", "展開", "し", "て", "保存", "。"}--v4
--> {"Finder", "で", "選択", "中", "の", "AI", "書類", "上", "の", "「", "製品名", "」", "レイヤー", "から", "抜き出し", "た", "コード", "を", "もと", "に", "スペック", "情報", "を", "Google", " ", "Spreadsheet", "「", "製品コード表", "」", "から", "展開", "し", "て", "保存", "。"}--v5
return aList

set aTargName to "私の名前は「長野谷」です。"
set aList to parseJ(aTargName, true) of me
--> {"私", "の", "名前", "は", "「", "長野谷", "」", "です", "。"}--v4
--> {"私", "の", "名前", "は", "「", "長野谷", "」", "です", "。"}--v5

-- Nested or crossed brackets (error cases) are not processed; the list is output as-is
on parseJ(aTargStr as string, pickupPhraseByBracketPair as boolean)
    copy aTargStr to tStr
    set cList to characters of tStr
    set wList to words of tStr
    set cLen to length of cList
    
    set w2List to {}
    set w3List to {}
    set aCount to 0
    set lastPos to 0
    
    repeat with i in wList
        set j to contents of i
        
        using terms from scripting additions
            set anOffset to offset of j in tStr
        end using terms from
        
        if anOffset is not equal to 1 then
            set aChar to character (lastPos + 1) of aTargStr
            set the end of w3List to {wordList:aChar, characterList:{aChar}, startPos:(lastPos + 1), endPos:(lastPos + 1)}
        end if
        
        set aLen to length of j
        set w2List to w2List & (characters of j)
        set startPointer to (anOffset + aCount)
        set endPointer to (anOffset + aCount + aLen - 1)
        
        set the end of w3List to {wordList:j, characterList:(characters of j), startPos:startPointer, endPos:endPointer}
        
        set trimStart to (anOffset + aLen)
        
        if trimStart > (length of tStr) then
            set trimStart to 1
        end if
        
        set tStr to text trimStart thru -1 of tStr
        set aCount to aCount + anOffset + aLen - 1
        copy endPointer to lastPos
    end repeat
    
    -- Punctuation and the like: handle the end of the sentence
    if endPointer is not equal to cLen then
        set the end of w3List to {wordList:tStr, characterList:(characters of tStr), startPos:(lastPos + aCount), endPos:aLen}
    end if
    
    set bArray to sortRecListByLabel((w3List), "startPos", true) of me
    set cArray to (bArray's valueForKeyPath:"wordList") as list
    
    -- Join each range enclosed in brackets into a single chunk
    set bracketList to {"「", "」", "『", "』", "【", "】", "《", "》", "〈", "〉", "(", ")"}
    set bList to jointItemsBetweenBrackets(cArray, bracketList) of me
    
    return bList
end parseJ

-- Sort a list of records by the value of the given property label
on sortRecListByLabel(aRecList as list, aLabelStr as string, ascendF as boolean)
    set aArray to NSArray's arrayWithArray:aRecList
    set sortDesc to NSSortDescriptor's alloc()'s initWithKey:aLabelStr ascending:ascendF
    set sortDescArray to NSArray's arrayWithObject:sortDesc
    set sortedArray to aArray's sortedArrayUsingDescriptors:sortDescArray
    return sortedArray
end sortRecListByLabel

on offset of bArg in anArg
    set aClass to class of anArg
    set bClass to class of bArg
    
    if {aClass, bClass} = {text, text} then --case 1
        return getOffset(anArg, bArg) of me
    else if {aClass, bClass} = {list, list} then --case 2 (The target case)
        return execOffsetList(bArg, anArg) of me
    else if {aClass, bClass} = {text, list} then --case 3 (Irregular case)
        return execOffsetList(bArg, {anArg}) of me
    else if {aClass, bClass} = {list, text} then --case 4 (Irregular case)
        return execOffsetList({bArg}, anArg) of me
    end if
end offset

-- Main routine that performs the offset operation between two 1D lists
on execOffsetList(aList as list, bList as list)
    set resList to {}
    repeat with i in aList
        set j to contents of i
        set aCount to 1
        repeat with ii in bList
            set jj to contents of ii
            if jj = j then
                set the end of resList to aCount
                exit repeat
            end if
            set aCount to aCount + 1
        end repeat
    end repeat
    
    -- Check whether the item numbers found are consecutive
    set sRes to chkSequential(resList) of me
    if sRes = true then
        return contents of first item of resList
    else
        return false
    end if
end execOffsetList

-- Check whether the given 1D list holds consecutive values
on chkSequential(aList)
    if length of aList = 1 then return true
    if aList = {} then return false
    set aFirst to first item of aList
    set aList to rest of aList
    repeat with i in aList
        set j to contents of i
        if j is not equal to (aFirst + 1) then
            return false
        end if
        copy j to aFirst
    end repeat
    return true
end chkSequential

-- Run "offset of" between two texts (2.5x faster)
on getOffset(str, searchStr)
    set d to divideBy(str, searchStr)
    if (count d) is less than 2 then return 0
    return (length of item 1 of d) + 1
end getOffset

on divideBy(str, separator)
    set delSave to AppleScript's text item delimiters
    set the AppleScript's text item delimiters to separator
    set strItems to every text item of str
    set the AppleScript's text item delimiters to delSave
    return strItems
end divideBy

-- Join each range enclosed in brackets into a single chunk
on jointItemsBetweenBrackets(aList as list, bracketList as list)
    -- Detect the bracket positions in the list
    set aRes to (my indexesOfItems:bracketList inArray:aList base:0) as list
    --> {9, 12, 15, 18, 22, 25, 27, 29}--0 based
    
    if aRes = {} then return aList
    
    -- Convert the position list into a 2D list of {start, end} pairs
    set cList to my subarraysFrom:(aRes) groupedBy:2
    --> {{9, 12}, {15, 18}, {22, 25}, {27, 29}}--0 based
    
    -- Check that the bracket ranges do not cross (nesting is treated as an error)
    set dRes to checkCrossRange(cList) of me
    if dRes = false then return aList
    
    -- Bracketed items are joined one range at a time, so the item count keeps changing.
    -- To keep the item numbers valid, the ranges must be processed from the back,
    -- which is why the list is reversed here.
    set ccList to reverse of cList
    --> {{27, 29}, {22, 25}, {15, 18}, {9, 12}}--0 based
    
    --
    copy aList to aaList
    
    repeat with i in ccList
        copy i to {s2Dat, e2Dat}
        
        set s2Dat to s2Dat + 1 --Array index conversion from 0 to 1 based
        set e2Dat to e2Dat + 1 --Array index conversion from 0 to 1 based
        
        set tmp1 to items 1 thru s2Dat of aaList
        set tmp2 to (items (s2Dat + 1) thru (e2Dat - 1) of aaList) as string
        set tmp3 to items e2Dat thru -1 of aaList
        
        set aaList to tmp1 & tmp2 & tmp3
    end repeat
    
    return aaList
end jointItemsBetweenBrackets

-- Check that the {start, end} pairs in the 2D list do not cross one another
on checkCrossRange(aList as list)
    set rList to {}
    repeat with i in aList
        copy i to {sRange, eRange}
        set tmpRange to current application's NSMakeRange(sRange, eRange - sRange + 1)
        set the end of rList to tmpRange
    end repeat
    
    repeat with ii in rList
        set jj to contents of ii
        repeat with i in rList
            set j to contents of i
            
            if jj is not equal to j then
                set aRes to current application's NSIntersectionRange(jj, j)
                
                if aRes is not equal to {location:0, |length|:0} then
                    return false
                end if
            end if
        end repeat
    end repeat
    
    return true
end checkCrossRange

-- Inlined replacement for a BridgePlus command
on indexesOfItems:(iList as list) inArray:(aList as list) base:(baseNum as integer)
    return retIndexesOfNumInArray(iList, aList, baseNum) of me
end indexesOfItems:inArray:base:

-- Sequential search in a 1D list (multiple targets)
on retIndexesOfNumInArray(aTargetList, aList, baseNum)
    script obj
        property list : aList
        property resList : {}
    end script
    
    if baseNum is not in {0, 1} then return false
    
    --set obj's list to aList
    set (resList of obj) to {}
    set aCount to baseNum
    set hitF to false
    
    repeat with i in obj's list
        set j to contents of i
        if j is in aTargetList then
            set the end of (resList of obj) to aCount
        end if
        set aCount to aCount + 1
    end repeat
    
    return (resList of obj)
end retIndexesOfNumInArray

on subarraysFrom:(aList as list) groupedBy:(gNum as integer)
    script spdObj
        property list : aList
        property bList : {}
    end script
    
    --Group Num check
    if gNum = 0 then return false
    if length of aList < gNum then return false
    if (length of aList) mod gNum is not equal to 0 then return
    
    set (bList of spdObj) to {}
    
    set tmpList to {}
    set aCount to 1
    
    repeat with i in aList
        set j to contents of i
        set the end of tmpList to j
        set aCount to aCount + 1
        
        if aCount > gNum then
            set the end of (bList of spdObj) to tmpList
            set tmpList to {}
            set aCount to 1
        end if
    end repeat
    
    return (bList of spdObj)
end subarraysFrom:groupedBy: