This AppleScript extracts surnames from the address book, for use as proper nouns, and outputs a list of the surnames that are two or more characters long.
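When doing simple morphological analysis, a tool that fails to recognize the surnames of the people around you (for example "長野" or "谷") does not come across as intelligent. So I tried extracting surnames from the address book, on the reasoning that anyone close enough to be registered there should have their surname recognized as a proper noun.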
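From the extracted surnames, the script removes the entries that came back as missing value, eliminates duplicates, and sorts by string length, from longest to shortest.
It then classifies the characters in each name, keeps only the names made up entirely of kanji, and drops single-character surnames.
Finally, I put my own surname at the head of the resulting list so that my own name is recognized first.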
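The steps above can be tried on a small hard-coded list without Contacts.app or BridgePlus. This is only a minimal plain-AppleScript sketch, not the script from this post: the sample data and the handler names isKanjiOnly and sortByLengthDescending are made up for illustration, the kanji test is a simple code point range check (一 to 龠), and ties in length keep their input order instead of being sorted with localizedCaseInsensitiveCompare:.

```applescript
-- Hypothetical sample data standing in for "last name of every person"
set rawNames to {"谷", "長谷川", missing value, "伊藤", "Smith", "長谷川", "さとう", "大久保"}
set myName to "長野谷"

-- Drop missing value, duplicates, non-kanji names and one-character names
set cleaned to {}
repeat with nameRef in rawNames
	set aName to contents of nameRef
	if aName is not missing value then
		if (aName is not in cleaned) and (length of aName > 1) and isKanjiOnly(aName) then
			set end of cleaned to aName
		end if
	end if
end repeat

-- Longest names first, then the owner's surname at the very front
set resultList to sortByLengthDescending(cleaned)
set beginning of resultList to myName
--> {"長野谷", "長谷川", "大久保", "伊藤"}

-- True when every character lies between 一 (U+4E00) and 龠 (U+9FA0)
on isKanjiOnly(aText)
	repeat with ch in characters of aText
		set codePoint to id of (contents of ch)
		if class of codePoint is not integer then return false
		if codePoint < 19968 or codePoint > 40864 then return false
	end repeat
	return true
end isKanjiOnly

-- Simple stable insertion sort by text length, longest first
on sortByLengthDescending(theList)
	set sortedList to {}
	repeat with itemRef in theList
		set candidate to contents of itemRef
		set insertAt to (count of sortedList) + 1
		repeat with k from 1 to (count of sortedList)
			if (length of candidate) > (length of item k of sortedList) then
				set insertAt to k
				exit repeat
			end if
		end repeat
		if insertAt = 1 then
			set beginning of sortedList to candidate
		else if insertAt > (count of sortedList) then
			set end of sortedList to candidate
		else
			set sortedList to (items 1 thru (insertAt - 1) of sortedList) & {candidate} & (items insertAt thru -1 of sortedList)
		end if
	end repeat
	return sortedList
end sortByLengthDescending
```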
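For the address book, the script talks to Contacts.app (連絡先.app), which ships with macOS. Recently I had been getting this kind of data by calling a framework bundled with macOS, but the AddressBook.framework I was using for that is expected to be retired, so the newer Contacts.framework would be the preferable choice.
However, the Contacts.framework methods require Objective-C Blocks syntax, so they cannot be called directly from AppleScript.
That is why I ended up going through Contacts.app instead.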
Proper-noun extraction does not need to run every time the simple morphological analysis runs; running it about once a day should be enough.
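One way to get the once-a-day behavior is to keep the list in script properties together with the time it was built. The following is only a sketch under that assumption: surnamesForToday and extractSurnames are hypothetical names, extractSurnames is a stand-in that returns fixed data in place of the real extraction, and whether properties survive between launches depends on how the script is run, so this is mainly useful inside one long-running script.

```applescript
property cachedNames : {}
property cachedAt : missing value
property extractionCount : 0 -- only here to show how often the slow path runs

-- Return the cached list, rebuilding it when it is missing or older than one day
on surnamesForToday()
	set now to current date
	if cachedAt is missing value or (now - cachedAt) > 1 * days then
		set cachedNames to extractSurnames()
		set cachedAt to now
	end if
	return cachedNames
end surnamesForToday

-- Hypothetical stand-in for the surname extraction described in this post
on extractSurnames()
	set extractionCount to extractionCount + 1
	return {"長野谷", "長谷川", "伊藤"}
end extractSurnames

set firstCall to surnamesForToday() -- builds the list
set secondCall to surnamesForToday() -- served from the cache
```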
AppleScript name: Extract surnames from the address book and output a list of surnames of two or more characters
--
--  Created by: Takaaki Naganoya
--  Created on: 2018/12/20
--
--  Copyright © 2018 Piyomaru Software, All Rights Reserved
--

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use bPlus : script "BridgePlus"

property NSString : a reference to current application's NSString
property NSScanner : a reference to current application's NSScanner
property NSNumber : a reference to current application's NSNumber
property NSDictionary : a reference to current application's NSDictionary
property NSCountedSet : a reference to current application's NSCountedSet
property NSCharacterSet : a reference to current application's NSCharacterSet
property NSMutableArray : a reference to current application's NSMutableArray
property NSNumberFormatter : a reference to current application's NSNumberFormatter
property NSMutableCharacterSet : a reference to current application's NSMutableCharacterSet
property NSRegularExpressionSearch : a reference to current application's NSRegularExpressionSearch
property NSNumberFormatterRoundUp : a reference to current application's NSNumberFormatterRoundUp
property NSStringTransformFullwidthToHalfwidth : a reference to current application's NSStringTransformFullwidthToHalfwidth

tell application "Contacts"
	set lastNames to last name of every person
	set myName to last name of my card
end tell

load framework

--Remove missing value (cleaning)
set aList to (current application's SMSForder's arrayByDeletingBlanksIn:(lastNames)) as list

--Remove duplicates
set bList to makeUniqueListFrom(aList) of me

--Sort by string length, long strings -> short strings
set cList to sort1DListByStringLength(bList, false) of me --descending

--Classify the characters, keep names made up only of kanji, and drop one-character names
set dList to {}
repeat with i in cList
	set j to contents of i
	set tmpPat to retAtrPatternFromStr(j) of me
	if tmpPat is equal to "漢" then
		--Output only surnames of two or more characters
		if length of j > 1 then
			set the end of dList to j
		end if
	end if
end repeat

set the beginning of dList to myName
return dList
--> {"長野谷", "久保田", "三津田", "小笠原", "上田平", "大久保", "長谷川", "長野谷", "伊賀", "伊勢","伊東", "伊藤", "井上", "稲葉" …}

--Objective-C-like parameter style
on makeUniqueListOf:theList
	set theSet to current application's NSOrderedSet's orderedSetWithArray:theList
	return (theSet's array()) as list
end makeUniqueListOf:

--Pure AppleScript-like parameter style
on makeUniqueListFrom(theList)
	set aList to my makeUniqueListOf:theList
	return aList
end makeUniqueListFrom

--Sort a 1D list by string length v2
on sort1DListByStringLength(aList as list, sortOrder as boolean)
	set aArray to current application's NSArray's arrayWithArray:aList
	set desc1 to current application's NSSortDescriptor's sortDescriptorWithKey:"length" ascending:sortOrder
	set desc2 to current application's NSSortDescriptor's sortDescriptorWithKey:"self" ascending:true selector:"localizedCaseInsensitiveCompare:"
	set bArray to aArray's sortedArrayUsingDescriptors:{desc1, desc2}
	return bArray as list of string or string
end sort1DListByStringLength

--Classify the character types in a string
on retAtrPatternFromStr(aText)
	set a1List to {"100000", "010000", "001000", "000100", "000010", "000001"}
	set b1List to {"9", "A", "$", "漢", "あ", "ア"} --digit, alphabet, symbol, full-width kanji, full-width hiragana, full-width katakana
	set aDict to NSDictionary's dictionaryWithObjects:b1List forKeys:a1List
	
	set aStr to NSString's stringWithString:aText
	set bStr to aStr's stringByDeletingPathExtension()
	set cStr to zenToHan(bStr) of me
	
	set outList to {}
	set cList to characters of cStr
	repeat with i in cList
		set j to contents of i
		set chk1 to ((my chkNumeric:j) as integer) as string
		set chk2 to ((my chkAlphabet:j) as integer) as string
		set chk3 to ((my chkSymbol:j) as integer) as string
		set chk4 to ((my chkKanji:j) as integer) as string
		set chk5 to ((my chkHiragana:j) as integer) as string
		set chk6 to ((my chkKatakana:j) as integer) as string
		set allKey to (chk1 & chk2 & chk3 & chk4 & chk5 & chk6) as string
		set aVal to (aDict's valueForKeyPath:allKey) as string
		if aVal is not in outList then
			set the end of outList to aVal
		end if
	end repeat
	
	return outList as string
end retAtrPatternFromStr

--Full-width -> half-width conversion
on zenToHan(aStr)
	set aString to NSString's stringWithString:aStr
	return (aString's stringByApplyingTransform:(NSStringTransformFullwidthToHalfwidth) |reverse|:false) as string
end zenToHan

--Is it a digit?
on chkNumeric:checkString
	set digitCharSet to NSCharacterSet's characterSetWithCharactersInString:"0123456789"
	set ret to my chkCompareString:checkString baseString:digitCharSet
	return ret as boolean
end chkNumeric:

--Is it a symbol?
on chkSymbol:checkString
	--addCharactersInString: is an NSMutableCharacterSet method, so allocate the mutable class
	set muCharSet to NSMutableCharacterSet's alloc()'s init()
	muCharSet's addCharactersInString:"$\"!~&=#[]._-+`|{}?%^*/'@-/:;(),"
	set ret to my chkCompareString:checkString baseString:muCharSet
	return ret as boolean
end chkSymbol:

--Is it kanji?
on chkKanji:aChar
	return detectCharKind(aChar, "[一-龠]") of me
end chkKanji:

--Is it hiragana?
on chkHiragana:aChar
	return detectCharKind(aChar, "[ぁ-ん]") of me
end chkHiragana:

--Is it katakana?
on chkKatakana:aChar
	return detectCharKind(aChar, "[ァ-ヶ]") of me
end chkKatakana:

--Is it a half-width space?
on chkSpace:checkString
	set muCharSet to NSMutableCharacterSet's alloc()'s init()
	muCharSet's addCharactersInString:" " --half-width space (20h)
	set ret to my chkCompareString:checkString baseString:muCharSet
	return ret as boolean
end chkSpace:

--Is it an alphabet letter?
on chkAlphabet:checkString
	set aStr to NSString's stringWithString:checkString
	set allCharSet to NSMutableCharacterSet's alloc()'s init()
	allCharSet's addCharactersInRange:(current application's NSMakeRange(id of "a", 26))
	allCharSet's addCharactersInRange:(current application's NSMakeRange(id of "A", 26))
	set aBool to my chkCompareString:aStr baseString:allCharSet
	return aBool as boolean
end chkAlphabet:

--True when checkString consists only of characters in baseString
on chkCompareString:checkString baseString:baseString
	set aScanner to NSScanner's localizedScannerWithString:checkString
	aScanner's setCharactersToBeSkipped:(missing value)
	aScanner's scanCharactersFromSet:baseString intoString:(missing value)
	return (aScanner's isAtEnd()) as boolean
end chkCompareString:baseString:

--True when aChar matches the regular expression aPattern
on detectCharKind(aChar, aPattern)
	set aChar to NSString's stringWithString:aChar
	set searchStr to NSString's stringWithString:aPattern
	set matchRes to aChar's rangeOfString:searchStr options:(NSRegularExpressionSearch)
	if matchRes's location() = (current application's NSNotFound) or (matchRes's location() as number) > 9.99999999E+8 then
		return false
	else
		return true
	end if
end detectCharKind
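The longest-first ordering matters once the list is used for matching: if a shorter surname that is a prefix of a longer one were tried first, the longer one could never be found. The post does not show how the list is consumed by the morphological analysis, so the following is only a hypothetical plain-AppleScript sketch of a first-match scanner (surnamesIn is a made-up handler name).

```applescript
-- Scan aText left to right; at each position take the first surname in
-- surnameList that matches there, then continue after the match
on surnamesIn(aText, surnameList)
	set hitList to {}
	set pos to 1
	set textLen to length of aText
	repeat while pos <= textLen
		set matched to false
		repeat with nameRef in surnameList
			set aName to contents of nameRef
			set endPos to pos + (length of aName) - 1
			if endPos <= textLen then
				if (text pos thru endPos of aText) is aName then
					set end of hitList to aName
					set pos to endPos + 1
					set matched to true
					exit repeat
				end if
			end if
		end repeat
		if not matched then set pos to pos + 1
	end repeat
	return hitList
end surnamesIn

set longestFirst to surnamesIn("長野谷さん", {"長野谷", "長野"})
--> {"長野谷"}
set shortestFirst to surnamesIn("長野谷さん", {"長野", "長野谷"})
--> {"長野"}
```

With the list ordered longest first, "長野谷" wins over its prefix "長野"; with the order reversed, the shorter name is matched and the longer one is never reached.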