AppleScript名:PDFから本文テキストを抽出して配列にストアして文字列検索 |
— Created 2017-06-18 by Takaaki Naganoya — 2017 Piyomaru Software use AppleScript version "2.4" use scripting additions use framework "Foundation" use framework "Quartz" property textCache : missing value property aList : {} –検索対象の語群 set sList to {"notification", "Cocoa"} –considering case set thePath to POSIX path of (choose file of type {"com.adobe.pdf"}) –PDFのテキスト内容をあらかじめページごとに読み取って、検索用のテキストキャッシュを作成 set anNSURL to (current application’s |NSURL|’s fileURLWithPath:thePath) set theDoc to current application’s PDFDocument’s alloc()’s initWithURL:anNSURL set theCount to theDoc’s pageCount() as integer set textCache to current application’s NSMutableArray’s new() repeat with i from 0 to (theCount – 1) set aPage to (theDoc’s pageAtIndex:i) set tmpStr to (aPage’s |string|()) (textCache’s addObject:{pageIndex:i + 1, pageString:tmpStr}) end repeat –主にテキストキャッシュを対象にキーワード検索 repeat with s in sList –❶部分一致で抽出 set bRes to ((my filterRecListByLabel1(textCache, "pageString contains ’" & s & "’"))’s pageIndex) as list –❷、❶のページ単位のテキスト検索で見つからなかった場合(ページ間でまたがっている場合など) if bRes = {} then set bRes to {} set theSels to (theDoc’s findString:s withOptions:0) repeat with aSel in theSels set thePage to (aSel’s pages()’s objectAtIndex:0)’s label() set curPage to (thePage as integer) if curPage is not in bRes then set the end of bRes to curPage end if end repeat end if set the end of aList to bRes end repeat return aList –リストに入れたレコードを、指定の属性ラベルの値で抽出 on filterRecListByLabel1(aRecList as list, aPredicate as string) set aArray to current application’s NSArray’s arrayWithArray:aRecList set aPredicate to current application’s NSPredicate’s predicateWithFormat:aPredicate set filteredArray to aArray’s filteredArrayUsingPredicate:aPredicate return filteredArray end filterRecListByLabel1 |
More from my site
(Visited 43 times, 1 visits today)