This AppleScript exports the table elements currently visible in the frontmost Safari window to CSV files and opens them in Numbers.
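It is the result of the groundwork I have been doing lately on scripts that process the elements currently shown in a Web browser and leave elements that are not shown untouched.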
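With this, steps such as selecting part of a table beforehand are no longer needed. The script automatically extracts the tables shown in the window based on their display coordinates within the Web content, and does not export anything located outside the visible area.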
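It narrows down the candidates by getting the display coordinates of each DOM element within the Web content. The coordinates of each DOM element change as the browser scrolls (they are relative coordinates), which gave me some trouble. Also, this script only takes vertical scrolling into account when extracting DOM elements; horizontal scrolling on wide pages is not considered.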
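The coordinate test can be sketched on its own in JavaScript. This is a variant of the check in the script, not the script's exact logic: it treats an element as visible when any part of its rectangle overlaps the viewport, and it also checks the horizontal direction, which the script does not. The function names `intersectsViewport` and `visibleIndexes` are hypothetical, and the rectangles are assumed to have the shape returned by `getBoundingClientRect()`.

```javascript
// Sketch under stated assumptions: rectangles are viewport-relative,
// like the objects returned by Element.getBoundingClientRect().
// Both function names are hypothetical and do not appear in the script.
function intersectsViewport(rect, viewWidth, viewHeight) {
  // Because the values are relative to the viewport's top-left corner,
  // the viewport always spans (0, 0) to (viewWidth, viewHeight),
  // regardless of how far the page has been scrolled.
  const verticallyVisible = rect.bottom > 0 && rect.top < viewHeight;
  const horizontallyVisible = rect.right > 0 && rect.left < viewWidth;
  return verticallyVisible && horizontallyVisible;
}

// Return the indexes of the rectangles that overlap the viewport.
function visibleIndexes(rects, viewWidth, viewHeight) {
  const result = [];
  rects.forEach((rect, index) => {
    if (intersectsViewport(rect, viewWidth, viewHeight)) result.push(index);
  });
  return result;
}

// In a browser, the rectangles could be collected like this:
//   const rects = Array.from(document.getElementsByTagName("table"),
//                            (el) => el.getBoundingClientRect());
//   visibleIndexes(rects, window.innerWidth, window.innerHeight);
```

Since the rectangles are already relative to the viewport, no scroll offsets need to be added or subtracted in this form.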
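This script is not aimed at bulk processing. It was written with the idea of processing "what is visible" as it is, so it gets the source of the page displayed in the Web browser (Safari) and processes that directly. In other words, it works on the very data of the page the user is viewing.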
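This cannot be done with Google Chrome, which has no command for getting a page's source (fetching the same URL separately with the curl command or the like would work, though reproducing cookie values and so on would be a chore).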
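As for other impressions after building and using it, my only real complaint is that it also picks up tables used purely for decoration. To tell this "junk data" (decorative table data not worth reusing) apart, I feel the need for a cutoff, such as skipping the export when a table has too few rows.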
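A minimal sketch of such a cutoff, written in JavaScript for illustration only. It assumes each table has already been converted to a 2D list (a header row followed by body rows), and both the function name `filterByRowCount` and the threshold value are hypothetical; the script itself does not implement this.

```javascript
// Sketch: keep only tables that have at least `minRows` rows.
// Each table is assumed to be a 2D array (rows of cell strings).
// `filterByRowCount` and `minRows` are hypothetical names.
function filterByRowCount(tables, minRows) {
  return tables.filter((rows) => rows.length >= minRows);
}

// Example: a one-row layout table is dropped, a real data table is kept.
const tables = [
  [["logo", "menu"]],
  [["Name", "Price"], ["Apple", "100"], ["Orange", "80"]],
];
const kept = filterByRowCount(tables, 3);
```

A row count is a crude signal, so a suitable threshold would have to be tuned against the pages actually being processed.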
--> Download VisibleTableExporter (Code-signed executable applet with Framework in its bundle)
AppleScript name: Safariで現在見えている表を抽出してCSV書き出し.scptd (Extract the tables currently visible in Safari and export them as CSV)
--
--  Created by: Takaaki Naganoya
--  Created on: 2019/09/22
--
--  Copyright © 2019 Piyomaru Software, All Rights Reserved
--

use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "HTMLReader" --https://github.com/nolanw/HTMLReader

property NSUUID : a reference to current application's NSUUID
property NSString : a reference to current application's NSString
property HTMLDocument : a reference to current application's HTMLDocument
property NSMutableArray : a reference to current application's NSMutableArray
property NSJSONSerialization : a reference to current application's NSJSONSerialization

set aTag to "table"

set indRes to getVisibleElementIndexList(aTag) of me
if indRes = false or indRes = {} then
  display notification "No Visible Table in Web browser"
  return
end if

tell application "Safari"
  tell front document
    set aSource to source
  end tell
end tell

repeat with i in indRes
  set inList to filterATableAndPaseCells(aSource, i, aTag) of me
  if inList = false or inList = {} then return

  set aUUID to current application's NSUUID's UUID()'s UUIDString() as text
  set aNewFile to ((path to desktop) as string) & aUUID & ".csv"
  saveAsCSV(inList, aNewFile) of me

  tell application "Numbers"
    open (aNewFile as alias)
  end tell
end repeat

tell application "Numbers" to activate

on filterATableAndPaseCells(aSource as string, targInd as integer, aTag as string)
  set aHTML to current application's HTMLDocument's documentWithString:(aSource as string)

  --List up the table elements
  set eList to (aHTML's nodesMatchingSelector:aTag) as list
  set aObj to contents of item (targInd + 1) of eList --JavaScript indexes are 0-based

  --Count columns of Table Header
  set aTableHeader to (aObj's nodesMatchingSelector:"tr")'s firstObject()
  set hList to aTableHeader's nodesMatchingSelector:"th"
  set hStrList to {}
  repeat with i1 in hList
    set the end of hStrList to i1's textContent() as string
  end repeat
  set hLen to length of hStrList --count columns

  --Acquire whole table body contents
  set aTableBody to (aObj's nodesMatchingSelector:"tbody")'s firstObject()
  set bList to aTableBody's nodesMatchingSelector:"td"
  set bbList to {}
  repeat with i2 in bList
    set the end of bbList to i2's textContent() as string
  end repeat

  set tbList to makeList1DTo2D(bbList, hLen) of me

  return {hStrList} & tbList
end filterATableAndPaseCells

--Convert a 1D list to a 2D list
on makeList1DTo2D(orig1DList as list, aMax)
  set tbList to {}
  set tmpList to {}
  set aCount to 1

  repeat with i3 in orig1DList
    set j to contents of i3
    set the end of tmpList to j

    if aCount ≥ aMax then
      set aCount to 1
      set the end of tbList to tmpList
      set tmpList to {}
    else
      set aCount to aCount + 1
    end if
  end repeat

  return tbList
end makeList1DTo2D

--Calculate coordinates and return the indexes of the DOM elements visible in the Safari window
on getVisibleElementIndexList(aTag as string)
  tell application "Safari"
    set dCount to count every document
    if dCount = 0 then return false

    set jRes to do JavaScript "var winWidth = window.innerWidth,
    winHeight = window.innerHeight,
    winLeft = window.scrollX,
    winTop = window.scrollY,
    winBottom = winTop + winHeight,
    winRight = winLeft + winWidth,
    elementsArray = document.body.getElementsByTagName('" & aTag & "'),
    elemLen = elementsArray.length,
    inView = [];

var step;
for (step = 0 ; step < elemLen ; step++) {
  var tmpElem = elementsArray[step];
  var bVar = tmpElem.getBoundingClientRect();
  if (bVar.top > 0 && bVar.top < winHeight) {
    inView.push(step);
  }
}
JSON.stringify(inView);" in front document

    set jList to parseJSONAsList(jRes) of me
    return jList
  end tell
end getVisibleElementIndexList

on parseJSONAsList(jsRes as string)
  set jsonString to NSString's stringWithString:jsRes
  set jsonData to jsonString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
  set aJsonDict to NSJSONSerialization's JSONObjectWithData:jsonData options:0 |error|:(missing value)
  return aJsonDict as list
end parseJSONAsList

--Save 2D List to CSV file
on saveAsCSV(aList as list, aPath)
  set crlfChar to (string id 13) & (string id 10)
  set LF to (string id 10)
  set wholeText to ""

  repeat with i in aList
    set newLine to {}

    --Sanitize (Double Quote)
    repeat with ii in i
      set jj to ii as text
      set kk to repChar(jj, string id 34, (string id 34) & (string id 34)) of me --Escape Double Quote
      set the end of newLine to kk
    end repeat

    --Change Delimiter
    set aLineText to ""
    set curDelim to AppleScript's text item delimiters
    set AppleScript's text item delimiters to "\",\""
    set aLineList to newLine as text
    set AppleScript's text item delimiters to curDelim

    set aLineText to repChar(aLineList, return, "") of me --delete return
    set aLineText to repChar(aLineText, LF, "") of me --delete lf

    set wholeText to wholeText & "\"" & aLineText & "\"" & crlfChar --line terminator: CR+LF
  end repeat

  if (aPath as string) does not end with ".csv" then
    set bPath to aPath & ".csv" as Unicode text
  else
    set bPath to aPath as Unicode text
  end if

  writeToFileAsUTF8(wholeText, bPath, false) of me
end saveAsCSV

on writeToFileAsUTF8(this_data, target_file, append_data)
  tell current application
    try
      set the target_file to the target_file as text
      set the open_target_file to open for access file target_file with write permission
      if append_data is false then set eof of the open_target_file to 0
      write this_data as «class utf8» to the open_target_file starting at eof
      close access the open_target_file
      return true
    on error error_message
      try
        close access file target_file
      end try
      return error_message
    end try
  end tell
end writeToFileAsUTF8

on repChar(origText as text, targChar as text, repChar as text)
  set curDelim to AppleScript's text item delimiters
  set AppleScript's text item delimiters to targChar
  set tmpList to text items of origText
  set AppleScript's text item delimiters to repChar
  set retText to tmpList as string
  set AppleScript's text item delimiters to curDelim
  return retText
end repChar