Vocabulary study update - Building an English vocabulary quiz set from Kindle highlights

Introduction

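This post shows how to take the English words highlighted in any Kindle book and turn them into a quiz set for P-Study System, using the Eijiro dictionary for the definitions.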

Prerequisites and environment

You need the following to follow the steps below:

PC (Windows)
Eijiro dictionary for PC (CD-ROM edition)
Python3.x
P-Study System
Kindle for PC and the target e-book

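For Eijiro, PDIC and the Eijiro dictionary must be installed on the PC. The latest edition is currently the 9th, distributed on DVD, but I only have the old 5th edition, so that is what I used. Python 3 is needed to run the script in the second half, and P-Study System to run the quiz. The highlights are extracted with the Kindle for PC software.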
www.takke.jp
All steps assume Windows. You also need a Kindle book to work from; I used Stephen King's "It", which I happened to have. The words you want to learn are assumed to be highlighted in advance.

Preparing the dictionary text for the quiz

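Next, export the contents of Eijiro from PDIC as a text file.
The author of P-Study System has documented how to do this, so follow those instructions, with one change: do not specify a level (LV), and export all levels instead. Any file name works; here I used eijiro.txt. If the export does not work, selecting only EIJI-128.doc under 検索対象辞書 (dictionaries to search) reportedly helps in some cases.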
d.hatena.ne.jp
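
read_wordlist.py below reads this file as UTF-16 and expects each line to have the form "headword /// definition1\definition2...". That layout is inferred from how the script parses the file, not from PDIC documentation, so it is worth checking a few lines of your own export first. A minimal sketch of splitting one such line, where split_entry is a hypothetical helper name:

```python
def split_entry(line):
    """Split one exported line into (headword, [definitions]).

    Assumes the 'headword /// def1\\def2' layout that read_wordlist.py relies on;
    returns None when the ' ///' separator is missing.
    """
    head, sep, body = line.rstrip('\r\n').partition(' ///')
    if not sep:
        return None
    # Definitions are separated by backslashes; drop empty pieces
    definitions = [d.strip() for d in body.split('\\') if d.strip()]
    return head, definitions


if __name__ == '__main__':
    # Made-up sample line; a real export should be opened with encoding='utf-16'
    print(split_entry('abandon /// 見捨てる\\放棄する\n'))
    # ('abandon', ['見捨てる', '放棄する'])
```

To inspect a real export, open it with open('eijiro.txt', encoding='utf-16') and pass a few of its lines to split_entry.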

Exporting the highlights

Next, export the Kindle highlights from the PC app.
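Select the Notebook view on the left side of the screen, click the Export icon, and save the file under any name. In this example I highlighted a few words just for demonstration, but the same should work with any other book. Here the file is named It-Notebook.html.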
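
read_wordlist.py finds each highlight by searching the exported HTML for the literal string <div class='noteText'> and the next </div> on the same line. A minimal sketch of the same extraction with Python's standard html.parser module, which also decodes entities such as &amp;. It assumes, as the script does, that each highlight is the text directly inside a div with class noteText; NoteTextParser and extract_highlights are hypothetical names:

```python
from html.parser import HTMLParser


class NoteTextParser(HTMLParser):
    """Collect the text that directly follows each <div class='noteText'> tag."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities like &amp; are decoded
        self.notes = []
        self._buf = None    # list of text chunks while inside a noteText div

    def _flush(self):
        if self._buf is not None:
            self.notes.append(''.join(self._buf).strip())
            self._buf = None

    def handle_starttag(self, tag, attrs):
        self._flush()  # any new tag ends the current highlight text
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'div' and 'noteText' in classes:
            self._buf = []

    def handle_endtag(self, tag):
        self._flush()  # stop at any closing tag, not only </div>

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def extract_highlights(html_text):
    """Return the highlight strings found in an exported Kindle notebook."""
    parser = NoteTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.notes
```

For example, extract_highlights(open('It-Notebook.html', encoding='utf-8').read()) returns the highlighted strings in file order; the script's trimming of trailing commas and periods would still need to be applied to each one.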

Creating the quiz set

To link these highlights with Eijiro, I wrote the following script:
read_wordlist.py (updated 8/26/2017)

#coding: utf-8
import codecs, sys

# Constants
# Markers in an Eijiro entry; text from the earliest one onward is dropped
removelist = [u'◆', u'\\ ・', u'【変化】', u'【分節】', u'【＠】']

def FindDefinitions(word, dic, startpoint=0):
    deflist = []
    found = False
    n = len(dic)

    # Find the first entry for word, checking every line exactly once,
    # starting at startpoint (the line after the previous hit) and wrapping around
    i = startpoint % n
    for _ in range(n):
        if dic[i].startswith(word+' ///'):
            found = True
            break
        i = (i+1) % n
    if found:
        l = dic[i][len(word)+4:]
        # Truncate at the earliest removelist marker so only the definitions remain
        for r in removelist:
            ii = l.find(r)
            l = l[:ii] if (ii >= 0) else l
        # Definitions are separated by backslashes; follow "＝<→word>" cross-references
        for d in l.split('\\'):
            if 0 <= d.find(u'＝<→') < d.find(u'>'):
                dl, w, k = FindDefinitions(d[d.find(u'＝<→') + 3:d.find(u'>')], dic) # w and k are not used
                deflist.extend(dl)
            elif (len(d.strip())>0):
                deflist.append(d.strip())
        j = (i+1) % n # next starting point
    else: # word not found
        if word.endswith('s'):  # also safe for an empty word
            # Retry without the trailing "s" (simple plural / third-person form)
            deflist, w, k = FindDefinitions(word[:-1], dic)
            if (len(deflist)>0):
                print(word+' not found. '+w+' is used instead.')
                word = w
        j = startpoint % n # keep the same starting point
    return deflist, word, j

args = sys.argv
inputfilename = "wordlist.html" if len(args)<2 else args[1]
try:
    f = codecs.open(inputfilename, "r", 'utf-8')
    txt = f.readlines()
except:
    print("Error opening "+inputfilename)
    raise

dicfilename = "eijiro.txt" if len(args)<4 else args[3]
try:
    ef = codecs.open(dicfilename, 'r', 'utf-16')
    dic = ef.readlines()
except:
    print("Error opening "+dicfilename)
    raise

title = "quiz" if len(args)<3 else args[2]

# cp932 (Windows Shift_JIS) can encode characters such as ～ (U+FF5E) and ①
# that Python's strict shift_jis codec rejects
with codecs.open(title+'.csv', 'w', 'cp932') as fout:
    fout.write("psscsvfile,100,,,\n")
    fout.write(title+",,,,\n")
    fout.write(",,,,\n")
    fout.write("a1,q1,q2,q3,q4,q5,q6,q7,q8,q9,q10,\n")

    marker = u"<div class='noteText'>"
    j = 0
    for line in txt:
        # Each highlight in the Kindle export is the text of a noteText div
        ii = line.find(marker)
        if ii<0:
            continue
        word = line[ii+len(marker):]
        ii = word.find(u'</div>')
        if ii<0:
            continue
        word = word[:ii].rstrip(',').rstrip('.').rstrip()
        defs, word, j = FindDefinitions(word, dic, j)
        if len(defs)==0:
            print(word+' not found')
            continue
        # One row: the word, then each definition, all double-quoted
        outtxt = "\""+word.strip()+"\""
        for d in defs:
            outtxt = outtxt + ", \""+d.strip()+"\""
        try:
            fout.write(outtxt + "\n")
        except UnicodeEncodeError:
            # Fallback: drop the first [...] span and retry once
            i1 = outtxt.find('[')
            i2 = outtxt.find(']')
            if (0<=i1<i2):
                outtxt = outtxt[:i1]+outtxt[i2+1:]
            try:
                fout.write(outtxt + "\n")
            except UnicodeEncodeError:
                print('Error printing '+outtxt)
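
FindDefinitions scans eijiro.txt line by line, starting just after the previous hit and wrapping around, so a word that lies far from the previous one can cost a pass over the whole export. A different design, not what read_wordlist.py does, is to index the dictionary once by headword and look each word up in a Python dict. A minimal sketch, assuming the same "headword /// ..." line layout; build_index and lookup are hypothetical names, and the script's removelist trimming and "＝<→...>" cross-reference handling are deliberately left out:

```python
def build_index(lines):
    """Map each headword to the text after its ' ///' separator (first occurrence wins)."""
    index = {}
    for line in lines:
        head, sep, body = line.partition(' ///')
        if sep and head not in index:
            index[head] = body.strip()
    return index


def lookup(word, index):
    """Return (definitions, headword_used), retrying without a trailing 's' like the script."""
    w = word
    while w:
        body = index.get(w)
        if body is not None:
            defs = [d.strip() for d in body.split('\\') if d.strip()]
            return defs, w
        if not w.endswith('s'):
            break
        w = w[:-1]
    return [], word
```

Building the index costs one pass over the file plus memory for every headword, which trades RAM for much cheaper per-word lookups when the highlight list is long.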

Put read_wordlist.py, the dictionary file, and the exported highlights file together in one folder, and run the script there with Python 3 (Python must be on your PATH):

python read_wordlist.py <exported highlights file> <any title> <dictionary file>

For example:

python read_wordlist.py It-Notebook.html It-words eijiro.txt

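With this command, a quiz file named It-words.csv should be generated. Any word that cannot be found is reported on the console, so you can add it by hand if you really need it. All three arguments are optional; they default to wordlist.html, quiz, and eijiro.txt.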
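
To add a missing word by hand, a row in the same layout the script writes (the word and each definition in double quotes, separated by ", ") can be appended to the CSV. A minimal sketch, assuming the file uses Windows Shift_JIS (CP932) encoding like the script's output; format_row and append_row are hypothetical helper names:

```python
def format_row(word, definitions):
    """Build one row in the '"word", "def1", "def2"' layout read_wordlist.py writes."""
    fields = [word] + list(definitions)
    return ', '.join('"' + field.strip() + '"' for field in fields)


def append_row(csv_path, word, definitions):
    """Append one row to a quiz CSV, encoded as CP932 (Windows Shift_JIS)."""
    # newline='' keeps '\n' as-is, matching the LF line endings the script writes
    with open(csv_path, 'a', encoding='cp932', newline='') as f:
        f.write(format_row(word, definitions) + '\n')
```

For example, append_row('It-words.csv', 'derelict', ['見捨てられた']) would add one hand-written entry (the word and definition here are only an illustration).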

Importing into P-Study System

Next, open PssEditor (the question editor) in P-Study System and use its quiz-set import function to import the CSV file generated above; the quiz set is then added.
Finally, select this quiz set in P-Study System, and you can study with questions built from the words you highlighted on Kindle!

Summary

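The same method should let you build a quiz set from the English-word highlights in any Kindle book. Please follow the terms of use of each piece of software so as not to infringe the copyrights of Eijiro, P-Study System, or the Kindle e-books.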

The procedure above only describes what worked in my own environment. I do not guarantee that it works in other environments, and I accept no responsibility for any outcome of trying it.

Likewise, the script has been tested only in my environment, and its behavior elsewhere is not guaranteed. I plan to put it on GitHub later, so if you find any bugs, a pull request would be much appreciated.
8/26/2017 addendum
GitHub repository created:
github.com