[Sphinx] Notes on the Acoustic Model Training Workflow
Part 1: Tracing the training pipeline step by step
./script_pl/make_feats.pl -ctl test7_train.fileids // convert the wav files into mfc feature files
./script_pl/RunAll.pl // start training
Below is the sequence of steps run by RunAll.pl; we will analyze them one by one.
("$ST::CFG_SCRIPT_DIR/00.verify/verify_all.pl", // check that each training file is well formed
 "$ST::CFG_SCRIPT_DIR/01.lda_train/slave_lda.pl",
 "$ST::CFG_SCRIPT_DIR/02.mllt_train/slave_mllt.pl",
 "$ST::CFG_SCRIPT_DIR/05.vector_quantize/slave.VQ.pl",
 "$ST::CFG_SCRIPT_DIR/10.falign_ci_hmm/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/11.force_align/slave_align.pl",
 "$ST::CFG_SCRIPT_DIR/12.vtln_align/slave_align.pl",
 "$ST::CFG_SCRIPT_DIR/20.ci_hmm/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/30.cd_hmm_untied/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/40.buildtrees/slave.treebuilder.pl",
 "$ST::CFG_SCRIPT_DIR/45.prunetree/slave.state-tying.pl",
 "$ST::CFG_SCRIPT_DIR/50.cd_hmm_tied/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/60.lattice_generation/slave_genlat.pl",
 "$ST::CFG_SCRIPT_DIR/61.lattice_pruning/slave_prune.pl",
 "$ST::CFG_SCRIPT_DIR/62.lattice_conversion/slave_conv.pl",
 "$ST::CFG_SCRIPT_DIR/65.mmie_train/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/90.deleted_interpolation/deleted_interpolation.pl",
The first step, "$ST::CFG_SCRIPT_DIR/00.verify/verify_all.pl", checks that each training file is well formed. Its phases are as follows:
MODULE: 00 verify training files
O.S. is case sensitive ("A" != "a"). Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file. // check that every phone used in the dictionaries appears in the phonelist
Found 1485 words using 65 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary // check for duplicate dictionary entries; note that alternate pronunciations (e.g. the heteronym 曾) must be distinguished with a marker, as in 曾(2) z eng
Phase 3: CTL - Check general format; utterance length (must be positive); files exist // check that the files listed in test.fileids exist
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file // the transcription file must have exactly as many lines as the fileids file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable. // estimate the amount of training data to judge whether the configured number of tied states is feasible
Estimated Total Hours Training: 0.470841666666667 // about 28 minutes of audio
This is a small amount of data, no comment at this time // the data set is rather small; no comment for now
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary // check that every word in the transcription is in the dictionary
Words in dictionary: 1482 // 1482 words in the main dictionary
Words in filler dictionary: 3 // the three fillers: sil, <s>, </s>
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once // check that every phone in the transcription is in the phonelist, and that each phone occurs at least once (i.e. has at least one example)
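The Phase 6 check can be illustrated with a small sketch. This is not SphinxTrain's actual code; `find_oov_words` and the transcript-format handling are illustrative assumptions:

```python
# Hypothetical re-implementation of the Phase 6 check: every word in the
# transcript must appear in the main dictionary or the filler dictionary.
def find_oov_words(transcript_lines, dictionary, filler_dictionary):
    """Return the set of transcript words missing from both dictionaries."""
    known = set(dictionary) | set(filler_dictionary)
    oov = set()
    for line in transcript_lines:
        # SphinxTrain transcripts end with an (utterance_id) tag; drop it.
        words = line.split()
        if words and words[-1].startswith("("):
            words = words[:-1]
        oov.update(w for w in words if w not in known)
    return oov
```

If this returns a non-empty set, verify_all.pl would fail Phase 6 and list the out-of-vocabulary words.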
The following steps are all skipped because of settings in sphinx_train.cfg; they would come into play when training other kinds of models.
MODULE: 01 Train LDA transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 02 Train MLLT transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 05 Vector Quantization
Skipped for continuous models
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 11 Force-aligning transcripts
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
Next we reach step 20, where iterative training begins.
Settings in sphinx_train.cfg:
$CFG_MIN_ITERATIONS = 1;  # BW Iterate at least this many times
$CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, somethings likely wrong.
That is: iterate at least once and at most 10 times; these limits can be adjusted later. Training normally stops either when the iteration cap is reached or when the change between iterations drops below a convergence threshold (say 0.0001), i.e. the likelihood effectively stops improving. As a rule of thumb, more iterations fit the training data more closely, though past convergence the extra iterations bring little further gain.
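Judging from the log that follows, the printed "Convergence Ratio" is simply the improvement in per-frame likelihood over the previous iteration (e.g. -7.0819 - (-7.3460) = 0.2641). A minimal sketch of the stopping rule, assuming a threshold like SphinxTrain's $CFG_CONVERGENCE_RATIO (0.1 is an assumed default here):

```python
def should_stop(likelihoods, min_iter=1, max_iter=10, threshold=0.1):
    """Stopping-rule sketch for the Baum-Welch loop.

    likelihoods is the list of per-frame log-likelihoods so far; the
    "Convergence Ratio" in the log is the gain over the last iteration.
    """
    n = len(likelihoods)
    if n < max(2, min_iter):
        return False            # need at least two values to compare
    if n >= max_iter:
        return True             # "Maximum desired iterations performed"
    ratio = likelihoods[-1] - likelihoods[-2]
    return ratio < threshold    # "Likelihoods have converged!"
```

With the numbers from the log below, iteration 2 (ratio 0.264) would continue, while the final ratio of 0.069 would terminate training.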
MODULE: 20 Training Context Independent models // train the context-independent models, i.e. single-phone (monophone) models
Phase 1: Cleaning up directories: // set up the working directories
accumulator...logs...qmanager...models...
Phase 2: Flat initialize // flat initialization
mk_mdef_gen: this step seems to go through each phone in the phonelist and check whether it has triphones; so presumably the phonelist could also be written in triphone format — something to try in a later training run.
INFO: cmd_ln.c(691): Parsing command line:
/home/lijieqiong/sphinx/mytrain/data7/bin/mk_mdef_gen \
-phnlstfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.phonelist \
-ocimdef /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.ci.mdef \
-n_state_pm 3
mk_flat defines the mixture weights and transition matrices used during training. transition_matrices is a [65x3x4 array]: 65 phones, hence 65 models, each with 3 emitting states and a 3x4 transition matrix (transitions out of each emitting state, including into the final non-emitting state). mixture_weights is a [195x1x1 array]: 3 emitting states per model, so 65*3 = 195 tied states in total.
data7/bin/mk_flat \
-moddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.ci.mdef \
-topo /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.topology \
-mixwfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/mixture_weights \
-tmatfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/transition_matrices \
-nstream 1 \
-ndensity 1
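The array shapes in the log follow directly from the configuration; a quick sanity check in plain Python (shapes only, no model data):

```python
# Sanity-check the flat-initialization array shapes reported by mk_flat.
n_phones = 65        # entries in test.phonelist
n_emitting = 3       # -n_state_pm 3: emitting states per HMM
n_stream = 1         # -nstream 1
n_density = 1        # -ndensity 1

# Each transition matrix is n_emitting x (n_emitting + 1): transitions out
# of each emitting state, including into the final non-emitting state.
tmat_shape = (n_phones, n_emitting, n_emitting + 1)        # [65x3x4 array]
mixw_shape = (n_phones * n_emitting, n_stream, n_density)  # [195x1x1 array]
```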
init_gau walks the file list given by the training control file and scans each file's features; every frame has 13 cepstral coefficients, and the tool accumulates statistics for a mean estimate, writing the result to bwaccumdir/test_buff_1/gauden_count
bin/init_gau \
-ctlfn /home/lijieqiong/sphinx/mytrain/data7/etc/test_train.fileids \
-part 1 \
-npart 1 \
-cepdir /home/lijieqiong/sphinx/mytrain/data7/feat \
-cepext mfc \
-accumdir /home/lijieqiong/sphinx/mytrain/data7/bwaccumdir/test_buff_1 \
-agc none \
-cmn current \
-varnorm no \
-feat 1s_c_d_dd \
-ceplen 13
norm normalizes the statistics accumulated above into the global mean, written to test.ci_cont_flatinitial/globalmean
bin/norm \
-accumdir /home/lijieqiong/sphinx/mytrain/data7/bwaccumdir/test_buff_1 \
-meanfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/globalmean
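What init_gau and norm compute can be sketched in a few lines — a toy re-implementation, not the actual C code, using the 13-dimensional cepstra implied by -ceplen 13:

```python
def accumulate(feature_files):
    """init_gau-style pass (sketch): sum every frame and count frames.

    feature_files is a list of utterances; each utterance is a list of
    frames, and each frame is a list of 13 cepstral coefficients.
    """
    total = [0.0] * 13
    count = 0
    for frames in feature_files:
        for frame in frames:
            for i, c in enumerate(frame):
                total[i] += c
            count += 1
    return total, count

def global_mean(total, count):
    """norm-style step (sketch): divide the accumulated sums by the frame count."""
    return [t / count for t in total]
```

The two-pass split mirrors the real tools: init_gau writes per-partition accumulators to bwaccumdir, and norm merges them into the single globalmean file.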
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1) // start training with 1 Gaussian per state; progress runs from 0% to 100%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 1 // normalize the accumulated statistics
Current Overall Likelihood Per Frame = -7.34599977581518 // the current overall per-frame log-likelihood
Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1) // second iteration of the 1-Gaussian training
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 14 ERROR messages and 0 WARNING messages. Please check the log file for details. // 14 errors reported
Normalization for iteration: 2
Current Overall Likelihood Per Frame = -7.08191351156474
Convergence Ratio = 0.264086264250444
Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 42 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 3
Current Overall Likelihood Per Frame = -1.99339567581038
Convergence Ratio = 5.08851783575436
Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 70 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 4
Current Overall Likelihood Per Frame = 0.331860199053862
Convergence Ratio = 2.32525587486424
Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 82 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 5
Current Overall Likelihood Per Frame = 1.5098667300257
Convergence Ratio = 1.17800653097184
Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 94 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 6
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 2.41324954333952
Convergence Ratio = 0.903382813313822
Baum welch starting for 1 Gaussian(s), iteration: 7 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 100 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 7
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 3.09044610173611
Convergence Ratio = 0.677196558396586
Baum welch starting for 1 Gaussian(s), iteration: 8 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 110 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 8
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 3.71101464540463
Convergence Ratio = 0.620568543668517
Baum welch starting for 1 Gaussian(s), iteration: 9 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 114 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 9
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 4.15122418167385
Convergence Ratio = 0.440209536269223
Baum welch starting for 1 Gaussian(s), iteration: 10 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 122 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 10
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Maximum desired iterations 10 performed. Terminating CI training // the iteration cap of 10 was reached, so CI training stops
Training completed after 10 iterations
Next, the context-dependent models are trained — that is, the triphone models.
Phase 1: Cleaning up directories: // clear the working directories again
accumulator... logs... qmanager... completed
Phase 2: Initialization
mk_mdef_gen: this initialization step appears to normalize the triphone and word entries found in the training transcript, counting how many distinct triphones and how many distinct words there are.
bin/mk_mdef_gen \
-phnlstfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.phone \
-dictfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.dic \
-fdictfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.filler \
-lsnfn /home/lijieqiong/sphinx/mytrain/data7/etc/test_train.transcription \
-ountiedmdef /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.untied.mdef \
-n_state_pm
INFO: mk_mdef_gen.c(878): 65 n_base, 3316 n_tri // this many base phones and triphones occur in the transcription
init_mixw: this is the copy-CI-to-CD step — it reads the CI mixture weights, means and variances and uses them to seed every context-dependent state (each triphone state starts from its base phone's CI parameters), which is why the model counts listed below change.
INFO: model_def_io.c(588): 65 total models defined (65 base, 0 tri)
INFO: model_def_io.c(589): 260 total states
INFO: model_def_io.c(590): 195 total tied states
As shown above: there are 65 phones, hence 65 models; each model has 4 states (3 emitting states plus one final non-emitting state), so 65*4 = 260 total states, and the 195 tied states are the 65*3 emitting states.
Below, 65 + 3316 = 3381: the monophone models and the triphone models counted together.
INFO: model_def_io.c(588): 3381 total models defined (65 base, 3316 tri)
INFO: model_def_io.c(589): 13524 total states
INFO: model_def_io.c(590): 10143 total tied states
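Both blocks of counts follow from simple arithmetic; a sketch reproducing the numbers printed by model_def_io.c:

```python
def model_counts(n_base, n_tri, n_state_pm=3):
    """Reproduce the counts printed by model_def_io.c (sketch).

    Each HMM has n_state_pm emitting states plus one non-emitting
    final state; before tree-based clustering, every emitting state
    of every model counts as its own tied state.
    """
    total_models = n_base + n_tri
    total_states = total_models * (n_state_pm + 1)
    tied_states = total_models * n_state_pm
    return total_models, total_states, tied_states
```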
Phase 3: Forward-Backward // the iterative forward-backward training begins
Baum welch starting for iteration: 1 (1 of 1)
Normalization for iteration: 1
Current Overall Likelihood Per Frame = 2.99562145055707
Baum welch starting for iteration: 2 (1 of 1)
Normalization for iteration: 2
norm
Current Overall Likelihood Per Frame = 10.2063900810682
Convergence Ratio = 2.40710274963821
Baum welch starting for iteration: 3 (1 of 1)
As above: a BW iteration, then normalization, then the next BW iteration — each pass re-estimates on top of the previous one, and the overall per-frame likelihood increases each time. At this point an error is reported:
utt> 59 60 1608
INFO: cmn.c(175): CMN: 10.64 -0.19 -0.10 0.05 -0.30 -0.07 -0.05 -0.04 -0.08 -0.05 -0.14 -0.13 -0.09
0 288 2
ERROR: "backward.c", line 430: Failed to align audio to transcript: final state of the search is not reached
ERROR: "baum_welch.c", line 331: 60 ignored
// Utterance 60: the audio could not be aligned with its transcript — the final state of the search
// was never reached. This usually means the recording and the text disagree, e.g. the audio contains
// content that the transcript lacks. Listening to the audio against the file, though, no obvious
// mismatch was found; to be investigated later.
After that, the next stage begins: building the decision trees.
MODULE: 40 Build Trees // build the decision trees
Phase 1: Cleaning up old log files...
Phase 2: Make Questions // prepare the question set
make_quests: as the log below shows, roughly 20 questions are built from each of the 3 emitting states.
INFO: main.c(1108): Done building questions using state 0
INFO: main.c(1109): 20 questions from state 0
INFO: main.c(1108): Done building questions using state 1
INFO: main.c(1109): 20 questions from state 1
INFO: main.c(1108): Done building questions using state 2
INFO: main.c(1109): 21 questions from state 2
INFO: main.c(1114): Stored questions in /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions
The generated question set is written into the model_architecture directory, in this format:
QUESTION9 iong k n ui uxn w
Each question is a set of phones: at a tree node it asks whether the triphone's context phone is one of the phones in the set — a yes/no membership test, not a combination.
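Under that reading, evaluating a question is just a set-membership test. A toy sketch (the phone set is copied from the generated file; `answer` is an illustrative helper, not a SphinxTrain function):

```python
# A tree question is a set of phones; at a tree node the question asks
# whether the triphone's context phone belongs to the set (yes/no split).
QUESTION9 = {"iong", "k", "n", "ui", "uxn", "w"}

def answer(question, context_phone):
    """Evaluate one linguistic question against one context phone."""
    return context_phone in question
```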
Phase 3: Tree building // now build the decision trees: one tree per state of each phone
Processing each phone with each state
a 0 // build the tree for state 0 of phone a: iterate over the question set, score each question's split, and divide nodes accordingly; a maximum number of leaves is configured (7 here). In effect, the training occurrences of this phone's state form the sample set, each branch is the split induced by one question, and both unpruned and pruned versions of the trees exist.
completed
a 1
completed
a 2
completed
ai 0
completed
ai 1
completed
ai 2
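The per-node splitting described above can be sketched with a toy 1-D criterion — sum of squared error standing in for the Gaussian log-likelihood gain; real tree building uses the states' sufficient statistics:

```python
def sse(values):
    """Sum of squared error around the mean (stand-in for likelihood loss)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_question(samples, questions):
    """Greedy node split: pick the question with the largest gain.

    samples is a list of (context_phone, feature_value) pairs; each
    question is a set of phones, splitting the samples into a yes
    branch (context phone in the set) and a no branch.
    """
    parent = sse([v for _, v in samples])
    best, best_gain = None, 0.0
    for q in questions:
        yes = [v for p, v in samples if p in q]
        no = [v for p, v in samples if p not in q]
        gain = parent - (sse(yes) + sse(no))
        if gain > best_gain:
            best, best_gain = q, gain
    return best, best_gain
```

A node is split by the winning question, and splitting recurses until the gain is too small or the configured leaf limit is hit.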
Next, the decision trees are pruned.
MODULE: 45 Prune Trees (2015-09-23 15:07)
mk_mdef_gen
Phase 1: Tree Pruning // first, prune the trees
prunetree
bin/prunetree \
-itreedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.unpruned \
-nseno 1000 \
-otreedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.1000 \
-moddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.alltriphones.mdef \
-psetfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions \
-minocc 0 // shrink the nodes of each phone's trees, achieving the pruning effect
Phase 2: State Tying // next, tie the states together
tiestate: using the pruned trees, this step maps each triphone state to a tied-state (senone) id and writes the final tied model definition (test.1000.mdef).
bin/tiestate \
-imoddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.alltriphones.mdef \
-omoddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.1000.mdef \
-treedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.1000 \
-psetfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions
Then training of the context-dependent tied-state models begins (the 50.cd_hmm_tied step).
Phase 1: Cleaning up directories:
accumulator... logs... qmanager... completed
Phase 2: Copy CI to CD initialize
Phase 3: Forward-Backward // forward-backward training again: BW iterates several times, then each Gaussian is split into 2, BW iterates again, the Gaussians are split to 4, and so on
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
Normalization for iteration: 1
Current Overall Likelihood Per Frame = 2.99562145055707
... (intermediate iterations elided) ...
Finally, convergence is declared and training completes.
Current Overall Likelihood Per Frame = 17.2036627024291
Convergence ratio = 0.0693319838056503
Likelihoods have converged! Baum Welch training completed!
******************************TRAINING COMPLETE*************************
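The mixture-growing schedule (1 → 2 → 4 Gaussians with BW iterations in between) relies on a split step that can be sketched as follows — a toy version; the perturbation factor 0.2 is an assumption, not taken from the log:

```python
def split_gaussians(means, variances, perturb=0.2):
    """Double the Gaussians of one mixture (sketch of the split step).

    Each density is duplicated and the two copies are moved apart by
    +/- perturb * stddev, so later BW iterations can pull them apart.
    means and variances are lists of per-density coefficient lists.
    """
    new_means, new_vars = [], []
    for mean, var in zip(means, variances):
        offset = [perturb * (v ** 0.5) for v in var]
        new_means.append([m - o for m, o in zip(mean, offset)])
        new_means.append([m + o for m, o in zip(mean, offset)])
        new_vars.extend([list(var), list(var)])
    return new_means, new_vars
```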
With that, the model is fully trained. The stages below are skipped under the initial configuration.
MODULE: 60 Lattice Generation (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 61 Lattice Pruning (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 62 Lattice Format Conversion (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 65 MMIE Training (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 90 deleted interpolation (2015-09-23 15:10)
Skipped for continuous models