[Sphinx] Notes on the Acoustic Model Training Workflow
Part 1: Tracing the training pipeline step by step
./script_pl/make_feats.pl -ctl test7_train.fileids // convert the wav files into mfc feature files
./script_pl/RunAll.pl // start training
Below is the sequence of steps run by RunAll.pl; we will analyze them one by one.
("$ST::CFG_SCRIPT_DIR/00.verify/verify_all.pl", // check that each training file is well formed
 "$ST::CFG_SCRIPT_DIR/01.lda_train/slave_lda.pl",
 "$ST::CFG_SCRIPT_DIR/02.mllt_train/slave_mllt.pl",
 "$ST::CFG_SCRIPT_DIR/05.vector_quantize/slave.VQ.pl",
 "$ST::CFG_SCRIPT_DIR/10.falign_ci_hmm/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/11.force_align/slave_align.pl",
 "$ST::CFG_SCRIPT_DIR/12.vtln_align/slave_align.pl",
 "$ST::CFG_SCRIPT_DIR/20.ci_hmm/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/30.cd_hmm_untied/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/40.buildtrees/slave.treebuilder.pl",
 "$ST::CFG_SCRIPT_DIR/45.prunetree/slave.state-tying.pl",
 "$ST::CFG_SCRIPT_DIR/50.cd_hmm_tied/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/60.lattice_generation/slave_genlat.pl",
 "$ST::CFG_SCRIPT_DIR/61.lattice_pruning/slave_prune.pl",
 "$ST::CFG_SCRIPT_DIR/62.lattice_conversion/slave_conv.pl",
 "$ST::CFG_SCRIPT_DIR/65.mmie_train/slave_convg.pl",
 "$ST::CFG_SCRIPT_DIR/90.deleted_interpolation/deleted_interpolation.pl",
The first step, "$ST::CFG_SCRIPT_DIR/00.verify/verify_all.pl", checks that each training file is well formed. Its phases are as follows:
MODULE: 00 verify training files
O.S. is case sensitive ("A" != "a"). Phones will be treated as case sensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file. // check that every phone used in the dictionaries appears in the phonelist
Found 1485 words using 65 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary // check for duplicate dictionary entries; note that alternate pronunciations (e.g. the heteronym 曾) must be distinguished with a marker, as in 曾(2) z eng
Phase 3: CTL - Check general format; utterance length (must be positive); files exist // check that the files listed in test.fileids exist
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file // the transcription file must have exactly as many lines as the fileids file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable. // estimate the amount of training data to judge whether the configured number of tied states is feasible
Estimated Total Hours Training: 0.470841666666667 // about 28 minutes of audio
This is a small amount of data, no comment at this time // the data set is rather small; no comment for now
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary // check that every word in the transcription is in the dictionary
Words in dictionary: 1482 // 1482 words in the main dictionary
Words in filler dictionary: 3 // the three fillers: sil, <s>, </s>
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once // check that every phone in the transcription is in the phonelist, and that each phone occurs at least once (i.e. has at least one example)
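The Phase 6 check can be illustrated with a small sketch. This is not SphinxTrain's actual code; `find_oov_words` and the transcript-format handling are illustrative assumptions:

```python
# Hypothetical re-implementation of the Phase 6 check: every word in the
# transcript must appear in the main dictionary or the filler dictionary.
def find_oov_words(transcript_lines, dictionary, filler_dictionary):
    """Return the set of transcript words missing from both dictionaries."""
    known = set(dictionary) | set(filler_dictionary)
    oov = set()
    for line in transcript_lines:
        # SphinxTrain transcripts end with an (utterance_id) tag; drop it.
        words = line.split()
        if words and words[-1].startswith("("):
            words = words[:-1]
        oov.update(w for w in words if w not in known)
    return oov
```

If this returns a non-empty set, verify_all.pl would fail Phase 6 and list the out-of-vocabulary words.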
The following steps are all skipped because of settings in sphinx_train.cfg; they would come into play when training other kinds of models.
MODULE: 01 Train LDA transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 02 Train MLLT transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 05 Vector Quantization
Skipped for continuous models
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 11 Force-aligning transcripts
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
Next we reach step 20, where iterative training begins.
Settings in sphinx_train.cfg:
$CFG_MIN_ITERATIONS = 1;  # BW Iterate at least this many times
$CFG_MAX_ITERATIONS = 10; # BW Don't iterate more than this, somethings likely wrong.
That is: iterate at least once and at most 10 times; these limits can be adjusted later. Training normally stops either when the iteration cap is reached or when the change between iterations drops below a convergence threshold (say 0.0001), i.e. the likelihood effectively stops improving. As a rule of thumb, more iterations fit the training data more closely, though past convergence the extra iterations bring little further gain.
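Judging from the log that follows, the printed "Convergence Ratio" is simply the improvement in per-frame likelihood over the previous iteration (e.g. -7.0819 - (-7.3460) = 0.2641). A minimal sketch of the stopping rule, assuming a threshold like SphinxTrain's $CFG_CONVERGENCE_RATIO (0.1 is an assumed default here):

```python
def should_stop(likelihoods, min_iter=1, max_iter=10, threshold=0.1):
    """Stopping-rule sketch for the Baum-Welch loop.

    likelihoods is the list of per-frame log-likelihoods so far; the
    "Convergence Ratio" in the log is the gain over the last iteration.
    """
    n = len(likelihoods)
    if n < max(2, min_iter):
        return False            # need at least two values to compare
    if n >= max_iter:
        return True             # "Maximum desired iterations performed"
    ratio = likelihoods[-1] - likelihoods[-2]
    return ratio < threshold    # "Likelihoods have converged!"
```

With the numbers from the log below, iteration 2 (ratio 0.264) would continue, while the final ratio of 0.069 would terminate training.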
MODULE: 20 Training Context Independent models // train the context-independent models, i.e. single-phone (monophone) models
Phase 1: Cleaning up directories: // set up the working directories
accumulator...logs...qmanager...models...
Phase 2: Flat initialize // flat initialization
mk_mdef_gen: this step seems to go through each phone in the phonelist and check whether it has triphones; so presumably the phonelist could also be written in triphone format — something to try in a later training run.
INFO: cmd_ln.c(691): Parsing command line:
/home/lijieqiong/sphinx/mytrain/data7/bin/mk_mdef_gen \
-phnlstfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.phonelist \
-ocimdef /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.ci.mdef \
-n_state_pm 3
mk_flat defines the mixture weights and transition matrices used during training. transition_matrices is a [65x3x4 array]: 65 phones, hence 65 models, each with 3 emitting states and a 3x4 transition matrix (transitions out of each emitting state, including into the final non-emitting state). mixture_weights is a [195x1x1 array]: 3 emitting states per model, so 65*3 = 195 tied states in total.
data7/bin/mk_flat \
-moddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.ci.mdef \
-topo /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.topology \
-mixwfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/mixture_weights \
-tmatfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/transition_matrices \
-nstream 1 \
-ndensity 1
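The array shapes in the log follow directly from the configuration; a quick sanity check in plain Python (shapes only, no model data):

```python
# Sanity-check the flat-initialization array shapes reported by mk_flat.
n_phones = 65        # entries in test.phonelist
n_emitting = 3       # -n_state_pm 3: emitting states per HMM
n_stream = 1         # -nstream 1
n_density = 1        # -ndensity 1

# Each transition matrix is n_emitting x (n_emitting + 1): transitions out
# of each emitting state, including into the final non-emitting state.
tmat_shape = (n_phones, n_emitting, n_emitting + 1)        # [65x3x4 array]
mixw_shape = (n_phones * n_emitting, n_stream, n_density)  # [195x1x1 array]
```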
init_gau walks the file list given by the training control file and scans each file's features; every frame has 13 cepstral coefficients, and the tool accumulates statistics for a mean estimate, writing the result to bwaccumdir/test_buff_1/gauden_count
bin/init_gau \
-ctlfn /home/lijieqiong/sphinx/mytrain/data7/etc/test_train.fileids \
-part 1 \
-npart 1 \
-cepdir /home/lijieqiong/sphinx/mytrain/data7/feat \
-cepext mfc \
-accumdir /home/lijieqiong/sphinx/mytrain/data7/bwaccumdir/test_buff_1 \
-agc none \
-cmn current \
-varnorm no \
-feat 1s_c_d_dd \
-ceplen 13
norm normalizes the statistics accumulated above into the global mean, written to test.ci_cont_flatinitial/globalmean
bin/norm \
-accumdir /home/lijieqiong/sphinx/mytrain/data7/bwaccumdir/test_buff_1 \
-meanfn /home/lijieqiong/sphinx/mytrain/data7/model_parameters/test.ci_cont_flatinitial/globalmean
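What init_gau and norm compute can be sketched in a few lines — a toy re-implementation, not the actual C code, using the 13-dimensional cepstra implied by -ceplen 13:

```python
def accumulate(feature_files):
    """init_gau-style pass (sketch): sum every frame and count frames.

    feature_files is a list of utterances; each utterance is a list of
    frames, and each frame is a list of 13 cepstral coefficients.
    """
    total = [0.0] * 13
    count = 0
    for frames in feature_files:
        for frame in frames:
            for i, c in enumerate(frame):
                total[i] += c
            count += 1
    return total, count

def global_mean(total, count):
    """norm-style step (sketch): divide the accumulated sums by the frame count."""
    return [t / count for t in total]
```

The two-pass split mirrors the real tools: init_gau writes per-partition accumulators to bwaccumdir, and norm merges them into the single globalmean file.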
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1) // start training with 1 Gaussian per state; progress runs from 0% to 100%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 1 // normalize the accumulated statistics
Current Overall Likelihood Per Frame = -7.34599977581518 // the current overall per-frame log-likelihood
Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1) // second iteration of the 1-Gaussian training
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 14 ERROR messages and 0 WARNING messages. Please check the log file for details. // 14 errors reported
Normalization for iteration: 2
Current Overall Likelihood Per Frame = -7.08191351156474
Convergence Ratio = 0.264086264250444
Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 42 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 3
Current Overall Likelihood Per Frame = -1.99339567581038
Convergence Ratio = 5.08851783575436
Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 70 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 4
Current Overall Likelihood Per Frame = 0.331860199053862
Convergence Ratio = 2.32525587486424
Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 82 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 5
Current Overall Likelihood Per Frame = 1.5098667300257
Convergence Ratio = 1.17800653097184
Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 94 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 6
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 2.41324954333952
Convergence Ratio = 0.903382813313822
Baum welch starting for 1 Gaussian(s), iteration: 7 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 100 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 7
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 3.09044610173611
Convergence Ratio = 0.677196558396586
Baum welch starting for 1 Gaussian(s), iteration: 8 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 110 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 8
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 3.71101464540463
Convergence Ratio = 0.620568543668517
Baum welch starting for 1 Gaussian(s), iteration: 9 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 114 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 9
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 4.15122418167385
Convergence Ratio = 0.440209536269223
Baum welch starting for 1 Gaussian(s), iteration: 10 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
This step had 122 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 10
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check the log file for details.
Maximum desired iterations 10 performed. Terminating CI training // the iteration cap of 10 was reached, so CI training stops
Training completed after 10 iterations
Next, the context-dependent models are trained — that is, the triphone models.
Phase 1: Cleaning up directories: // clear the working directories again
accumulator... logs... qmanager... completed
Phase 2: Initialization
mk_mdef_gen: this initialization step appears to normalize the triphone and word entries found in the training transcript, counting how many distinct triphones and how many distinct words there are.
bin/mk_mdef_gen \
-phnlstfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.phone \
-dictfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.dic \
-fdictfn /home/lijieqiong/sphinx/mytrain/data7/etc/test.filler \
-lsnfn /home/lijieqiong/sphinx/mytrain/data7/etc/test_train.transcription \
-ountiedmdef /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.untied.mdef \
-n_state_pm
INFO: mk_mdef_gen.c(878): 65 n_base, 3316 n_tri // this many base phones and triphones occur in the transcription
init_mixw: this is the copy-CI-to-CD step — it reads the CI mixture weights, means and variances and uses them to seed every context-dependent state (each triphone state starts from its base phone's CI parameters), which is why the model counts listed below change.
INFO: model_def_io.c(588): 65 total models defined (65 base, 0 tri)
INFO: model_def_io.c(589): 260 total states
INFO: model_def_io.c(590): 195 total tied states
As shown above: there are 65 phones, hence 65 models; each model has 4 states (3 emitting states plus one final non-emitting state), so 65*4 = 260 total states, and the 195 tied states are the 65*3 emitting states.
Below, 65 + 3316 = 3381: the monophone models and the triphone models counted together.
INFO: model_def_io.c(588): 3381 total models defined (65 base, 3316 tri)
INFO: model_def_io.c(589): 13524 total states
INFO: model_def_io.c(590): 10143 total tied states
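Both blocks of counts follow from simple arithmetic; a sketch reproducing the numbers printed by model_def_io.c:

```python
def model_counts(n_base, n_tri, n_state_pm=3):
    """Reproduce the counts printed by model_def_io.c (sketch).

    Each HMM has n_state_pm emitting states plus one non-emitting
    final state; before tree-based clustering, every emitting state
    of every model counts as its own tied state.
    """
    total_models = n_base + n_tri
    total_states = total_models * (n_state_pm + 1)
    tied_states = total_models * n_state_pm
    return total_models, total_states, tied_states
```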
Phase 3: Forward-Backward // the iterative forward-backward training begins
Baum welch starting for iteration: 1 (1 of 1)
Normalization for iteration: 1
Current Overall Likelihood Per Frame = 2.99562145055707
Baum welch starting for iteration: 2 (1 of 1)
Normalization for iteration: 2
norm
Current Overall Likelihood Per Frame = 10.2063900810682
Convergence Ratio = 2.40710274963821
Baum welch starting for iteration: 3 (1 of 1)
As above: a BW iteration, then normalization, then the next BW iteration — each pass re-estimates on top of the previous one, and the overall per-frame likelihood increases each time. At this point an error is reported:
utt> 59 60 1608
INFO: cmn.c(175): CMN: 10.64 -0.19 -0.10 0.05 -0.30 -0.07 -0.05 -0.04 -0.08 -0.05 -0.14 -0.13 -0.09
0 288 2
ERROR: "backward.c", line 430: Failed to align audio to transcript: final state of the search is not reached
ERROR: "baum_welch.c", line 331: 60 ignored
// Utterance 60: the audio could not be aligned with its transcript — the final state of the search
// was never reached. This usually means the recording and the text disagree, e.g. the audio contains
// content that the transcript lacks. Listening to the audio against the file, though, no obvious
// mismatch was found; to be investigated later.
After that, the next stage begins: building the decision trees.
MODULE: 40 Build Trees // build the decision trees
Phase 1: Cleaning up old log files...
Phase 2: Make Questions // prepare the question set
make_quests: as the log below shows, roughly 20 questions are built from each of the 3 emitting states.
INFO: main.c(1108): Done building questions using state 0
INFO: main.c(1109): 20 questions from state 0
INFO: main.c(1108): Done building questions using state 1
INFO: main.c(1109): 20 questions from state 1
INFO: main.c(1108): Done building questions using state 2
INFO: main.c(1109): 21 questions from state 2
INFO: main.c(1114): Stored questions in /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions
The generated question set is written into the model_architecture directory, in this format:
QUESTION9 iong k n ui uxn w
Each question is a set of phones: at a tree node it asks whether the triphone's context phone is one of the phones in the set — a yes/no membership test, not a combination.
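Under that reading, evaluating a question is just a set-membership test. A toy sketch (the phone set is copied from the generated file; `answer` is an illustrative helper, not a SphinxTrain function):

```python
# A tree question is a set of phones; at a tree node the question asks
# whether the triphone's context phone belongs to the set (yes/no split).
QUESTION9 = {"iong", "k", "n", "ui", "uxn", "w"}

def answer(question, context_phone):
    """Evaluate one linguistic question against one context phone."""
    return context_phone in question
```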
Phase 3: Tree building // now build the decision trees: one tree per state of each phone
Processing each phone with each state
a 0 // build the tree for state 0 of phone a: iterate over the question set, score each question's split, and divide nodes accordingly; a maximum number of leaves is configured (7 here). In effect, the training occurrences of this phone's state form the sample set, each branch is the split induced by one question, and both unpruned and pruned versions of the trees exist.
completed
a 1
completed
a 2
completed
ai 0
completed
ai 1
completed
ai 2
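The per-node splitting described above can be sketched with a toy 1-D criterion — sum of squared error standing in for the Gaussian log-likelihood gain; real tree building uses the states' sufficient statistics:

```python
def sse(values):
    """Sum of squared error around the mean (stand-in for likelihood loss)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_question(samples, questions):
    """Greedy node split: pick the question with the largest gain.

    samples is a list of (context_phone, feature_value) pairs; each
    question is a set of phones, splitting the samples into a yes
    branch (context phone in the set) and a no branch.
    """
    parent = sse([v for _, v in samples])
    best, best_gain = None, 0.0
    for q in questions:
        yes = [v for p, v in samples if p in q]
        no = [v for p, v in samples if p not in q]
        gain = parent - (sse(yes) + sse(no))
        if gain > best_gain:
            best, best_gain = q, gain
    return best, best_gain
```

A node is split by the winning question, and splitting recurses until the gain is too small or the configured leaf limit is hit.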
Next, the decision trees are pruned.
MODULE: 45 Prune Trees (2015-09-23 15:07)
mk_mdef_gen
Phase 1: Tree Pruning // first, prune the trees
prunetree
bin/prunetree \
-itreedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.unpruned \
-nseno 1000 \
-otreedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.1000 \
-moddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.alltriphones.mdef \
-psetfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions \
-minocc 0 // shrink the nodes of each phone's trees, achieving the pruning effect
Phase 2: State Tying // next, tie the states together
tiestate: using the pruned trees, this step maps each triphone state to a tied-state (senone) id and writes the final tied model definition (test.1000.mdef).
bin/tiestate \
-imoddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.alltriphones.mdef \
-omoddeffn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.1000.mdef \
-treedir /home/lijieqiong/sphinx/mytrain/data7/trees/test.1000 \
-psetfn /home/lijieqiong/sphinx/mytrain/data7/model_architecture/test.tree_questions
Then training of the context-dependent tied-state models begins (the 50.cd_hmm_tied step).
Phase 1: Cleaning up directories:
accumulator... logs... qmanager... completed
Phase 2: Copy CI to CD initialize
Phase 3: Forward-Backward // forward-backward training again: BW iterates several times, then each Gaussian is split into 2, BW iterates again, the Gaussians are split to 4, and so on
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
Normalization for iteration: 1
Current Overall Likelihood Per Frame = 2.99562145055707
... (intermediate iterations elided) ...
Finally, convergence is declared and training completes.
Current Overall Likelihood Per Frame = 17.2036627024291
Convergence ratio = 0.0693319838056503
Likelihoods have converged! Baum Welch training completed!
******************************TRAINING COMPLETE*************************
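The mixture-growing schedule (1 → 2 → 4 Gaussians with BW iterations in between) relies on a split step that can be sketched as follows — a toy version; the perturbation factor 0.2 is an assumption, not taken from the log:

```python
def split_gaussians(means, variances, perturb=0.2):
    """Double the Gaussians of one mixture (sketch of the split step).

    Each density is duplicated and the two copies are moved apart by
    +/- perturb * stddev, so later BW iterations can pull them apart.
    means and variances are lists of per-density coefficient lists.
    """
    new_means, new_vars = [], []
    for mean, var in zip(means, variances):
        offset = [perturb * (v ** 0.5) for v in var]
        new_means.append([m - o for m, o in zip(mean, offset)])
        new_means.append([m + o for m, o in zip(mean, offset)])
        new_vars.extend([list(var), list(var)])
    return new_means, new_vars
```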
With that, the model is fully trained. The stages below are skipped under the initial configuration.
MODULE: 60 Lattice Generation (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 61 Lattice Pruning (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 62 Lattice Format Conversion (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 65 MMIE Training (2015-09-23 15:10)
Skipped: $ST::CFG_MMIE set to 'no' in sphinx_train.cfg
MODULE: 90 deleted interpolation (2015-09-23 15:10)
Skipped for continuous models