Processing the corpus
Extract src, mt and time from the raw corpus and write them to three separate files.
Sentences in src and mt need to have their annotation marks removed.
The time values need to be normalized. Normalization method: divide each time value by the maximum time value (normalizing with softmax makes the values too small).
The resulting src and mt files are still in their original order, so the corpus next needs to be shuffled.
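A minimal Python sketch of the normalization and shuffling steps (the file names corpus.src, corpus.mt, corpus.time and shuffled.* are assumptions for illustration, not fixed by the pipeline):

import random

# read the three extracted files (assumed names)
src = open("corpus.src", encoding="utf-8").read().splitlines()
mt = open("corpus.mt", encoding="utf-8").read().splitlines()
time = [float(x) for x in open("corpus.time", encoding="utf-8").read().splitlines()]

# normalize: divide each time value by the maximum time value
max_t = max(time)
time = [t / max_t for t in time]

# shuffle src, mt and time with the same permutation
triples = list(zip(src, mt, time))
random.seed(1)
random.shuffle(triples)

with open("shuffled.src", "w", encoding="utf-8") as fs, \
     open("shuffled.mt", "w", encoding="utf-8") as fm, \
     open("shuffled.time", "w", encoding="utf-8") as ft:
    for s, m, t in triples:
        fs.write(s + "\n")
        fm.write(m + "\n")
        ft.write("%.6f\n" % t)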
Finally, the files required by the model are:
Training set: train.src, train.mt, train.mt.hter (the time values must be normalized!)
Test set: test.src, test.mt, test.mt.hter (must be normalized!)
Validation set: dev.src, dev.mt, dev.mt.hter (dev can be the same as test)
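For reference, a sketch of cutting the shuffled corpus into these file sets (the test-set size and the shuffled.* input names are assumptions; the normalized time files are written with the .mt.hter suffix, for the reason explained in the qe_train.sh section below):

def write(lines, path):
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)

src = open("shuffled.src", encoding="utf-8").read().splitlines()
mt = open("shuffled.mt", encoding="utf-8").read().splitlines()
hter = open("shuffled.time", encoding="utf-8").read().splitlines()

n_test = 1000  # assumed test-set size
cut = len(src) - n_test
parts = {"train": slice(0, cut), "test": slice(cut, None), "dev": slice(cut, None)}  # dev = test

for prefix, part in parts.items():
    write(src[part], prefix + ".src")
    write(mt[part], prefix + ".mt")
    write(hter[part], prefix + ".mt.hter")  # normalized time values, stored under the .mt.hter suffix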
Modify the script parameters
qe_train.sh:
GPU device setting:
export CUDA_VISIBLE_DEVICES=3
Data path: this path contains the training set, test set and validation set
datadir=./data/qe/error-corpus
Vocabulary path: this path does not need to be changed. The QE model here still uses the previous, normal QE vocabulary; only the training corpus and the gold labels change. The corpus is replaced by the corpus with error annotations, and the original HTER values are replaced by the normalized time values, which serve as the gold labels to be predicted.
vocabdir=./data/vocab/ccmt2021/55
Model path: the directory where the model trained on the new corpus is finally saved
modeldir=./saved_error_score_model
This block passes the parameters to qe_model.py. Make sure the names of the train, test and dev files are correct. --lab=mt.hter on the fifth line is the suffix of the HTER-value file from the original experiment; changing it to another suffix causes an error, so the suffix mt.hter is kept here, and the normalized time files are therefore given the suffix mt.hter.
--num_train_steps=40000 is the number of training steps.
python qe_model.py \
  --src=src \
  --mt=mt \
  --fea=sent.tfrecord \
  --lab=mt.hter \
  --train_prefix=${datadir}/train \
  --dev_prefix=${datadir}/dev \
  --test_prefix=${datadir}/dev \
  --vocab_prefix=${vocabdir}/vocab.low \
  --max_vocab_size=120000 \
  --out_dir=${modeldir} \
  --optimizer=lazyadam \
  --warmup_steps=8000 \
  --learning_rate=2.0 \
  --num_train_steps=40000 \
  --steps_per_stats=100 \
  --steps_per_external_eval=1000 \
  --rnn_units=128 \
  --rnn_layers=1 \
  --embedding_size=512 \
  --num_units=512 \
  --num_layers=4 \
  --ffn_inner_dim=512 \
  --qe_batch_size=64 \
  --infer_batch_size=64 \
  --metrics=pearson \
  --use_hf=False \
  --num_buckets=4 \
  --dim_hf=17 \
  --train_level=sent \
  --avg_ckpts=True \
  --fixed_exp=True \
  --label_smoothing=0.1 \
  --exp_model_dir=${exp_modeldir}
Start training
Then you can start training:
bash qe_train.sh
Training takes six hours, and the trained model is saved in saved_error_score_model.
Perform inference
After training, run inference on the test set:
bash qe_infer.sh
The generated prediction file is saved in the infer-error folder.
export CUDA_VISIBLE_DEVICES=0
metrics=pearson
datadir=./data/qe/error-corpus
modeldir=./saved_error_score_model
inferdir=${datadir}/infer-error
mkdir -p ${inferdir}
data=test
python qe_model.py \
  --out_dir=${modeldir} \
  --ckpt=${modeldir}/avg_best_${metrics} \
  --inference_src_file=${datadir}/${data}.src \
  --inference_mt_file=${datadir}/${data}.mt \
  --inference_fea_file=${datadir}/${data}.sent.tfrecord \
  --inference_output_file=${inferdir}/${data}.infer2
Calculate Pearson coefficient
Compute the Pearson coefficient between the generated test.infer2 predictions and test.mt.hter.
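A minimal sketch of this computation, assuming the prediction file contains one score per line and using the paths from the scripts above:

from scipy.stats import pearsonr

# predictions written by qe_infer.sh and the normalized-time gold labels
pred = [float(x) for x in open("data/qe/error-corpus/infer-error/test.infer2", encoding="utf-8")]
gold = [float(x) for x in open("data/qe/error-corpus/test.mt.hter", encoding="utf-8")]

r, p = pearsonr(pred, gold)
print("Pearson r = %.4f (p = %.3g)" % (r, p))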