Hands-on with Baidu PaddlePaddle, part 2: Chinese text classification

Posted by Jtech inc. on Sun, 20 Oct 2019 19:59:01 +0200

1 Related links

ERNIE code: https://github.com/PaddlePaddle/ERNIE/tree/develop/ERNIE

2 Usage

2.1 Steps

  • Download data:

    Download the model (including its configuration file and vocabulary) and the task data.
  • Decompress the model and the task data, then start training by running the ChnSentiCorp script with bash (script/run_ChnSentiCorp.sh in the ERNIE repository). The modified script is attached below:
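# Exit on errors and on unset variables, and echo each command as it runs
set -eux

# PaddlePaddle flag: synchronize NCCL allreduce calls
export FLAGS_sync_nccl_allreduce=1
# Train on GPU 0 only
export CUDA_VISIBLE_DEVICES=0
# Placeholders: point these at the decompressed task data and model directories
export TASK_DATA_PATH=/path/to/task_data
export MODEL_PATH=/path/to/ERNIE_STABLE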

python -u run_classifier.py \
                   --use_cuda true \
                   --verbose true \
                   --do_train true \
                   --do_val true \
                   --do_test true \
                   --batch_size 24 \
                   --init_pretraining_params ${MODEL_PATH}/params \
                   --train_set ${TASK_DATA_PATH}/chnsenticorp/train.tsv \
                   --dev_set ${TASK_DATA_PATH}/chnsenticorp/dev.tsv \
                   --test_set ${TASK_DATA_PATH}/chnsenticorp/test.tsv \
                   --vocab_path config/vocab.txt \
                   --checkpoints ./checkpoints \
                   --save_steps 1000 \
                   --weight_decay 0.01 \
                   --warmup_proportion 0.0 \
                   --validation_steps 100 \
                   --epoch 10 \
                   --max_seq_len 256 \
                   --ernie_config_path config/ernie_config.json \
                   --learning_rate 5e-5 \
                   --skip_steps 10 \
                   --num_iteration_per_drop_scope 1 \
                   --num_labels 2 \
                   --random_seed 1

  • Code interpretation
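
One way to read the flags in the script above is to work out the training schedule they imply. The sketch below is only an illustration: it assumes the total number of steps is derived as epoch * num_train_examples // batch_size // dev_count, with the warmup steps as a fraction of that total. The function name training_schedule and the training-set size of 9600 are hypothetical and do not come from the script, so substitute the real row count of your train.tsv.

```python
def training_schedule(num_train_examples, epoch=10, batch_size=24, dev_count=1,
                      warmup_proportion=0.0, save_steps=1000, validation_steps=100):
    """Estimate the step counts implied by the command-line flags.

    The formula is an assumption about how the total is derived, not a
    guarantee about what run_classifier.py does internally.
    """
    # Total optimizer steps across all epochs, split over the visible devices.
    max_train_steps = epoch * num_train_examples // batch_size // dev_count
    # --warmup_proportion 0.0 means the learning rate is not warmed up at all.
    warmup_steps = int(max_train_steps * warmup_proportion)
    return {
        "max_train_steps": max_train_steps,
        "warmup_steps": warmup_steps,
        # How many times the --save_steps interval is reached during training.
        "periodic_checkpoints": max_train_steps // save_steps,
        # How many times the --validation_steps interval is reached.
        "validation_runs": max_train_steps // validation_steps,
    }


if __name__ == "__main__":
    # 9600 is a hypothetical training-set size used only for illustration.
    plan = training_schedule(num_train_examples=9600)
    for name, value in plan.items():
        print(f"{name}: {value}")
```

Under these assumptions, lowering --batch_size or raising --epoch increases the number of steps, and --save_steps and --validation_steps control how often checkpoints are written and the dev and test sets are evaluated.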

2.2 Results


For simple Chinese text classification, the results are very good.

3 Summary

  • Baidu has built a solid basic framework, and the overall experience of using it is very good. In short, once you are familiar with its API, you can make full use of these base models on Chinese datasets.
