Reinforcement Training

This page describes how to use Reinforcement Training. The approach follows the paper "Self-critical Sequence Training for Image Captioning" (https://arxiv.org/pdf/1612.00563.pdf): CIDEr is used as the reward, and the model is trained with policy gradient.
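
As a rough illustration of the self-critical idea from the paper, the sketch below computes a policy-gradient loss in which the reward of the greedily decoded caption serves as the baseline for the sampled caption. It is a minimal pure-Python sketch, not the code in the rl_training branch: the function name `scst_loss` and its argument layout are assumptions made for illustration, and the rewards are taken to be precomputed CIDEr scores.

```python
def scst_loss(sample_logprobs, sample_rewards, greedy_rewards):
    """Self-critical policy-gradient loss, averaged over a batch.

    sample_logprobs: list of lists; token log-probabilities of each
        sampled caption under the current model (hypothetical layout).
    sample_rewards: reward (e.g. CIDEr) of each sampled caption.
    greedy_rewards: reward of the greedy caption for the same image,
        used as the baseline.
    """
    total = 0.0
    for logprobs, r_sample, r_greedy in zip(
        sample_logprobs, sample_rewards, greedy_rewards
    ):
        # Positive advantage: the sample beat greedy decoding,
        # so its log-probability should be pushed up.
        advantage = r_sample - r_greedy
        total += -advantage * sum(logprobs)
    return total / len(sample_logprobs)


# One image: the sampled caption scores 1.0, the greedy caption 0.4.
loss = scst_loss([[-0.5, -0.5]], [1.0], [0.4])
```

In a real training graph the advantage would be treated as a constant (no gradient flows through the rewards), so minimizing this loss raises the probability of samples that outperform the greedy caption and lowers it for those that do worse.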

The code currently lives in the rl_training branch.

Data preparation

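CIDEr is computed with the in-graph implementation written by 鹤达; the code is in im2txt/tf_cider.py. The computation needs the document frequency of each n-gram, which must be computed in advance by running scripts/build_document_frequency.sh.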
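
For readers unfamiliar with the term, the sketch below shows what an n-gram document frequency table looks like in the usual CIDEr formulation: for each n-gram, the number of images whose reference captions contain it. This is only an illustration under assumptions; it does not reproduce scripts/build_document_frequency.sh, and the helper names `ngrams` and `document_frequency`, the whitespace tokenization, and the n-gram range of 1 to 4 are all hypothetical choices.

```python
from collections import Counter


def ngrams(tokens, max_n=4):
    """Yield every n-gram of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def document_frequency(reference_sets, max_n=4):
    """Count, per n-gram, how many images have it in their references.

    reference_sets: one list of reference captions per image; all
        captions of one image together form a single "document".
    """
    df = Counter()
    for captions in reference_sets:
        seen = set()
        for caption in captions:
            # Whitespace tokenization is an assumption of this sketch.
            seen.update(ngrams(caption.split(), max_n))
        # Each n-gram counts at most once per image.
        df.update(seen)
    return df


refs = [["a cat sits", "a cat"], ["a dog"]]
df = document_frequency(refs)
```

Here `df[("a",)]` is 2 because both images mention "a", while `df[("a", "cat")]` is 1 even though the bigram appears in two captions, since both captions belong to the same image.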
