The first real transcription downloads the selected Whisper model. For long Chinese videos, start with medium; switch to large-v3 if you want better accuracy and have enough disk space, RAM, and GPU resources. The GUI runs ...
sdim ref_filepath, 3200, MAX_UPFILE_AMOUNT // full path of the reference image
sdim ref_filename, 320, MAX_UPFILE_AMOUNT // file name only of the reference image
dim ref_chk, MAX_UPFILE_AMOUNT
dim ref_chk_id, MAX_UPFILE_AMOUNT
dim ref_mesbox_id, ...