The trick will be to keep the comments in sync with the images.
If he sets the images to change every 4 seconds, he will need to record his voice and simulate the slideshow (or use the preview option) to see which image is currently showing.
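If it helps, here is a rough Python sketch for working out which image is on screen at a given point in the recording. It assumes every image is shown for exactly 4 seconds with no transition time; real slideshow tools may add fades, so treat it as an approximation. The function names are made up for illustration.

```python
SLIDE_SECONDS = 4.0  # assumed fixed display time per image, no transitions


def slide_at(t_seconds: float, slide_seconds: float = SLIDE_SECONDS) -> int:
    """Return the 1-based number of the image on screen at t_seconds."""
    return int(t_seconds // slide_seconds) + 1


def slide_window(n: int, slide_seconds: float = SLIDE_SECONDS) -> tuple[float, float]:
    """Return the (start, end) time in seconds during which image n is shown."""
    start = (n - 1) * slide_seconds
    return start, start + slide_seconds


print(slide_at(0.0))    # 1
print(slide_at(9.5))    # 3
print(slide_window(5))  # (16.0, 20.0)
```

So a comment about image 5 would need to fit between 16 and 20 seconds into the narration.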
When you encode the whole thing, the time it takes to produce the DVD output depends on the CPU speed and the amount of RAM in the PC, so it won't run in real time and you can't narrate along with it.
That's why the comments need to be pre-recorded, as JJ suggested.
My 2 cents...