OK, I still have to run more tests, but I may have found something.
Is there any chance that the batch offset is being miscalculated? After adding some print_r calls in strategic places, it looks like the IDs returned by the content subquery go something like
1000, 1001, 1002 ... 1099, 1100, and then there is suddenly a gap where the IDs start again at 1201 or so.
Could it be that, since each real content row actually corresponds to two or more index entries (because of the multiple languages), the offset is advanced by the number of index entries instead of the number of content rows?
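To illustrate what I suspect, here is a minimal Python sketch. All names are hypothetical and this is not the actual indexer code: it pages through content IDs in batches and, in the "buggy" variant, advances the offset by the number of index entries written (one per language) instead of by the number of content rows fetched.

```python
def fetch_batch(content_ids, offset, batch_size):
    # Hypothetical stand-in for the content subquery (LIMIT offset, batch_size).
    return content_ids[offset:offset + batch_size]


def run_batches(content_ids, batch_size, languages, buggy):
    """Return the content IDs that actually get processed, batch by batch."""
    processed = []
    offset = 0
    while offset < len(content_ids):
        batch = fetch_batch(content_ids, offset, batch_size)
        processed.extend(batch)
        # Each content row produces one index entry per language.
        index_entries = len(batch) * languages
        # Suspected bug: the offset moves by index entries, not content rows.
        offset += index_entries if buggy else len(batch)
    return processed
```

With 500 content rows (IDs 1000 to 1499), 2 languages and a batch size of 100, the buggy variant processes 1000 to 1099, then jumps to 1200 to 1299, skipping 1100 to 1199, which is the same kind of gap I'm seeing. With a batch size of 1000, everything fits in a single batch and nothing is skipped, even with the bug.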
Maybe this explains the 99%. Could it be that you don't have this problem because your batch size is larger than your total number of articles, so everything is processed in a single batch? Hmm, I'll do some more tests; let me know what you think.