From testing it a bit, I believe the issue is related to how Google's bots read the data and the images. To make sure the images are read and associated with the relevant text as well as possible, I'd suggest adding an info bar and choosing how the text is displayed in the gallery.
As mentioned, I think it may be related to using the expand mode to show the information. Most lightboxes can't be opened by crawlers, so any text that only appears inside the lightbox may never be seen. That's why I suggested showing the text within the form instead.