Rich Content-to-Dataset Problem

And another question: How is the dataset filtered?