Optimizing page performance when using datasets

For page performance optimization:
With complex pages, a number of data collections may be involved, and so a number of datasets might be added to the page. With a dynamic page, the other datasets may be linked to the dynamic dataset (and to each other), and then page elements are linked to fields in the various datasets.

In some cases, instead of adding a dataset for another collection and linking it to the dynamic dataset, the additional collection can be made available to the page through a reference field in the dynamic dataset's collection (and similarly for other related collections).

My question concerns page performance/responsiveness. Is a performance advantage (or disadvantage) obtained by including collections as reference fields in a collection, instead of adding them to a page as separate datasets which are then linked in the dataset configurations?

There seem to be at least two dimensions to this performance question:

  1. Effect on performance of a specific page when using fewer datasets by including reference fields in the primary collection(s). [This is my question above.]
  2. Effect on raw performance in querying a collection with reference fields (that are unrelated to that query) compared to querying the same collection without the reference fields. This dimension may constrain (or encourage) a practice of adding reference fields to a collection for a special purpose, where most uses of the collection may not involve those reference fields.

If there is only a slight performance benefit in using reference fields in a dataset collection instead of multiple datasets, my relatively uninformed inclination is to avoid adding reference fields for limited-use special purposes and to use multiple datasets instead. But with this topic submission, I’m hoping to become more informed about what is actually the best practice.

Thanks!

I don’t think you’re going to get any solid numbers, and there are probably confounding factors that make this hard to test as an end user, but:

  1. If you’re using the data in what would be a referenced collection anyway, then for the sake of maintainability and being able to reason about what your code does, I’d use reference fields.
  2. Reference fields are just IDs pointing at rows in another table. If the referenced collection isn’t actually being queried, then the cost of it being present is minimal (i.e. just the cost of these IDs being returned). If the referenced collection is being queried (i.e. its data is being used), then the cost is another query. Reference fields are less about performance and more about representing a many-to-many relationship between collections.
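
To make point 2 concrete, here is a rough sketch of that cost model in TypeScript. It is a toy in-memory model, not the actual data API: the `Collection` class, its `queryCount` counter, and the `Book`/`Author` shapes are all hypothetical names made up for illustration. It only shows that an unresolved reference field costs an extra ID per item, while resolving it costs one more query against the referenced collection.

```typescript
// Hypothetical in-memory model (an assumption for illustration, not a real platform API).
interface Author { _id: string; name: string; }
// The `author` reference field stores only the ID of an item in the authors collection.
interface Book { _id: string; title: string; author: string; }

class Collection<T extends { _id: string }> {
  queryCount = 0; // how many queries have hit this collection
  private items: T[];
  constructor(items: T[]) { this.items = items; }

  findAll(): T[] {
    this.queryCount++;
    return this.items.slice();
  }

  findByIds(ids: string[]): T[] {
    this.queryCount++;
    return this.items.filter(item => ids.indexOf(item._id) !== -1);
  }
}

const authors = new Collection<Author>([
  { _id: "a1", name: "Ada" },
  { _id: "a2", name: "Grace" },
]);
const books = new Collection<Book>([
  { _id: "b1", title: "First", author: "a1" },
  { _id: "b2", title: "Second", author: "a1" },
  { _id: "b3", title: "Third", author: "a2" },
]);

// Case 1: the reference field is present but unused.
// Only the books collection is queried; each item just carries an extra ID string.
const plainBooks = books.findAll();
const authorQueriesWhenUnused = authors.queryCount;

// Case 2: the referenced data is actually needed.
// Collect the distinct IDs, then run one additional query against authors.
const authorIds = plainBooks
  .map(b => b.author)
  .filter((id, i, all) => all.indexOf(id) === i);
const authorsById: Record<string, Author | undefined> = {};
authors.findByIds(authorIds).forEach(a => { authorsById[a._id] = a; });

const resolved = plainBooks.map(b => ({
  _id: b._id,
  title: b.title,
  author: authorsById[b.author], // full referenced item instead of the bare ID
}));
const authorQueriesWhenResolved = authors.queryCount;
const authorNames = resolved.map(r => (r.author ? r.author.name : ""));
```

In this toy model the authors collection is not touched at all in the first case and is queried once in the second, which is the same extra query a separate linked dataset would have to make anyway. Real numbers will depend on the platform, caching, and collection size.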

I’d agree with this. Sometimes having reference fields is beneficial, other times it’s unnecessary because there’s not truly a many-to-many relationship between the collections.