Hi,
First and foremost, I really appreciate this question and the amount of SEO detail it contains. (As an SEO geek, it’s everything I could want!)
I’d like to address your concerns item by item, beginning with the code-to-text ratio you mentioned.
Interestingly enough, Google’s John Mueller has addressed this in the past. Per Google, code-to-text ratios are not used in any way.
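In case it helps to see what the metric actually measures, here is a rough Python sketch. There is no single agreed definition and SEO tools compute it differently, so treat this as one assumed version: visible text characters divided by total HTML characters. The function name `text_to_html_ratio` and the helper class are made up for illustration and do not come from any tool.

```python
from html.parser import HTMLParser


class _VisibleTextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_to_html_ratio(html: str) -> float:
    """Assumed definition: visible text length / total HTML length."""
    if not html:
        return 0.0
    collector = _VisibleTextCollector()
    collector.feed(html)
    collector.close()
    # Collapse runs of whitespace so indentation does not count as text.
    text = " ".join(" ".join(collector.chunks).split())
    return len(text) / len(html)


if __name__ == "__main__":
    page = "<html><body><p>Hello world</p><script>var x = 1;</script></body></html>"
    print(f"{text_to_html_ratio(page):.2%}")
```

A number like this only describes how much markup surrounds the text. It says nothing about whether the text is any good, which fits with Google not using it.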
That said, your point about the ratio and spam content is very interesting. Google has advanced tremendously when it comes to spam detection and takes a fairly holistic approach to identifying spam. In fact, Google has gotten so good at spam detection that it knows which sites contain low-quality links and automatically ignores any such links that point to your site.
More to the point, the biggest factor in determining whether content is low-quality is the content itself, as Google has consistently repeated. Through natural language processing and other machine learning techniques, Google has become quite adept at understanding content and its quality.
Just to illustrate this, Google has said that it profiles content by comparing one piece to another. In other words, via its machine learning capabilities, Google is able to understand how content around a given topic should ‘look and feel’.