Covid policy often relies on the worst science: cherry-picked observational data
Before Covid, many people knew that correlation is not causation. Unfortunately, doing RCTs (Randomized Controlled Trials) to assess causality is expensive in both time and money, and sometimes all we have is observational data, which can usually only measure correlation. In such cases, it is very important to look at the totality of the data. By selectively analyzing only subsets of the data, you can prove almost anything you want.
Already, a lot of damage has been done in nutrition (my topic of deep interest before Covid) due to “top” scientists publishing data selectively. The shaky foundations of modern nutrition were built on an observational study that cherry-picked only 7 countries that fit the hypothesis of the “scientists” at the time:
One, the Seven Countries Study, originally included many more nations. But in only seven did populations consuming lots of saturated fats have high levels of heart disease, prompting recent accusations of cherry-picking data.
That 1970 study was hugely influential, however, leading to congressional hearings and guidelines advising against eating saturated fat and arguing for the benefits of polyunsaturated fats.
During Covid, the problem seems to have become much, much worse. There is a lot of seasonal or otherwise unexplained variation in the incidence patterns of Covid peaks and valleys, and these patterns are not fully synchronized across the world. So, for every hypothesis (even opposite ones), you can come up with a pair of cities/countries/states that shows correlational evidence for it. This can be somewhat mitigated by looking at the totality of the data, or at least at large datasets NOT selected in odd ways.
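To make the cherry-picking point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the regions are simulated, the "policy" is assigned by coin flip, and the case rate is pure noise that does not depend on the policy at all. It is a toy model, not real Covid data. Even so, one hand-picked pair of regions "shows" the policy works and another pair "shows" it backfires, while the comparison over all regions shows roughly no difference.

```python
import random

def simulate_regions(n_regions=2000, seed=0):
    """Hypothetical regions: the policy has NO effect on the case rate by construction."""
    rng = random.Random(seed)
    regions = []
    for _ in range(n_regions):
        has_policy = rng.random() < 0.5       # coin-flip policy assignment
        case_rate = rng.gauss(100.0, 30.0)    # noise only, independent of the policy
        regions.append((has_policy, case_rate))
    return regions

def mean(values):
    return sum(values) / len(values)

regions = simulate_regions()
with_policy = [rate for has_policy, rate in regions if has_policy]
without_policy = [rate for has_policy, rate in regions if not has_policy]

# Totality of the data: the gap between the two groups is small relative to the noise.
overall_gap = mean(without_policy) - mean(with_policy)

# Cherry-picked pairs: (policy region rate, no-policy region rate).
pair_for = (min(with_policy), max(without_policy))      # "the policy works"
pair_against = (max(with_policy), min(without_policy))  # "the policy backfires"

print(f"overall gap (no-policy minus policy): {overall_gap:.1f}")
print(f"pair suggesting the policy works:     {pair_for[0]:.1f} vs {pair_for[1]:.1f}")
print(f"pair suggesting the policy backfires: {pair_against[0]:.1f} vs {pair_against[1]:.1f}")
```

The same logic applies to picking a time window instead of a pair of places: with enough noisy series to choose from, some window will always fit the story you want to tell.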
However, instead of looking at the totality of the data, the mainstream experts have, right from the beginning, shown a great tendency to selectively present the subset of the data in which their recommendations correlate with better outcomes. For example, the CDC often comes up with observational evidence for its recommendations. What is often odd about its papers is that they look at a very small subset of the data the CDC has: not just spatially but also temporally. What’s more problematic is that the selection of the time period appears to deliberately exclude the parts that don’t justify its recommendations. A shameful example is its mask study comparing Covid rates between mask-mandate and non-mask-mandate counties in Kansas. First, why only Kansas? Why not do the analysis for the whole US? More worrisome is the fact that the analysis period suspiciously ends at the very week after which the case rates were much higher in the mask-mandate counties:


Recently, the CDC did the same to justify its nonsensical and much more consequential stance that the Covid-recovered should get the Covid vaccines. For some context: in March 2021, experts were claiming that vaccine immunity would outlast natural infection immunity, but the opposite seems to be happening. Covid has been around much longer than the Covid vaccines, yet multiple observational studies all over the world found natural immunity to be at least as good as vaccine-derived immunity. For the variants, natural immunity appears to be much better. Although getting the vaccine may be safer for many people than getting Covid, it is unscientific to treat the Covid-recovered as inferior to the vaccinated.
The CDC ignored all that and went to major news outlets with a strange observational study from Kentucky showing that among the Covid-recovered, the rate of reinfection (measured by PCR/antigen tests) was 2-3 times lower among those who got the vaccine, claiming that this study settles the debate. As noted by many commenters on the MedPage article about the study, the CDC paper is missing too many details, e.g. how many of the reinfections were severe or even had any symptoms? This is especially important because many places require only the unvaccinated (including the Covid-recovered) to get tested for Covid even if they have no symptoms. So, we are relatively oversampling the unvaccinated for Covid infections. Also, in May 2021, the CDC decided to stop tracking mild infections among the vaccinated, BUT NOT AMONG THE UNVACCINATED. Guess when this Kentucky CDC study started: May 2021!!
Also, why only Kentucky? Why not release the data for the whole US, and also release the data about how many of the reinfections were severe? Why not release the data for the period before May 2021 for all of the US, or at least for Kentucky?
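The oversampling concern can be shown with simple arithmetic. The sketch below uses made-up numbers that are NOT taken from the Kentucky study: it assumes both groups have exactly the same true reinfection rate, that symptomatic cases always get tested, and that asymptomatic cases are caught far more often in the group subject to routine testing. The function name and all parameters are hypothetical.

```python
def detected_rate(true_infection_rate, symptomatic_fraction, asymptomatic_test_prob):
    """Share of a group with a DETECTED infection.

    Assumptions (hypothetical): every symptomatic case is tested and detected;
    an asymptomatic case is detected only with probability asymptomatic_test_prob.
    """
    symptomatic = true_infection_rate * symptomatic_fraction
    asymptomatic = true_infection_rate * (1 - symptomatic_fraction)
    return symptomatic + asymptomatic * asymptomatic_test_prob

# Identical true reinfection rate in both groups (illustrative value only).
true_rate = 0.02
symptomatic_fraction = 0.5

# Illustrative testing intensities: routine testing is required only of the unvaccinated.
vaccinated_detected = detected_rate(true_rate, symptomatic_fraction, 0.1)
unvaccinated_detected = detected_rate(true_rate, symptomatic_fraction, 0.9)

apparent_ratio = unvaccinated_detected / vaccinated_detected
print(f"apparent reinfection ratio (unvaccinated / vaccinated): {apparent_ratio:.2f}")
```

With these particular made-up inputs, the unvaccinated group appears to have roughly 1.7 times the reinfection rate even though the true rates are identical. The real size of any such bias depends on the actual testing rates and symptom fractions, which is exactly the detail the paper would need to report.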
None of the mainstream media outlets (e.g. nytimes) bothered to question the CDC; they just parroted its talking points. Only Fox News dared to question it:

Fortunately, there is an accidental RCT about whether Covid vaccines reduce symptomatic infections in the already recovered, and it did not find any significant difference.


Note that the number of reinfections in this trial was small even though the trial was huge (n>40K), so it is possible that a trial with many more Covid-recovered participants would find a statistically significant benefit. But until that trial is done, given the non-trivial risks of the vaccine, there is no scientific basis to recommend that the Covid-recovered get vaccinated.
What can we do about it
People, especially the educated elites, need to stop parroting the CDC and start understanding and demanding the evidence and full data behind the recommendations of experts. Data used to justify public policy, especially data used for mandates/discrimination, must be collected rigorously and made publicly accessible to anybody, not just left to the CDC to selectively release only the pieces that justify its narratives.
It would be ideal if the data were released in a verifiable way without sacrificing the privacy of citizens: given how much the CDC misleads without technically lying, it is not implausible in my opinion that it may sometimes fudge the counts to cover its ass. Fortunately, many computer scientists have looked at an isomorphic problem in the realm of computerized verifiable voting: using sophisticated cryptography to make sure that the results of an electronic election are verifiable by the citizens without revealing who each citizen voted for. Here is an example technique that I don’t fully understand yet: Project Civitas