Monday, November 26, 2012

11 . 29 . 12 | Continued Push Into Healthcare

In 2012, healthcare cases numbered roughly a dozen, covering:
  • Reducing hospitalizations
  • Reducing re-admits
  • Reducing exacerbations for disabled populations
  • Increasing engagement in CM/DM and in wellness programs 
  • Improving treatments
  • Streamlining utilization management
  • Predictive modeling for CM/DM selection
In all these studies, cluster randomization (e.g. by nurse, not by patient), a device Nobi pioneered, made the findings easy, fast and pure. The device has remained controversial (without reason), but as more researchers adopt it, it is inching toward mainstream acceptance.

In most cases, widespread acceptance has required backing out precursors (such as HCC score) analytically, to show that the findings do not weaken (and in fact strengthen). This is no surprise: the theory has been around since the 1920s but remains notoriously hard to grasp. These more practical exercises, which show users the result in their own language, have been well received.

Monday, November 19, 2012

11 . 19 . 12 | The Control Book

A veteran CEO suggested to us a while ago an ingenious management tool for clients whose efforts expand to several statistical designs. It is a seemingly simple report, with one page per project showing the improvement in the main measurement, annotated with the actions taken.

In assembling the first edition for monthly update, the organization reaches agreement on the operational definition of the measurement. Discussions after that agreement are more focused. The Control Book allows executive teams to manage the improvement effort in a short meeting every month. Financial information is also footnoted so that the overall ROI is clear at a glance. The idea is simple. Executing it is deceptively difficult but consumes little time. The tool also makes implementation, which can otherwise be elusive, straightforward.

The reasons this simple tool can be difficult to introduce differ by organization and are best discovered by each client. Once it is in place, with a monthly forum for managing it, the tool is remarkably powerful and much liked. It is harder to accomplish than the statistical designs, mainly because sustained implementation is the hardest part of improvement.

The Control Book is the most effective way to manage implementation, short and long term.

Wednesday, August 1, 2012

08 . 01 . 12 | Nobi Presents At ISRN Event

Nobi Speaks at the 2012 Summer Institutes on Evidence-Based Quality Improvement

Following the Network News article (3/20/12), Nobi was invited to speak at the 2012 Summer Institutes on Evidence-Based Quality Improvement Seminar, held July 17th to 21st in San Antonio, TX.

Tracing the origins of the scientific method through the 1923 discovery of statistical design to pioneering work in its application to CM/DM, Nobi’s Kieron Dey provided case studies and fragments to illustrate a new but long-proven method for practical innovation in healthcare generally.

Aspects of management, science, statistics and economics were summarized, together with what will bring about mainstream adoption, based on an understanding of what has prevented it to date.

Insight into the organizational dynamics and human nature that are an essential aspect of the scientific method resonated with the 160 clinicians, nurses, physicians, administrators and graduate students in attendance. These elements revealed how implementation (the hardest part) has been made straightforward in delivering and sustaining the results predicted by studies. As central to a rapid cycle time for innovation, the talk emphasized the ability of orthogonal design to evaluate 20+ changes to treatments or clinical models simultaneously without increasing sample size over RCT norms. The upshot of thereby evaluating over a million potential treatment variants was explained (20 on/off changes already give 2^20 = 1,048,576 combinations).

Advanced mathematical constructs employed in orthogonal design were simplified. For example, the meaning of orthogonality itself (the device that allows cause and effect to be established for 20+ interventions at once) was demonstrated visually with a newspaper. The surprising fact that the false-alarm rate decreases when testing 20+ interventions was explained and proven with actual data.
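
As one way to picture orthogonality in code rather than with a newspaper, here is a minimal sketch. It assumes a 32-run two-level design built with the Sylvester (Hadamard) construction; the designs actually used in the talk are not specified here, and the function names are hypothetical. Each column is one intervention (+1 = on, -1 = off) and each row is one test unit.

```python
def sylvester_hadamard(order):
    """Return an order x order matrix of +1/-1 with mutually orthogonal columns.

    Assumption: order is a power of 2 (Sylvester construction).
    """
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    h = [[1]]
    while len(h) < order:
        # Double the size: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def design_columns(runs):
    """Drop the all-+1 first column; the remaining columns are the intervention settings."""
    h = sylvester_hadamard(runs)
    return [[h[r][c] for r in range(runs)] for c in range(1, runs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


columns = design_columns(32)  # room for up to 31 interventions in 32 runs

# Balanced: every intervention is "on" in exactly half of the runs.
balanced = all(sum(col) == 0 for col in columns)

# Orthogonal: no pair of intervention columns is correlated, so each effect
# can be estimated separately from the same 32 test units.
orthogonal = all(
    dot(columns[i], columns[j]) == 0
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
)

# Number of on/off combinations of 20 interventions.
variants = 2 ** 20

print(len(columns), balanced, orthogonal, variants)
```

Because every column is balanced against every other, each test unit receives about half of the interventions, and the effect of any one intervention is the average of its "on" runs minus the average of its "off" runs.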

Further speeding safe research, the reason why single-shot large orthogonal studies require no refining or validation testing was demonstrated by case example (as is known from a full appreciation of the underlying theory).

Interest continued in correspondence with attendees after the conference, including: “I thought ‘yawn’ ..but, found [orthogonal design coupled with economic control]...the most interesting and provocative topics at the entire conference.”

Monday, July 16, 2012

07 . 16 . 12 | Disability Study

Nobi is advising on statistical design in a study, begun in 2010, to improve health and thereby reduce hospitalizations for disabled people. Three health insurance plans are participating, one of which covers disabilities due to severe persistent mental illness (SPMI). The study examines the effects of sets of 11 interventions that differ somewhat by plan.

The study employs large orthogonal designs with a few dozen nurses (care managers) and the thousands of people they provide telephonic care to. Health outcomes, preventive measures and acute admits are being tracked for analysis.

Typically, these designs have reduced hospitalizations by 5-20%, with notable findings such as novel ways to reduce falls. Because each hospitalization costs an average of $10,000, significant savings, in the range of $1-10 million per implemented study, are on file.

Study results are expected in late 2012. Implementation will then follow using the interventions found helpful. Interventions explored range from medications to counseling and screening efforts for the SPMI project, to educating patients on fall risks and changes to the care model for the chronically physically ill patients.

For More Information, Contact:
Michael Joliat
Tailoj Marketing

Friday, March 2, 2012

03 . 02 . 12 | Randomized Control Trial

Statistical design and the Randomized Control Trial (RCT).

The first recognized RCT in medical research was published in 1948 [1], following earlier work over several years. It remains a mainstay of the industry, and rightly so. Statistical design builds on the RCT, with enormous implications across the healthcare industry:
  • It allows studies to be freed of the stifling restriction of randomizing members to nurses (or similar) in Care Management/Disease Management (CM/DM).
  • It allows a treatment, such as a medication, to be optimized by dose, frequency, and other synergies, as well as proving out the basic treatment, with no increase in the sample size of subjects.
  • It avoids any “roulette” with the test subjects, in which half the subjects get a placebo. Instead, all are potentially advantaged provided all treatments and other variants are clinically founded.
  • It strengthens blinding since every subject is assigned about half of the total interventions tested, and in a way no-one can second-guess or influence until the researcher has analyzed the data.
  • It measures what happens in the real world as opposed to one in which the subjects may know they have a 50% chance of a sugar pill or other placebo.
  • It uses “intent-to-treat,” rather than a Pyrrhic test of enforced adherence. This of course can be used in an RCT, too.
  • It considerably strengthens sham studies by placing the device or procedure being tested among an assortment of other treatments and variants.

It is often supposed that while a statistical design offers advantages, the RCT must be more pure. In fact it is the other way around: Fisher’s wider basis for induction [2] simply meant that if a treatment worked among so many other things, also varying, then it worked in the real world and not in an artificially controlled one.


1. Marshall, Dr. Geoffrey, et al. (1948) Streptomycin Treatment of Pulmonary Tuberculosis. British Medical Journal.

2. Fisher, R.A. (1935) The Design of Experiments. Oxford University Press (Reprinted 2003) Pages 13 – 26

Sunday, February 5, 2012

02 . 05 . 12 | Randomization Distribution

Randomization Distribution | Modernizing An Old Device for CM/DM and Multi-Channel Optimization Generally.

Rampant misunderstandings about multi-intervention studies in care/disease management (CM/DM), other human studies (e.g. sales, education) and non-manufacturing generally are:
  1. Differences among test units (e.g. nurses, retail stores, students) will invalidate the study.
  2. Adverse selection could contaminate results (e.g. recently hospitalized; new stores, gap students).
  3. Influences like regression-to-the-mean, HCC-score, store-trends, size etc. will affect findings.
  4. Members (patients) should be randomized to nurses (care managers); ditto students/classes.
  5. A second study or RCT would increase confidence or “validate” the findings.
  6. The false-alarm rate will increase as more things are tested.

Fisher [1] has not been well understood by mathematicians since 1926. This matters more today in healthcare and in complex multi-channel settings generally. Using a couple of other devices as well, we fix all of the above (simply, for users) and free studies of the man-made constraints that have held back progress. The CM example comes first, and the multi-channel sales example closes to illustrate the point for all industries:

#1 is completely solved by the simple trick of control-charting pre-study results across nurses, over a time window equal to the planned (randomized!) study period, and finding homogeneity (akin to stability in manufacturing). A dry-run pre-study analysis (like a Heckman-Hotz econometric test) is also essential; otherwise patterns among nurse admit rates can still give spurious findings. This dual homogeneity check has been controversial among PhD statisticians and academics, without reason. Users, more correctly, have no trouble with it.
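
The control-chart half of that check can be sketched as below. This is an illustration under assumptions, not the procedure used in the studies: it assumes a standard u-chart (events per unit of exposure with 3-sigma limits), and the function name and all admit counts and member-years are hypothetical.

```python
import math


def u_chart(admits, member_years):
    """Return (center line, per-nurse (lcl, ucl) limits, indices of nurses outside limits).

    Assumption: a standard u-chart, i.e. rate = admits / exposure with
    3-sigma limits of u_bar +/- 3 * sqrt(u_bar / exposure).
    """
    u_bar = sum(admits) / sum(member_years)
    limits = []
    flagged = []
    for i, (events, exposure) in enumerate(zip(admits, member_years)):
        half_width = 3 * math.sqrt(u_bar / exposure)
        lcl = max(0.0, u_bar - half_width)
        ucl = u_bar + half_width
        limits.append((lcl, ucl))
        if not lcl <= events / exposure <= ucl:
            flagged.append(i)
    return u_bar, limits, flagged


# Hypothetical pre-study window: admits per nurse, 100 member-years each.
exposure = [100] * 6
homogeneous = u_chart([30, 28, 35, 31, 26, 33], exposure)
one_outlier = u_chart([30, 28, 70, 31, 26, 33], exposure)

print(homogeneous[2])  # nurses outside the limits in the first data set
print(one_outlier[2])  # nurses outside the limits in the second data set
```

An empty list of flagged nurses is what "finding homogeneity" would look like on this chart; a flagged nurse would be investigated before the study is randomized.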

#2 to #4 are solved by correct randomization of nurses to the study. #4 is popular among mathematicians but usually wrong, and where it is wrong it would not improve anything or make money. A closed cohort design will reassure everyone more but isn’t necessary and cannot always be done (e.g. transitional care). Of course, long-term validation must use a closed cohort or a propensity-scored analogue, not an open cohort.

#5 is a little like sending a rowing boat out to see whether it was safe to sail the ocean liner that just sailed through. After similar results, confidence remains about the same, and studying new things for a shorter time trumps replicating, even with one-factor-at-a-time testing [2]. Of course, if a second study is conducted, even randomization would not allow it to be done on the same sample! A fresh random sample is called for, just as in manufacturing.

#6 goes away at ~20+ interventions: false-alarm is a problem for small studies devoid of scientific context.

Multi-channel optimization (e.g. sales in stores, online and stimulated through media, and call centers; or care by nurses, pharmacists, automated systems and house-calls to the same population) is solved by simultaneous statistical designs, provided randomization is correctly deployed and the channel tests are set up with a clever new device that’s easy for users (mutual-orthogonality). Education cases to improve learning and careers, and then all other industries, follow the same model.

Optional Note for Professional Statisticians: In a 20-run design, a contrast is simply the calculation that finds the effect of one intervention, by averaging the 10 runs in which it was tested vs. the other 10 (the counterfactual). Comparison to all possible 10 vs. 10 contrasts gives a yardstick to see whether the effect is “real” or essentially due to chance. The histogram on the home page is of 10,000 random contrasts out of more than 200,000 calculated. There are 20C10 = 184,756 distinct contrasts in total; more than that number were calculated so that every contrast was more likely to be included. The histogram of contrasts is from untransformed response data.

The average of the 200,000+ contrasts is -0.07337 with maximum at 419.59 and minimum at -410.97.

The true contrast is at -261.39, from a CM/DM case measuring hospitalization rate per thousand people per year.

The histogram shows that a normal approximation is close, as expected. However, the calculations are all distribution-free. No original assumption of normality (a “bell-shaped curve”) is needed. The histogram indicates visually how well the distribution of all possible contrasts approximates the normal. The true p-value (from this randomization distribution) is 0.01 vs. a normal approximation (from linear model software) at 0.0033.

Of course, the actual contrast will be the absolute largest among all randomized contrasts (p-value equals 1/184,756 = 0.0000054) only if the treatment and counterfactual have no overlap in the raw data (as driven by effect size).
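
The randomization distribution described in this note can be sketched as follows. This is a minimal illustration, not the software behind the homepage histogram: the function names and the response values are hypothetical, and the p-value is one-sided (counted in the direction of the observed effect), which is the convention that gives 1/184,756 in the no-overlap case for 20 runs. The small example uses 8 runs so it finishes instantly; a 20-run data set enumerates all 184,756 splits in a second or two.

```python
from itertools import combinations
from math import comb


def contrast(responses, treated):
    """Mean of the treated runs minus the mean of the remaining (counterfactual) runs."""
    treated = set(treated)
    on = [y for i, y in enumerate(responses) if i in treated]
    off = [y for i, y in enumerate(responses) if i not in treated]
    return sum(on) / len(on) - sum(off) / len(off)


def randomization_p_value(responses, treated):
    """Compare the observed contrast with every possible split of the same size.

    Returns (observed contrast, one-sided p-value, number of splits enumerated).
    """
    observed = contrast(responses, treated)
    k = len(set(treated))
    total = 0
    as_extreme = 0
    for split in combinations(range(len(responses)), k):
        value = contrast(responses, split)
        total += 1
        if observed >= 0:
            as_extreme += value >= observed - 1e-12
        else:
            as_extreme += value <= observed + 1e-12
    return observed, as_extreme / total, total


# Hypothetical 8-run example with no overlap between the two groups.
responses = [1, 2, 3, 4, 10, 11, 12, 13]
observed, p_value, n_splits = randomization_p_value(responses, [4, 5, 6, 7])
print(observed, p_value, n_splits)

# Number of distinct 10 vs. 10 contrasts in a 20-run design.
print(comb(20, 10))
```

No normality assumption enters anywhere: the yardstick is built entirely from re-labelling the observed responses. A two-sided version would compare absolute values instead.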

Further insights based on the above reveal why randomization of nurses to treatment combinations is correct but randomization of members to nurses usually is not, as it cannot often be managed that way. They also reveal that modeling nuisance variables (such as prior admit rates, HCC-score, selection criteria etc.) will approximate the same answers as the correct analysis relying on randomization directly. (This back-end analysis, first suggested by Neyman in the 1930s, is not needed. Were it to differ, one might look for multicollinearity problems or otherwise check the model.) Of course, running such a model to see if randomization “worked” is folly. It will always “work” provided the device is used correctly, which it tends not to be. Further consideration also reveals why a single test unit per combination in the study design is ample (in 20+ intervention designs) and replication is a waste. Finally, a common mistake is to try for n>30 per combination, whereas n=1 is usually ample.

On misunderstanding #1 (homepage), a popular trick is to analyze the change in admit rate by nurse (relative to the prior period). But this is valid only if that prior is significant, in which case a covariance analysis should be used, since the “change” adds noise and can cause errors.

Power calculations are performed in the usual RCT way (and yield an identical sample-size requirement), but they miss the point that they will be pessimistic, as variation usually falls during large studies. In our earliest case the standard deviation was about 1/3 of the prior value. The far larger issue is that single-intervention studies (excluding, say, 19 interventions that could have been included with the same resources) have zero power for all the untested things.
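
To see why such power calculations turn out pessimistic when variation falls, here is a minimal sketch. It assumes the textbook normal-approximation formula for comparing two group means, n per group = 2((z_alpha/2 + z_beta) * sigma / delta)^2; the function name and the sigma and delta values are hypothetical, not figures from any study.

```python
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate units needed per group to detect a mean difference delta.

    Assumption: two-sided test, equal group sizes, normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2


# Hypothetical planning values: prior standard deviation 300, target effect 150.
planned = n_per_group(sigma=300.0, delta=150.0)

# If the standard deviation during the study is about 1/3 of the prior,
# the requirement falls by a factor of 9, because n scales with sigma squared.
realized = n_per_group(sigma=100.0, delta=150.0)

print(round(planned, 1), round(realized, 1), round(planned / realized, 1))
```

Under this formula the planned sample size is about nine times what is needed once the standard deviation drops to a third, which is the sense in which the up-front calculation is pessimistic.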

It has not escaped our attention that performance often improves from day 1 of large studies, rendering them attractive to businesses especially if in urgent need of step-change. Also that large studies stop any accidental roulette with customers, members, students etc.


1. Fisher, R.A. (1926). The Arrangement of Field Experiments. J. Min. Agric. G. Br., 33: 503-513

2. Box, G.E.P. (1966). A Simple System of Evolutionary Operation Subject to Empirical Feedback. Technometrics. Vol. 8, No.1