I have to come clean: this isn’t just a company blog. It’s also an attempt to test a new way of publishing - one that tries to fix some of the gaps in the scientific system.
You can’t force science to work
Academia is built on a business model of publishing papers. I still feel uneasy about some of the papers I published during my PhD. By my current standards, most would never have seen the light of day, like this first paper mentioning Turbine. But I try to empathize with other young researchers. Why do we expect every PhD student to make a significant, standalone discovery in isolation?
That’s probably why I’ve wanted to help fix how science operates ever since I learned the ropes outside academia. I got burned so many times when a nicely tuned model fell apart on new data.
For a while, I chased the idea of a magical metric that could extract the truth from a pile of publications no matter how biased they individually were. Some techniques help. But I have never found any way to fully escape Goodhart’s law [1] or the garden of forking paths [2].
So what’s the next best thing?
Playing our own hand well.
How our research can be different
Getting the incentives right
Being a startup gives us the freedom to do research differently.
We are incentivized to deliver real value - value customers are willing to pay for. That means our models have to actually work. They must be predictive on new data, in the environments that matter.
That’s harder than it sounds. In statistics - and by extension, machine learning - it’s shockingly easy to fool yourself, especially with messy data. And under pressure, even easier to fool others.
That’s why this is the first rule on my research teams:
No one is incentivized to produce a specific outcome in any given experiment.
Their job is to give their best estimate of what the truth might be. (We only need to ensure that the whole set of experiments is designed so that, eventually but inevitably, it leads us toward a solution.)
Staying focused
We aim to solve specific problems in biology. For example: building models that can accurately predict how cancer patients will respond to new drugs.
We’re not bound by the usual academic constraints: no need for everyone to make a separate, novel discovery to earn a PhD, and no need to follow grant requirements written five years ago. Entire teams can focus on the one big question, pivot quickly away from dead ends and iterate faster. This lets us make much more rapid progress.
Doing replications
This focus gives us the bandwidth to not just learn from others, but to integrate their work by replicating key studies on our own datasets, using our internal benchmarks.
That often takes weeks - adapting the code, reshaping the data and running it through our evaluation pipelines. But it’s worth it. Replications help us pick better paths forward.
And this different process gives us something valuable to contribute back:
publishing replications and the dead ends we’ve ruled out.
The current scientific publication system isn’t designed for that. But no matter. The Web is large.
A different way of publishing
So I’m experimenting with a new way to publish.
When I come across something useful, I’ll first publish it here. Before posting, I run it by people I trust - people willing to add their name below. It’s a kind of open review. If someone from the team is willing to expand it into a full article, they’re welcome to.
Personally, I think reviewed preprints are perfectly valid scientific outputs. But if they want to take it through the full journal process, I won’t stand in the way.
A springboard to write papers
So these posts are breadcrumbs. Useful on their own, but also possible seeds of future papers. That comes with a few advantages.
No need to retrofit a story
Many papers cram disconnected results into a single “story” just to meet the bar of publication in journals. That creates brittle arguments and logic that’s hard to follow. Here, each post tells one story. It’s easier to write and easier to read.
Easier to read
Scientific writing doesn’t have to be dry; it just often ends up that way. Partly due to authorship-by-committee, and partly because most of us aren’t native English speakers.
But mainly because that’s the convention. I remember my PhD supervisor redlining my writing back into passive voice and past tense. “This is how you write scientific text”, he said. “It needs to sound serious”.
That’s just the norm. Journals themselves are trying to push for plainer language, but it hasn’t caught on.
Well, it can here.
Trade secrets
Some of what makes Turbine work has to stay confidential. That’s what funds the science we want to do tomorrow.
But there’s a lot I can share. Just knowing what didn’t work can be valuable, even without full access to our code or data.
Since we can’t open-source everything, many of these insights would otherwise go unpublished.
Less friction for publishing
In the hectic startup life, putting the full timeline of getting something published on a roadmap is daunting, so publishing usually ends up stuck in the backlog. Starting with a blog post, then evolving it into a preprint, and finally submitting it for peer review may help break the work into manageable chunks.
Strengthen existing science
Since most published research findings are false, publishing negative results and replications is essential. But they don’t fit into the current system.
This format lets me contribute in a way that helps strengthen the science we all build on - and maybe show that failures often teach more than success stories. They should be celebrated, not hidden in the drawer.
So we can all keep pushing biology forward. Together.
[1] Goodhart’s law: “When a measure becomes a target, it stops being a good measure.”
[2] The garden of forking paths: unintentionally increasing the degrees of freedom of a statistical test when preparing the data for analysis, thereby inflating the significance of findings and producing false positives. I have never found a method that can reliably correct for unobserved multiple comparisons.
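The forking-paths effect is easy to demonstrate with a simulation. The sketch below (a minimal illustration, not Turbine code; every name and analysis variant in it is made up for the example) draws two groups from the same null distribution, tries a handful of "reasonable" analysis variants on the same data, and reports the best-looking p-value. Even though each individual test holds its nominal 5% false-positive rate, picking the best fork pushes the overall rate noticeably higher.

```python
import math
import random

def p_value(group_a, group_b):
    """Two-sample z-test p-value via a normal approximation."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def forks(a, b):
    """Analysis variants an analyst might plausibly try on the same data."""
    trim = lambda g: sorted(g)[1:-1]        # drop one "outlier" each side
    yield p_value(a, b)                     # plain comparison
    yield p_value(trim(a), trim(b))         # after outlier removal
    yield p_value(a[:20], b[:20])           # an "interim look" at partial data
    yield p_value(a[10:], b[10:])           # after dropping a "bad batch"

random.seed(0)
n_sim, n = 2000, 30
naive_hits = forked_hits = 0
for _ in range(n_sim):
    a = [random.gauss(0, 1) for _ in range(n)]  # both groups come from the
    b = [random.gauss(0, 1) for _ in range(n)]  # same null distribution
    ps = list(forks(a, b))
    naive_hits += ps[0] < 0.05                  # one pre-registered analysis
    forked_hits += min(ps) < 0.05               # report the best-looking fork

print(f"single analysis: {naive_hits / n_sim:.3f}")  # close to the nominal 0.05
print(f"best of 4 forks: {forked_hits / n_sim:.3f}")  # noticeably inflated
```

Note that no fork here is dishonest in isolation; the inflation comes purely from choosing among them after seeing the data, which is exactly why it is so hard to correct for afterwards.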