
New wave of online peer review and discussion tools frightens some scientists

Getting research torn apart publicly isn't fun, but stonewalling the Web isn't an answer.

Sites like Publons and PubPeer hope to quicken the pace of scientific conversation.

Earlier this year, I wrote a story about a new HIV/AIDS detection kit that was under development. Since then, the same group has published two more papers on the topic, but questions have begun to surface about the original research. The questions were so simple that I was pretty embarrassed I hadn't spotted the problems on my own.

But I wouldn't have gotten even that far were it not for the new directions that peer review and social media are taking science. I was alerted to the problems by Twitter user @DaveFernig, who pointed me to a discussion about the paper on PubPeer.

Before getting to that, let's recap what impressed me about the HIV detection paper. It achieved a couple of things that made it stand out from a veritable truckload of similar proof-of-principle experiments. The test was very sensitive—so sensitive that it could detect viral loads below the detection limit of the standard test and might even reach single-molecule sensitivity. When someone claims single-molecule sensitivity, I tend to get all hot and bothered, and my critical thinking faculties vanish for a while.

The second remarkable finding was that their system worked in blood serum. Usually these tests fail because the results rely on proteins binding to specific matching molecules, and blood serum is full of many different proteins that can bind in a non-specific way (think of it like dust sticking to your television), which interferes with the results. Since a test is only really useful when it is sensitive to small concentrations of the target molecule, this non-specific binding usually overwhelms the signal of interest. For reasons that are not clear, this new test does not suffer from problems related to non-specific binding.

The journal club disapproves

The PubPeer discussion centers on just these two aspects of the paper. The HIV/AIDS test works by a particular protein, called P24, binding an enzyme to a substrate. The enzyme catalyzes a reaction that removes hydrogen peroxide, and removing hydrogen peroxide changes the rate at which gold particles precipitate, which in turn changes their shape. In the presence of P24, the particles are lumpy and turn the solution blue (instead of red in the absence of P24).

So far, pretty normal for this sort of test. But the transition from red to blue takes place between concentrations of 10⁻¹⁹ g/ml and 10⁻¹⁸ g/ml. This didn't catch my attention at the time, but proteins are pretty large molecules. We are talking, at the lowest concentrations, about having two molecules per test well. Given that there will be statistical noise, we would expect that individual tests at the lowest concentrations should vary from having absolutely no P24 through having something like five P24 molecules, with an average of two.

Even if the test were binary—"is P24 present?"—these lowest concentrations should show a large degree of noise, simply because a third of the tests should be on samples that have no P24 due to random variations. Yet, the noise is pretty constant over the whole range here.
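To put rough numbers on that, here is a quick back-of-the-envelope sketch (my own arithmetic, not the authors'). The 100 microlitre well volume is an assumption rather than a figure from the papers, and P24 is treated as a roughly 24 kDa protein; the exact counts shift with the assumed volume, but the conclusion that we are counting a mere handful of molecules per well does not.

    # Rough estimate of how many P24 molecules sit in one test well at the
    # quoted concentrations. The 100 microlitre well volume is an assumption,
    # not a number from the papers; P24 is a roughly 24 kDa protein.
    import math

    AVOGADRO = 6.022e23                 # molecules per mole
    P24_MASS_G = 24_000 / AVOGADRO      # grams per P24 molecule (~24 kDa)
    WELL_VOLUME_ML = 0.1                # assumed 100 microlitre sample

    for conc in (1e-19, 1e-18):         # g/ml, the red-to-blue transition range
        mean = conc * WELL_VOLUME_ML / P24_MASS_G
        spread = math.sqrt(mean)        # Poisson standard deviation
        empty = math.exp(-mean)         # Poisson chance a well holds no P24 at all
        print(f"{conc:.0e} g/ml: about {mean:.1f} molecules per well "
              f"(+/- {spread:.1f}); {empty:.0%} of wells would be empty")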

The authors answered this in a follow-up paper, in which they claim that changing the hydrogen peroxide concentration by a tiny amount—a change of 50 nM in a solution with a concentration of 120 μM—is sufficient to change the way the gold particles form. But in the original paper, they show the same transition taking place in a linear fashion over a concentration range of 120 μM, a range that is 2400 times larger.

There is a conflict here. We can either have a slow transition, in which case the claim for single-molecule sensitivity fails. Or the transition is fast, in which case the initial concentration of hydrogen peroxide needs to be prepared with an accuracy of 0.01 percent (difficult). To give you an idea, a standard microliter pipette is specified to an accuracy of around three percent, and the concentration of hydrogen peroxide is usually only specified to an accuracy of two percent. Even measuring the concentration more accurately than that requires specialized (and possibly custom) lab equipment. Effectively, the authors have performed an extremely accurate measurement of their hydrogen peroxide concentration without describing their methodology in any of the papers.
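Here is the same sort of sanity check for the precision argument (again my own arithmetic): how large a 50 nM step is relative to the 120 μM working solution, and how that compares to the pipette and stock tolerances quoted above.

    # How big is a 50 nM step relative to a 120 uM hydrogen peroxide solution,
    # and how do ordinary lab tolerances compare? The 3% and 2% figures are the
    # pipette and stock tolerances quoted above.
    step_molar = 50e-9        # claimed decisive change in H2O2 concentration
    stock_molar = 120e-6      # working H2O2 concentration
    fraction = step_molar / stock_molar
    print(f"one step is 1 part in {1 / fraction:.0f} "
          f"({fraction:.3%} of the stock concentration)")
    for name, tolerance in (("microliter pipette", 0.03), ("H2O2 stock assay", 0.02)):
        print(f"{name}: ~{tolerance:.0%} tolerance, "
              f"roughly {tolerance / fraction:.0f}x coarser than the step")

Resolving a single 50 nM step reliably would mean controlling the concentration to a small fraction of that 0.04 percent, consistent with the roughly 0.01 percent accuracy mentioned above.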

This practice is repeated in other papers as well. So far, three papers have reported reaching single-molecule sensitivity. This demands that, somehow, solutions are being prepared to very high accuracy.

Here is where the value of a site like PubPeer comes in. As a scientist, I appreciate the discussion. Somewhere, some poor graduate student is trying to replicate this experiment and, unable to do so, cannot move on to the next step of their research program. Having a public and searchable discussion of the paper provides an opportunity to bring these hidden details and pitfalls into clear view much earlier in the process.

Peer review is leveling up

The discussion generated by those papers highlights the value of post-publication peer review. PubPeer provides something that serves as a kind of journal club, where anyone can join in the job of tearing a scientific result into little pieces in order to understand it. This is the bread and butter of scientists, but wouldn't it be useful to be able to refer to such discussions later on when writing your own papers?

Normally, this sort of back-and-forth takes the form of a comment, with a response, published in the same journal as the original work. But getting a comment into a journal can prove to be troublesome (PDF). And even then, it is difficult to find: there is usually no hint that the original paper has generated a comment, so you have to specifically look for it. Nevertheless, a comment can be cited. It is part of the literature, while PubPeer comments are ephemeral things that can be removed or edited and are certainly difficult to cite.

Publons aims to change all that. Members of the site can import papers, rate them, and discuss them. In ongoing discussions, members can endorse reviews. When the endorsements reach a certain threshold, the review gains a digital object identifier (DOI), turning it into an object that can be cited in more traditional academic literature.

This last step differentiates Publons from PubPeer. The aim seems to be to create a robust community that allows one to quickly determine the value of a paper. For instance, when I apply for funding for a new research project, I have to include a list of my "best" relevant publications. Grant reviewers could then quickly find out what the community truly thinks about those papers, acquiring a more accurate picture of my previous work. Likewise, active participation in the community would allow reviewers to accurately gauge your contribution above and beyond the standard format of academic papers.

Unfortunately, I think Publons rather misses the mark by separating peer review from discussion. A journal editor may be happy to receive a review that says "No problems, go ahead and publish," but they are certainly not happy to get "Paper has methodological problems, do not publish." As a reader, neither am I. Exactly what are the problems? How is the methodology flawed? What would need to be done to make it a worthwhile publication? Without further information, it is impossible to judge either the review or the paper.

The Publons solution to this is endorsements. Get enough endorsements and your review is trustworthy. The danger is that it becomes a numbers game. How many discussions with DOIs can you generate? Can I drop them into my performance review and claim them as output? If so, can I create a ring of mates who review papers, endorse each other, and claim great contributions? That is not something I would like to see, yet, if it becomes something measured by administrators, you can be certain someone will game the system.

Publons recognizes this. From their FAQ: "Can authors game the system with fake reviews? They can try, but that's not the community we're trying to build. Our editors check every review we receive. In the case of questionable reviews our policy is to engage in a dialog with reviewers to improve their work. If we are unable to contact a reviewer within two weeks the review is removed." Yes, the reviewer is reviewed. That, however, does not seem like a scalable solution. If the site becomes more popular, it will not be possible to keep up with the volume. In the end, it will take community policing to keep the problem under control.

Just give it a go

Nevertheless, Publons intrigued me, so I thought I would sign up and see if I could destroy my reputation through gratuitous reviews (not really). The sign-up process was pretty easy, though I could not manage to add my affiliation to my profile.

After that, I was off to check some of my own publications. Disappointingly, none had been reviewed yet. "How could I be so insignificant?" I asked myself. On the upside, Publons offers the ability to follow a paper, so if it does get reviewed, you get notified and can participate in the discussion. This is a great idea, as it brings readers and authors closer together.

And that brings up the last point: there are no discussions. Although there is a space for discussion, even contradictory reviews had generated no back-and-forth. On PubPeer, the idea is to generate discussion. You don't have to review the whole paper; you can simply point out some point of interest (or a photoshopped figure). In the end, I think that generating discussion outweighs the mana acquired from a citable review.

Joining the discussion

In the case of the HIV paper, the authors are taking the path of least information. As far as I can guess, one of the authors is participating anonymously (which is fine), but those comments don't go much further than asserting that everyone else is not just wrong but also naive. Even worse, the authors have blocked the publication of correspondence between themselves, the journal editors, and one of the PubPeer participants.

Note: we have since been informed that PubPeer voluntarily removed the post after consultation with the paper authors.

Such an approach is, I believe, about the worst form of training one can give a graduate student. In the face of questions, stonewall. The fact is that even if the experimental result is a fluke or due to some strange unknown side-effect, everyone learns something from that. Scientific results often turn out to be less sure than we would like them to be, but it is discovering how sure those results are that drives us forward. Instead, the senior authors are teaching their trainees that, in fact, the result must stand come hell, high water, or reality.

For the rest of us, the big danger is simply not being aware of resources like PubPeer. I can easily imagine that, at some conference somewhere, a grad student, entirely unaware of the discussion surrounding their paper, is going to get ambushed by a group of well-prepared scientists looking for answers. Unfortunately, being unaware also means that no one from the author list has responded to any of the discussion, so a certain amount of anger may well have built up. This is also not good training.

The culture of peer review and post-publication criticisms of results is changing fast, and it is time for labs to start responding to this in a systematic and positive fashion. Maybe these online forums can be a true part of it.

