From Forbes Blogs:
As someone whose work crosses so many disciplines, I spend a fair bit of my days skimming new developments across not only computer science, but the humanities, social sciences, arts and many other fields, looking for connections and unexpected new approaches that might benefit my own work. The intensely siloed nature of academia is well known, but equally striking is just how rapidly citation standards are falling in a Google Scholar world filled with explosive growth in available knowledge, in which scholars seem genuinely unaware of developments across the rest of their own field, not to mention the rest of academia. Could machine learning approaches dramatically reform the “related work” and citation review component of peer review and academic publishing?
Perhaps most striking is that in an era when so much scholarship is available through web and academic database searches, it takes only a few mouse clicks to compile a cross-section of the recent developments in a given space. Yet peruse the “related work” or “background” section of a typical academic paper and it is amazing just how discipline-specific and artificially circumscribed the set of references is. While scholars have always cherry-picked their references to argue that their work is novel or an advance over previous work (and thus worthy of publication), the comprehensiveness of citations even in top journals appears to be in marked decline.
Not a day goes by that I don’t see a paper in a top-ranked journal claim to be the first to use a particular method or dataset, or to perform an analysis at a particular scale, when I can point to dozens of papers across other fields that reached that milestone long ago. Those same papers often use methods or datasets in ways that violate the assumptions that governed their creation, rendering any results immediately suspect.
. . . .
At the same time, the body of research relevant to any new study has grown exponentially, far beyond the ability of even a small team of humans to monitor and digest. Even coauthored studies whose author lists span both the problem domain and computational and statistical expertise may lack deep experience with a specific method or dataset being used, and may make assumptions that end up undermining the study’s conclusions.
Peer reviewers tend to fare little better when evaluating specialty methods and datasets outside the traditional scope of their discipline, as they typically lack the expertise to catch errors or raise concerns with those datasets and methods. Being academics themselves, they too are frequently unaware of work happening outside their discipline, and especially of work published outside academic venues, such as the blogs and social media outlets favored by startups, major companies and even independent open data researchers.
This raises the question – can machine learning, or even simple automated statistical analyses, help academia take a first step toward reforming the peer review process by augmenting the skills and experience of human reviewers and pointing them to things they might otherwise have missed?
. . . .
At the most basic level, such tools could help verify citations, flagging when a quote or data point attributed to an article is not found in that publication. This could help immensely with the flood of incorrect citations that plague the Google Scholar copy-paste era of scholarship. An automated review that flags every miscited quote and data point would go a long way toward cleaning up citation practices.
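To make the idea concrete, here is a minimal sketch of what such a citation-verification check might look like in Python. The function names, the citation record format, and the strategy (exact matching after normalizing case, whitespace, and punctuation) are all illustrative assumptions, not the API of any existing tool; a production system would also need fuzzy matching to tolerate ellipses and bracketed edits within quotes.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace for matching."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def quote_found(quote: str, source_text: str) -> bool:
    """Return True if the quoted passage appears in the cited article's text."""
    return normalize(quote) in normalize(source_text)

def flag_miscitations(citations: list[dict], full_texts: dict[str, str]) -> list[dict]:
    """Return the citation records whose quote is absent from the cited article.

    `citations` is a list of {"article_id": ..., "quote": ...} records
    (a hypothetical format); `full_texts` maps article IDs to their text.
    """
    flags = []
    for c in citations:
        text = full_texts.get(c["article_id"], "")
        if not quote_found(c["quote"], text):
            flags.append(c)
    return flags
```

A reviewer-facing tool built on this pattern would present the flagged records alongside the cited article so a human can judge whether the mismatch is a paraphrase, a typo, or a genuine miscitation.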
Moving up a level, imagine automated filtering that checks each submission for odd statistical characteristics. Some statistical checks require a deeper understanding of a paper and its methods than current fully automated techniques can reliably extract. But even a semi-automated process could go a long way toward assisting reviewers, beyond increasing their statistical training and adding statistical reviewers to non-traditional journals: human reviewers could parse out certain details and use automated tools to evaluate the numbers for oddities, or enter a simple summary of the statistical workflow and let the tool flag potential methodological concerns. Similarly, image assessment algorithms could flag obvious signs of image manipulation in biomedical journals.
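One example of a statistical oddity check that is simple enough to run fully automatically is the GRIM test: for integer-valued measures (such as Likert-scale responses), a reported mean must be reproducible as some integer total divided by the sample size. The sketch below is illustrative, assuming means reported to two decimal places; it is not the method described in the excerpt, just one concrete instance of the kind of check it envisions.

```python
def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: can a reported mean arise from n integer-valued responses?

    Reconstructs the nearest integer totals and checks whether any of them,
    divided by n, rounds back to the reported mean.
    """
    total = round(mean * n)
    # Check neighboring totals too, to absorb rounding-direction ambiguity.
    for t in (total - 1, total, total + 1):
        if t >= 0 and round(t / n, decimals) == round(mean, decimals):
            return True
    return False
```

For example, a reported mean of 5.19 from n = 28 integer responses is arithmetically impossible (the candidate totals 144, 145, and 146 give means of 5.14, 5.18, and 5.21), so `grim_consistent(5.19, 28)` returns False, while 5.18 from the same sample is consistent.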
Link to the rest at Forbes Blogs