Elsevier's slice of Big Data pie

One of the more illuminating points made during the Twitter storm that followed Tuesday's announcement of the Elsevier Mendeley takeover came from Heather Piwowar. To recap: Elsevier (academic publisher) acquired Mendeley (a reference management platform) for up to £65m. Mendeley is a collaboration tool, with Open Access (publications and data as well as the 'lab notebook' that facilitates scientific discourse and investigation) at its heart. It allows researchers to share and annotate papers, build bibliographies and create alternative citation metrics. It also has an institutional edition which can be used to manage research information ('...combining a next generation collaboration platform for the individual researcher with a real time analytics tools for the library' according to the corporate video). The 'oodles of workflow data' that Elsevier have acquired include data relevant to where researchers are publishing and what they are reading.
Under-commented at the time but nonetheless significant in this context was Elsevier's acquisition of the Danish company Atira in October 2012. Atira are the development team behind PURE, a Current Research Information System (CRIS) with an optional Institutional Repository service. (A CRIS is a research management system drawing on institutional data to create relationships between researchers and research publications, projects, project funding etc.) The platform has been deployed in 19 UK universities primarily to support reporting and submission for the REF2014 exercise.
Both acquisitions represent a new focus in Elsevier's corporate strategy built around the data generated by institutional research management platforms and those channels of scholarly communication that utilise open content and social media. At its core is a concession that opportunities are emerging for new data-driven business models. Cash rich after sustaining year on year profits of around 33% for the past decade, Elsevier can well afford to take a punt.
The core of Elsevier's business remains an academic publishing model buttressed through rights acquisition and library subscriptions. Looking past the corporate glad-handing that has followed the Mendeley deal it's worth bearing in mind the ferocious lobbying undertaken by the company to protect this cash cow. Elsevier will continue to talk out of both sides of its mouth about Open Access while it develops a tangible corporate strategy that guarantees sustained profitability for these data driven services. Its well established bibliographic database SciVerse Scopus was already integrated with PURE prior to the acquisition and my feeling is that this integration will be developed further. Currently Scopus data populates PURE with both publication records and citation data which can be delivered as open content through a publicly accessible research portal. The Institutional Repository acts as an optional service associating full-text, when available, with research records (publications, projects, etc).
So far Elsevier has not made PURE a Scopus-only CRIS. Why would it? The API to the bibliographic database of its corporate rival Thomson Reuters (as well as PubMed and arXiv) remains available to PURE. This approach allows the institutional publication dataset to be enhanced by records imported from a variety of sources. Checked for accuracy (by the institutional library) and associated with open research outputs, they are then disseminated via the web. But the enhancement of this content doesn't stop there.
The Gordian Knot of author disambiguation looks like it might now be cut by the ORCID project. Elsevier is a development partner. Authors register with the service and receive a unique identifier, the ORCID iD. They can then import their research output data (institutional affiliation, publication and patent references, grant information) and make these data open or closed. One of the real impediments to assessing research impact based on citation metrics is the variation of author names in the literature (John Doe, John F. Doe, Doe, John Frederick, etc). Accurately associating these variations with a specific author will give any commercially available bibliographic database a competitive edge. ORCID's close integration with the Scopus database allows authors to import their publication references, conditional on the association of their existing Scopus ID with their ORCID iD. In PURE, authors can now include their ORCID iD when creating reference data. If they choose to import publication records from Scopus into PURE, the ORCID iD can be used to make an accurate association between the imported record and specific institutional authors.
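To make the disambiguation idea concrete, here is a minimal Python sketch of how a shared identifier collapses name variants into one author. The record layout, field names and the identifier value are placeholders assumed for illustration; this is not how PURE, Scopus or ORCID actually store or match records.

```python
from collections import defaultdict

# Hypothetical publication records, e.g. as imported from a bibliographic
# database. Field names and values are illustrative assumptions.
records = [
    {"title": "Paper A", "author": "John Doe", "orcid": "0000-0002-1825-0097"},
    {"title": "Paper B", "author": "John F. Doe", "orcid": "0000-0002-1825-0097"},
    {"title": "Paper C", "author": "Doe, John Frederick", "orcid": "0000-0002-1825-0097"},
    {"title": "Paper D", "author": "John Doe", "orcid": None},
]


def group_by_orcid(records):
    """Group records by identifier; return (grouped, records without an identifier)."""
    grouped = defaultdict(lambda: {"names": set(), "titles": []})
    unresolved = []
    for rec in records:
        if rec.get("orcid"):
            entry = grouped[rec["orcid"]]
            entry["names"].add(rec["author"])
            entry["titles"].append(rec["title"])
        else:
            # Without an identifier the name string alone is ambiguous.
            unresolved.append(rec)
    return dict(grouped), unresolved


authors, unresolved = group_by_orcid(records)
```

Three spellings of the same name end up under one identifier, while the record without an identifier is set aside for manual checking rather than guessed at.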
This machine-driven data cleansing and enhancement relies on platform-to-platform interoperability. With the acquisition of PURE and Mendeley and the close integration of Scopus into ORCID, Elsevier have developed a locked-in, platform-specific workflow with real commercial potential. Open in terms of access and re-use. Open on Elsevier's terms.

Irish Libraries and the Crisis in Scholarly Publishing: What's the Big Deal?

It's a rare thing indeed that a parliamentary question gets asked about academic library subscriptions but that's exactly what happened in Dáil Éireann [principal house of the Irish parliament] on 3rd March 2013. Peter Mathews TD asked the Minister for Education and Skills to make a statement '...regarding electronic subscriptions for academic journals'. What follows are some personal observations.

The backdrop to this is of course Ireland's five years of austerity and fiscal adjustment. Ireland's university sector is almost entirely state funded and therefore subject to the same regime of cuts imposed throughout the public service. In previous years, university libraries had benefited from increased state investment into research allowing for an expansion of library resources to accommodate the developing 'knowledge economy'. The state's chief funding agency for scientific research, Science Foundation Ireland (SFI), was a major beneficiary of this €20 billion decade long public investment. This national spend on R&D was mostly sustained in the two years following the crash of 2008 but began to fall back from 2010. From 2005 SFI invested €35 million in university libraries to guarantee access to the corpus of research literature. The libraries had formed a consortium, IReL, to negotiate a national licence with academic publishers for all seven Irish universities.

Mise en scène

Part of any librarian's mission is to build collections to support the research activities of their home institutions. The first disruption to this core function arrived when scholarly communications for the sciences were transformed by the huge investment of public funding that arrived into universities after World War II. The accompanying demand to publish would see the development of an almost completely privatised scholarly press. Press baron Robert Maxwell would exploit the potential in German academic publishing at war's end to help establish Pergamon Press (now an Elsevier imprint). Collection development still involved close co-operation with researchers to service the demand for the latest communications but now there was so much more of it. For researchers, progress in an academic career became even more wedded to publication. Thus publish or perish began to stoke the serials crisis.

The second disruption began with the shift from print to digital during the 1990s. Publishers were able to bypass the library and deliver content directly to the desktops of their readers while the academic journal became more fragmented as individual papers could now be electronically transmitted and shared. The journal no longer had to occupy a physical space on the library shelf and collection development became an exercise in negotiating licensed access to remotely held content. Academic publishing had embarked on a frantic period of mergers and acquisitions allowing for the bundling of multiple electronic journals into subscription packages. If this consolidation allowed for an exponential increase in the availability of titles it also saw library spending sky-rocket. While libraries reduced the number of their subscriptions by 6% between 1986 and 1999 they spent 170% more on titles. Bundling effectively killed off the quality control aspect of collection development.
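The two percentages are easier to appreciate when combined. A quick back-of-the-envelope calculation in Python, assuming that "170% more" means total spending reached 2.7 times its 1986 level and that both figures refer to the same set of libraries:

```python
# Relative to a 1986 baseline of 1.0 (an assumed reading of the figures above).
titles_1999 = 1.0 - 0.06   # 6% fewer subscriptions
spend_1999 = 1.0 + 1.70    # 170% more spending

# Average spend per subscribed title, relative to 1986.
spend_per_title = spend_1999 / titles_1999
print(f"Average spend per title in 1999: {spend_per_title:.2f}x the 1986 level")
```

On that reading the average cost of a single subscription came close to tripling over the period.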

Culling the Big Deal

This is the publishing environment Irish libraries now engage with. By negotiating access licences on a national level, the IReL consortium has allowed smaller Irish university libraries to have a range of electronic resources that match those of the larger institutions. It's a common model that allows libraries to support core research priorities by providing substantial access to the literature. Licences can be negotiated with publishers on a national level through library consortia or by individual institutions. All straightforward enough, but what happens when budgets are constrained and cuts to resources are on the agenda?

Any decisions around what to cull are complicated by the bundling of multiple titles from a particular publisher. In the same way you can't unravel your cable TV package and choose just the channels you wish to watch, so it is with electronic journal bundling.

The more substantive challenge is how to assess what parts of the literature constitute essential resources. The crudest measure would be to examine which resources are the most heavily accessed via access logs or download counts. This can be refined by data from services such as the COUNTER initiative which gathers usage statistics on online databases and journals. IReL is a library consortium member of COUNTER. (While I hope that all available efforts were made to gather full data on usage, I wonder why I'm unaware of services such as the SUSHI harvester, developed on common library protocols and COUNTER compliant, being deployed in an Irish context.)

Usage statistics should form one part of the picture but assessing quality by journal remains challenging. Despite fragmentation, publishers remain very protective of journal brand identity. The Journal Impact Factor is probably the best known metric used to assess journal quality and has been widely endorsed by publishers. Despite being downplayed by agencies tasked with designing national research assessment exercises such as Deutsche Forschungsgemeinschaft and the UK's Research Excellence Framework, it is still commonly used as a yardstick of quality.

In 2008, University of California libraries adopted a new strategy for journal value assessment designed by the California Digital Library.

A key aspect of this new methodology is the use of a Weighted Value Algorithm to assess multiple vectors of value for each journal title under review.  Value is assessed in three overall categories:  Utility, Quality, and Cost Effectiveness.  For example, usage statistics contribute to a journal’s Utility score, impact factor contributes to its Quality score, while both cost per use and cost per impact factor contribute to its Cost Effectiveness score.  A composite score is then assigned to each journal to assess its overall value in comparison to other journals in the same broad subject category.  In addition to the weighted value algorithm, many other metrics are compiled and provided to campus librarians by CDL to ensure the richest possible set of information with which to make important selection decisions.
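The quoted description gives the three categories but not the weights or the scaling CDL uses, so the following Python sketch fills those gaps with assumptions: equal weights, min-max scaling within one subject category, and invented journal figures. It is meant only to show the shape of such a composite score, not CDL's actual algorithm.

```python
def normalise(values):
    """Min-max scale a list of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(journals, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return {title: composite score} for journals in one subject category.

    Assumes every journal has non-zero usage and a non-zero impact factor.
    """
    utility = normalise([j["uses"] for j in journals])
    quality = normalise([j["impact_factor"] for j in journals])
    cost_per_use = normalise([j["cost"] / j["uses"] for j in journals])
    cost_per_if = normalise([j["cost"] / j["impact_factor"] for j in journals])
    # Lower cost per use and per impact point is better, so invert the scale.
    cost_eff = [1 - (cpu + cpi) / 2 for cpu, cpi in zip(cost_per_use, cost_per_if)]

    w_u, w_q, w_c = weights
    return {
        j["title"]: w_u * u + w_q * q + w_c * c
        for j, u, q, c in zip(journals, utility, quality, cost_eff)
    }


# Invented figures for three journals in the same subject category.
journals = [
    {"title": "Journal A", "uses": 12000, "impact_factor": 4.0, "cost": 6000},
    {"title": "Journal B", "uses": 3000, "impact_factor": 2.0, "cost": 6000},
    {"title": "Journal C", "uses": 500, "impact_factor": 1.0, "cost": 5000},
]
scores = composite_scores(journals)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Note that Journal C is the cheapest title yet ranks last, because its cost per use and cost per impact point are the worst of the three. Changing the weights changes the ranking, which is why transparency about them matters.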

The CDL approach appears to provide a well engineered solution to quality assessment but doesn't mention data relevant to where institutional authors choose to publish. When an Irish library consortium takes the decision to drop a journal subscription, should it not also consider whether that journal contains contributions from Irish academic authors and, if it is not renewed, how access to those Irish research papers is guaranteed?
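The check being asked for is simple to express. A minimal Python sketch, in which the cancellation list, the paper records and the `in_repository` flag are all hypothetical stand-ins for data a library would draw from its own CRIS or repository:

```python
# Hypothetical data: titles proposed for cancellation, and papers by
# institutional authors flagged by whether a repository copy exists.
cancellation_list = {"Journal X", "Journal Y"}
institutional_papers = [
    {"title": "Paper 1", "journal": "Journal X", "in_repository": True},
    {"title": "Paper 2", "journal": "Journal X", "in_repository": False},
    {"title": "Paper 3", "journal": "Journal Z", "in_repository": False},
]


def papers_at_risk(cancelled, papers):
    """Titles of papers in cancelled journals that have no repository copy."""
    return [
        p["title"]
        for p in papers
        if p["journal"] in cancelled and not p["in_repository"]
    ]


at_risk = papers_at_risk(cancellation_list, institutional_papers)
```

Here only "Paper 2" is flagged: it sits in a cancelled journal and has no open copy to fall back on.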

Open Access: the third disruption

It is worth remembering that academic publishing is dominated by a handful of multinational enterprises. Academic authors write, review and edit for no direct remuneration. They also compromise their rights as authors through copyright transfer agreements or exclusive licensing arrangements. The libraries in their home institutions buy access to their outputs via journal subscriptions. Most of these activities and the research they underpin are funded through the public purse.

With the shift to digital, the disbinding of the journal has accelerated, affording a rethink of the entire academic publishing process. Scholarly networks can in theory deliver those essential activities of review and dissemination while bypassing the publisher middle man. One obvious benefit would be the removal of tolled-access barriers to impact. Academic publishers can respond to this new reality or find themselves touting a service platform that is surplus to requirements.

Some publishers have begun to reconstitute the journal to allow for open dissemination and licensing in ways which recalibrate or circumvent the subscription-based business model. Most have agreed a line of compromise whereby research institutions can collect and openly disseminate a version of the published paper by allowing the deposit of the author's final draft manuscript, post peer-review, into an institutional or subject-based repository.

The UK has adopted an even more radical approach. From 1st April 2013, Research Councils UK will directly fund a proportion of the publications generated through their research grants to be made Open Access in the journal of publication. Many UK research libraries now manage a publication fund as well as an institutional repository. Both approaches (institutional or subject repository deposit and journal-side Open Access) are endorsed by research funder mandates and in some cases institutional publication policies similar to Trinity College Dublin's.

Champions of the current RCUK preference for paid journal-side Open Access over repository deposit can claim that this will eventually lead to the dismantling of the library subscription model and the 'big deal' bundle. Yet even the most optimistic admit that the current UK approach simply supports a 'period of transition'. This is reflected in the policy through support for a hybrid publishing model. Library subscriptions will continue while, subject to the availability of funding, the journal will offer authors a paid option for journal-side Open Access. Institutions will pay twice as some publishers transform their journals to a business model sustained by direct publication payments. Critics point out that this 'double dipping' by publishers provides no guarantee that it will effect a universal transition away from the subscription model. UK authors may have access to limited publication funding but their international colleagues and research collaborators may not. The economic evidence suggests that a far more effective way to achieve an Open Access tipping point is to support repository deposit. Either way, the Open Access publication fund is here to stay.

Irish researchers, particularly those working in the STEM (Science, Technology, Engineering and Medicine) disciplines, will be familiar with author-pays, journal-side Open Access. In the life sciences, publishers such as PLoS, BioMed Central and Frontiers provide important publishing platforms for Irish research. Those in receipt of research funding from agencies such as the Wellcome Trust will be aware of mission-critical Open Access policies that underwrite publication costs as part of the research project spend. For those without publication funding, strong policies supporting repository deposit as a route to Open Access exist across STEM publishing.

Irish research libraries have created a network of institutional repositories reinforced by funding agency policies which require deposit as a research grant condition. While full compliance remains a challenge, authors do have an option that guarantees access to a peer-reviewed version of their published paper even if subscription to the journal of publication is discontinued.

The IReL selection

It is unclear what criteria informed the IReL decision and why titles from one publisher in particular were selected. Taylor and Francis are well known as a big AHSS (Arts, Humanities and Social Sciences) publisher. The dropped subscriptions are largely titles with a STEM focus. This may appear an arbitrary selection but it is not without precedent. In 2011 University of Virginia Library decided not to renew subscriptions for 1,169 Taylor and Francis titles. Many of these journals are found on the IReL list.

Whatever the reasons, I hope IReL respond to the parliamentary question and are transparent about their assessment methods. More importantly, I hope Irish research libraries recognise that future collection development and management must be fully integrated with the existing repository infrastructure. Irish research deserves nothing less.

Further Reading

Derek J. de Solla Price, _General theory of bibliometric and other cumulative advantage processes_, Journal of the American Society for Information Science 27 (5-6): 292-306, 1976.

Carolyn E. Lipscomb, _Mergers in the publishing industry_, Bulletin of the Medical Library Association 89 (3): 307-308, 2001.

Glenn S. McGuigan, _The Business of Academic Publishing: A Strategic Analysis of the Academic Journal Publishing Industry and its Impact on the Future of Scholarly Publishing_, Electronic Journal of Academic and Special Librarianship 9 (3), 2008.

Deborah D. Blecic, Stephen E. Wiberley, Joan Fiscella, Sara Bahnmaier-Blaszczak, and Rebecca Lowery, _Deal or No Deal? : Evaluating Big Deals and Their Journals_, College & Research Libraries, accepted manuscript, 2011.

Jacqueline Wilson, _Journal Value Metrics Assessment_, California Digital Library, 2011.

_Report of the Research Prioritisation Steering Group_ [Ireland], March 2012.


Implementing the UK Open Access policy: The embargoes for Green


The positive achievement of the UK in positioning Open Access front and center of the debate around the future of academic publishing cannot be denied. However, defining a clear path toward policy implementation has been less successful. Here is the first of five reasons why:

Embargoes for Green.

Anyone who has been tracking the rapid transition from the recommendations of the Finch Group to the emergence of RCUK's policy must admit that the horse-trading around OA embargoes caused considerable confusion. The House of Lords Science and Technology Committee report into the policy published on 22 February 2013 produced this graphic to highlight how it should work.

The thing is, it was the first time most people had seen it. Was this the policy tweak we were told would emerge at the end of February? Perhaps. While David Willetts' position on Green OA verges on the politically hostile, RCUK and HEFCE have tried to hold a more pragmatic line, albeit one which the former appeared at pains to avoid stating plainly. Put simply: although the policy has a preference for Gold, funding is limited and Green will meet the shortfall.

|  | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 |
| --- | --- | --- | --- | --- | --- |
| RCUK APC fund | £17m | £20m | To be determined | To be determined | To be determined |
| Expected % of papers in Gold OA | 45% | 53% | 60% | 67% | 75% |
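The shortfall Green is expected to cover follows directly from the Gold percentages above. A short Python sketch, assuming every funded paper that is not published as Gold has to take the Green route:

```python
# Expected share of RCUK-funded papers published as Gold OA, from the figures above.
expected_gold_pct = {1: 45, 2: 53, 3: 60, 4: 67, 5: 75}

# Assumption: any paper not published as Gold relies on the Green route.
green_pct = {year: 100 - gold for year, gold in expected_gold_pct.items()}

for year, pct in green_pct.items():
    print(f"Year {year}: {pct}% of papers via Green")
```

So more than half of funded papers depend on Green in year one, and a quarter still do in year five.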

Before the decision tree came to light, the position on embargoes was as follows:

Ideally, a research paper should become Open Access as soon as it is published on-line. However, the Research Councils recognise that embargo periods are currently used by some journals with business models which depend on generating revenue through subscriptions. Therefore, where a publisher does not offer a ‘pay-to-publish’ option the Research Councils will accept a delay between on-line publication and a paper becoming Open Access of no more than six months, except in the case of research papers arising from research funded by the AHRC and the ESRC. Because current funding arrangements make a six month embargo period particularly difficult in the arts, humanities and social sciences, the Research Councils will accept a delay of up to twelve months in the case of research papers arising from research funded wholly or in part by the AHRC and/or the ESRC. However, this is only a transitional arrangement, for a period of five years, and both the AHRC and ESRC are working towards enabling a maximum embargo period of six months for all research papers.

In August 2012, the Publishers Association released a position statement on RCUK policy which contains our now familiar decision tree. We can assume that in the following six months there was considerable lobbying by the PA to get BIS and RCUK to clarify their position, but if you look closely this is a bit more than a simple policy tweak on the time-scales of embargoes. Where a publisher doesn't offer a paid APC option for a particular journal, the author will be compliant with RCUK's OA policy if the author's final draft, post peer-review, is deposited in a repository and released from embargo after 6 to 12 months depending on discipline. Where the publisher DOES offer a paid APC option but there is no money to cover the APC, the embargo gets expanded to 12-24 months.
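Written out as code, the branches of the decision tree look like this. The Python sketch below follows the description above, with two assumptions: that the upper end of each embargo range applies to AHRC/ESRC-funded work and the lower end to everything else, and that Gold means open on publication. The function and parameter names are made up for illustration.

```python
def rcuk_route(publisher_offers_apc, apc_funds_available, ahrc_or_esrc_funded):
    """Return (route, maximum embargo in months) for an RCUK-funded paper."""
    if publisher_offers_apc and apc_funds_available:
        # Gold: the APC is paid and the paper is open on publication.
        return ("Gold", 0)
    if publisher_offers_apc:
        # A paid option exists but there is no money for it: longer Green embargo.
        return ("Green", 24 if ahrc_or_esrc_funded else 12)
    # No paid option offered: Green with the original embargo limits.
    return ("Green", 12 if ahrc_or_esrc_funded else 6)


# The case that matters here: the journal sells Gold, but the fund has run dry.
print(rcuk_route(publisher_offers_apc=True, apc_funds_available=False,
                 ahrc_or_esrc_funded=False))
```

The middle branch is the new one: simply by offering a paid option, a publisher doubles the embargo it can impose on authors who cannot pay.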

Therefore publishers can impose an embargo of 12-24 months on up to 55% of the research published in year one of the RCUK policy, the share not expected to go Gold. That's quite a roll-back on the original position and to whose benefit?




