Elsevier's slice of the Big Data pie

One of the more illuminating points made during the Twitter storm that followed Tuesday's announcement of the Elsevier Mendeley takeover came from Heather Piwowar. To recap: Elsevier (an academic publisher) acquired Mendeley (a reference management platform) for up to £65m. Mendeley is a collaboration tool with Open Access (publications and data as well as the 'lab notebook' that facilitates scientific discourse and investigation) at its heart. It allows researchers to share and annotate papers, build bibliographies and create alternative citation metrics. It also has an institutional edition which can be used to manage research information ('...combining a next generation collaboration platform for the individual researcher with a real time analytics tools for the library' according to the corporate video). The 'oodles of workflow data' that Elsevier has acquired include data on where researchers are publishing and what they are reading.
Under-commented at the time but nonetheless significant in this context was Elsevier's acquisition of the Danish company Atira in October 2012. Atira are the development team behind PURE, a Current Research Information System (CRIS) with an optional Institutional Repository service. (A CRIS is a research management system that draws on institutional data to create relationships between researchers and research publications, projects, project funding etc.) The platform has been deployed in 19 UK universities, primarily to support reporting and submission for the REF2014 exercise.
Both acquisitions represent a new focus in Elsevier's corporate strategy, built around the data generated by institutional research management platforms and those channels of scholarly communication that utilise open content and social media. At its core is a concession that opportunities are emerging for new data-driven business models. Cash-rich after sustaining year-on-year profits of around 33% for the past decade, Elsevier can well afford to take a punt.
The core of Elsevier's business remains an academic publishing model buttressed by rights acquisition and library subscriptions. Looking past the corporate glad-handing that has followed the Mendeley deal, it's worth bearing in mind the ferocious lobbying undertaken by the company to protect this cash cow. Elsevier will continue to talk out of both sides of its mouth about Open Access while it develops a tangible corporate strategy that guarantees sustained profitability for these data-driven services. Its well-established bibliographic database SciVerse Scopus was already integrated with PURE prior to the acquisition and my feeling is that this integration will be developed further. Currently Scopus data populates PURE with both publication records and citation data, which can be delivered as open content through a publicly accessible research portal. The Institutional Repository acts as an optional service associating full text, when available, with research records (publications, projects, etc).
So far Elsevier has not made PURE a Scopus-only CRIS. Why would it? The API to the bibliographic database of its corporate rival Thomson Reuters (as well as those of PubMed and arXiv) remains available to PURE. This approach allows the institutional publication dataset to be enhanced by records imported from a variety of sources. Checked for accuracy (by the institutional library) and associated with open research outputs, they are then disseminated via the web. But the enhancement of this content doesn't stop there.
The Gordian Knot of author disambiguation looks like it might now be cut by the ORCID project. Elsevier is a development partner. Authors register with the service and receive a unique identifier. They can then import their research output data (institutional affiliation, publication and patent references, grant information) and make these data open or closed. One of the real impediments to assessing research impact based on citation metrics is the variation of author names in the literature (John Doe, John F. Doe, Doe, John Frederick, etc). Accurately associating these variations with a specific author will give any commercially available bibliographic database a competitive edge. ORCID's close integration with the Scopus database allows authors to import their publication references, provided their existing Scopus ID has been associated with their ORCID identifier. In PURE, authors can now include their ORCID identifier when creating reference data. If they choose to import publication records from Scopus into PURE, the ORCID identifier can be used to make an accurate association between the imported record and specific institutional authors.
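To make the disambiguation point concrete, here is a minimal sketch in Python of matching imported records to institutional authors by identifier rather than by name string. It is an illustration only: the class names, field names and staff ID are hypothetical, the identifier is a placeholder, and nothing here reflects the actual PURE, Scopus or ORCID interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstitutionalAuthor:
    """Hypothetical institutional author record (not a PURE data model)."""
    staff_id: str
    name: str
    orcid: str


@dataclass(frozen=True)
class ImportedRecord:
    """Hypothetical publication record imported from a bibliographic source."""
    title: str
    author_name: str              # name exactly as printed in the source
    orcid: Optional[str] = None   # identifier, if the source supplies one


def match_records(authors, records):
    """Associate records with authors by identifier, ignoring name spelling.

    Returns (matched, unmatched): matched maps staff_id to a list of records;
    unmatched holds records with no identifier or an unknown one.
    """
    by_orcid = {a.orcid: a for a in authors}
    matched = {a.staff_id: [] for a in authors}
    unmatched = []
    for record in records:
        author = by_orcid.get(record.orcid) if record.orcid else None
        if author is None:
            unmatched.append(record)   # left for manual checking
        else:
            matched[author.staff_id].append(record)
    return matched, unmatched


# Placeholder identifier and made-up records, for illustration only.
authors = [InstitutionalAuthor("s001", "John Frederick Doe", "0000-0002-1825-0097")]
records = [
    ImportedRecord("Paper A", "John Doe", "0000-0002-1825-0097"),
    ImportedRecord("Paper B", "Doe, John Frederick", "0000-0002-1825-0097"),
    ImportedRecord("Paper C", "J. Doe"),  # no identifier supplied
]
matched, unmatched = match_records(authors, records)
```

The two records carrying the identifier land with the same author despite the differing name forms, while the record without one falls through to a manual check, which is roughly where institutional libraries have been doing the work until now.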
This machine-driven data cleansing and enhancement relies on platform-to-platform interoperability. With the acquisition of PURE and Mendeley and the close integration of Scopus into ORCID, Elsevier has developed a locked-in, platform-specific workflow with real commercial potential. Open in terms of access and re-use. Open on Elsevier's terms.
