Whilst I was working at the University of Sussex, I was involved in a project to implement figshare. Figshare is a cloud-based Research Data Management platform that enables researchers to share their research data with other researchers. In the same vein, I thought I’d use the work I completed back then to share an example Architecture Definition Document from the project.
There are a couple of provisos:
- I don’t think that there are any state secrets in here!
- I originally put this together in 2017, and things have moved on a bit since then. I have certainly learnt more, and wouldn’t necessarily do it like this again!
This document defines the architectural principles and scope for the figshare Implementation Project, as well as all artefacts required to describe the baseline (as-is) and target (to-be) states across all architectural domains: Business, Data, Application and Technology.
This architecture aims to deliver the following scope:
- Implementation of the figshare platform:
- University of Sussex (UoS) custom domain.
- UoS branding.
- Integration with UoS researcher identity:
- Provisioning of users from the Central Database.
- Provisioning of school information from the Central Database.
- Authentication via UoS ADFS platform.
- Integration with DataCite.
- Integration with the Arkivum archive platform.
- Configuration of the UoS firewall to allow figshare access to the Arkivum appliance.
The following areas are out of scope for this architecture:
- Integration with current UoS Identity and Access Management platform.
- Integration with Sussex Research Online.
- Integration with ORCiD researcher identity.
- End-user communications and training.
Goals, Objectives and Constraints
The purpose of this section is to outline the architectural goals, objectives, and constraints for the architecture and this document.
Stakeholders and their Concerns
|Library||The Library are sponsoring the implementation of the figshare platform to support researchers in publishing their data.|
|Research and Enterprise||The Research and Enterprise department support researchers by managing the research process, including final publishing of outcomes and data.|
|ITS||ITS will provide resources for the implementation and ongoing budget for the figshare platform.|
|UoS Researcher||Research data supporting published findings must be made available for the research community.|
In order to comply with modern data sharing practices, the University of Sussex must publish research data alongside the research paper detailing their findings.
The UoS currently has limited capability to publish research data. A user can find and read research papers over the internet on Sussex Research Online (SRO), a UoS-branded implementation of the ePrints application, but must request access to the supporting research data directly from the researcher.
Due to the lack of publishing capabilities currently supported by the UoS, this paper focuses on the changes required to implement the target architecture, rather than analysing the baseline architecture.
The figshare Implementation Project aims to deliver the figshare platform to assist researchers in publishing both a research paper and their supporting research data.
In the future, external users will be able to search both the SRO and figshare repositories for published research papers and research data.
|UoS Researcher||Business Role||Any University of Sussex Student or Employee conducting original research.|
|UoS figshare Administrator||Business Role||Role within the UoS Library Service to administer the figshare platform.|
|General Public||Business Role||Any non-authenticated/anonymous individual|
|Publish Research Data||Business Process||Researchers must publish research data to support findings.|
|Manage public researcher profile||Business Process||Researchers can publicise their skills and research by maintaining a public researcher profile.|
|Manage figshare users and groups||Business Process||The figshare platform must be managed to ensure that research data is categorised in the correct school/group, that storage is used effectively, etc.|
|Find published research data||Business Process||Users wish to find published research data.|
|Find public researcher profile||Business Process||Users wish to be able to read details of the researcher who published research as well as link out to other published research data.|
|[fs] Research Data||Raw data used in a research project that has been uploaded to the figshare platform for publishing. The figshare platform will generate previews and browsers for research data uploaded in a number of common formats.|
|[fs] User Account||The figshare user responsible for publishing research data. Information added to a user account becomes visible on the user’s Researcher Profile. Once a User Account has been provisioned automatically, researcher details (e.g. name changes) must be kept up to date manually by the researcher.|
|[AD] User||A user within the UoS Active Directory tree.|
|[AD] UoS Active Directory Tree||Directory of all users (employees, students, etc) within the UoS.|
|ORCiD identifier||The ORCiD (Open Researcher and Contributor ID) is a 16-digit unique identifier for researchers. Any researcher can apply for an ORCiD identifier, which will stay with them for their career, regardless of institution. Including an ORCiD identifier enables a link to be generated back to a researcher’s ORCiD profile.|
|DataCite DOI||A DataCite DOI (Digital Object Identifier) can be used to uniquely identify and cite a digital artefact such as research data.|
|[fs] Research Data|
|[fs] User Account|
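As an aside on the ORCiD identifier entity above: the final character of the 16 is a check character (computed with ISO 7064 MOD 11-2), so it may be `X` rather than a digit. The following is a minimal sketch of a format check that a profile form or data-quality report could apply. The function names are illustrative only and are not part of figshare or of this architecture, in which ORCiD integration remains out of scope.

```python
import re

# Hyphenated form, e.g. 0000-0002-1825-0097; the last character may be X.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier: str) -> bool:
    """Return True if the identifier is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(identifier):
        return False
    compact = identifier.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]
```

A check like this only confirms that an identifier is well formed; it does not confirm that the identifier belongs to the researcher who entered it.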
Figshare relies on three data flows:
- Research Users: All UoS researchers (Students and Employees) will have an account created within figshare. This data will be initially sourced from the Central Database and automatically loaded into figshare on a nightly basis. Details entered into the user account become public on the user’s Researcher Profile. (Note: the user upload does not update name details within figshare; changes to a user’s name must be handled manually by the user.)
- Research Data: Once a user has been created, they may log in (via ADFS) and upload their research data. Research data is then managed by the individual.
- Research Data (Archived): Figshare data storage operates on an LRU (least recently used) archival strategy. If research data exceeds a maximum time between accesses, the data is archived to Arkivum via an Arkivum appliance at the UoS and removed from figshare storage. Once archived, data can still be requested from figshare, and is restored via the UoS Arkivum appliance. Data stored in the Arkivum platform is backed up to cloud storage and tape for X years.
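The archival rule in the third data flow can be illustrated with a short sketch. This is not figshare's implementation: the `StoredItem` type, the `select_for_archive` function and the idle threshold are all assumptions made for illustration, since the actual maximum time between accesses is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StoredItem:
    """Hypothetical record of a dataset held in figshare storage."""
    item_id: str
    last_accessed: datetime


def select_for_archive(items, now, max_idle):
    """Return items idle for at least max_idle, least recently used first."""
    stale = [item for item in items if now - item.last_accessed >= max_idle]
    return sorted(stale, key=lambda item: item.last_accessed)
```

Each item returned would then be copied to the Arkivum appliance and removed from figshare storage, with a later request for the data triggering a restore through the same appliance.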
|figshare||Application Component||Figshare is an online digital repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos. It is free to upload content and free to access, in adherence to the principle of open data. Figshare is a portfolio company of Digital Science, operated by Macmillan Publishers.|
|[fs] Manage Research Data||Application Function||Functionality available to figshare users to manage (Create/Upload, Update, Delete) research data.|
|[fs] Search Research Data||Application Function||Functionality available to the general public to search, preview and download published research data.|
|[fs] Researcher Profile||Application Function||Functionality to view and update a researcher’s profile. A research profile is linked to all data uploaded to figshare.|
|[fs] User and Group Administration||Application Function||Functionality available to a figshare administrator to maintain the users of the system.|
|[fs] User Authentication||Application Function||System functionality to provide user authentication. Includes external identity integration, e.g. ADFS.|
|[fs] Archive Data||Application Function||Automated platform functionality to archive research data off to the Arkivum archival platform.|
|[fs] REST API||Application Interface||Figshare provides a number of web services to interact with the platform. For further detail see http://docs.figshare.com|
|[fs] User Import Service||Application Service||Enables an administrative user to automate the creation of figshare users via an xml/csv file. For further detail see http://docs.figshare.com/#hr_feed|
|[fs] Upload Research||Application Service||Enable a figshare user to upload research data to the platform.|
|Research Data||Artifact||Raw data generated to support a research project.|
|Research Storage Platform (NX2)||Node||Storage for research projects hosted at the University of Sussex.|
|File Storage||Technology Service||Enables a system to perform file based operations on storage (create, read, update, delete)|
|University of Sussex Network||Network||The University of Sussex network is segmented into several VLANs, each with specific purpose, enabling devices to connect appropriately.|
|[ADFS] Authenticate User||Technology Service||Allows users to authenticate with externally hosted platforms using their standard credentials. Enables Single Sign On (SSO) across the internet via the SAML2 protocol.|
|Active Directory Federation Services (ADFS)||System Software||Platform to support user authentication to external platforms via standard protocols.|
|Update figshare users||Technology Process||User provisioning process for the figshare platform.|
|misc||Node||Server hosting miscellaneous identity access management processes.|
|Archive||Technology Service||Enables a system to archive files into the Arkivum archive platform.|
|REST||Technology Interface||Representational state transfer (REST) or RESTful web services is a way of providing interoperability between computer systems on the Internet. REST-compliant Web services allow requesting systems to access and manipulate textual representations of Web resources using a uniform and predefined set of stateless operations.|
|Arkivum Appliance||Node||The Arkivum appliance is hosted at the University of Sussex. The appliance includes 50 GB of storage. Data copied to the Arkivum appliance will be archived to the Arkivum cloud. Once a minimum threshold has been met, data is also archived to tape.|
|Research Data (To Archive)||Artifact||Raw data generated to support a research project. Stored in the Arkivum Appliance for access or as a staging area before final archival.|
|Arkivum Cloud Storage||Node||Data in the Arkivum cloud is archived for long-term storage. Once a threshold has been met, it is also stored on tape.|
|Research Data (Archived)||Artifact||Raw data generated to support a research project. Stored in the Arkivum cloud for long-term archival.|
|Palo Alto Firewall||System Software||The Palo Alto firewall is used by the UoS to meet complex firewall requirements.|
|Internet||Network||Connectivity to the Internet.|
|Web Browser||System Software||Generic web browsing functionality|
|Upload File||Technology Function||Enables a web browser to upload a file from local storage to a web server.|
User provisioning for the figshare implementation project makes use of the figshare HR Feed Service.
- Users can be generated and disabled using the HR Feed Service
- Changes to a user’s name cannot be managed using the HR Feed Service; these changes must be applied manually by the end user.
- Users are not removed by the HR Feed, only deactivated.
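The three rules above can be expressed as a small model of what one feed run does to the set of accounts. This is a sketch under stated assumptions rather than a description of figshare internals: `Account`, `apply_feed` and the shape of the `feed` argument are hypothetical names, and reactivating a previously deactivated user is deliberately not modelled because its behaviour is not described here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical view of a figshare user account."""
    upn: str
    first_name: str
    last_name: str
    active: bool = True


def apply_feed(accounts, feed):
    """Apply one feed run.

    accounts: dict mapping UPN to Account (the current state).
    feed: dict mapping UPN to a (first_name, last_name) tuple.
    """
    result = {}
    for upn, account in accounts.items():
        if upn in feed:
            # Existing user: names in the feed are ignored, not applied.
            result[upn] = account
        else:
            # Missing from the feed: deactivated, never deleted.
            result[upn] = replace(account, active=False)
    for upn, (first_name, last_name) in feed.items():
        if upn not in accounts:
            # New user: created from the feed details.
            result[upn] = Account(upn, first_name, last_name)
    return result
```

Note that the result never has fewer entries than the input, which reflects the rule that the feed deactivates users rather than removing them.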
|Update figshare users||Process||Execute figshare user selection query and upload result to figshare.|
|Oracle||System Software||Oracle Database (commonly referred to as Oracle RDBMS or simply as Oracle) is an object-relational database management system produced and marketed by Oracle Corporation.|
|Central <<Database>>||Artifact||Primary database for all University of Sussex data. Stores tables for Staff, Students, Courses, etc.|
|Select figshare Users||Function||SELECT UPN (firstname.lastname@example.org), First Name, Last Name, Title, Initials, Suffix, Email and School FROM users in the Central database WHERE the user is (a student currently studying at the University of Sussex as a postgraduate researcher) OR (an employee working in a research field).|
|figshare_load.csv||Artifact||Results of the “Select figshare Users” SQL exported to CSV format.|
|Figshare HR Upload||Function||Execute “Select figshare Users”, then upload “figshare_load.csv” to the figshare HR Feed API.|
|misc||Node||Server hosting miscellaneous identity access management processes.|
|Nightly Cron||Technology Event||Nightly event that triggers the “Update figshare users” process.|
|[fs] HR Feed Service||Application Service||Enables an administrative user to automate the creation of figshare users via an xml/csv file. Further documentation: https://docs.figshare.com/#hr_feed|
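To make the “Select figshare Users” and “figshare_load.csv” elements above more concrete, here is a minimal sketch of the selection and export step. Several things are assumptions: the record keys used for filtering (`kind`, `current`, `postgraduate_researcher`, `research_field`) are invented for illustration, the real selection runs as SQL against the Central database, and the column headers simply reuse the field names listed in the catalogue rather than the exact headers the HR Feed expects. The upload step is omitted because its details belong to the figshare documentation.

```python
import csv
import io

# Field names taken from the "Select figshare Users" description.
FEED_COLUMNS = [
    "UPN", "First Name", "Last Name", "Title",
    "Initials", "Suffix", "Email", "School",
]


def is_figshare_user(record):
    """Apply the selection rule; the filter keys are hypothetical."""
    is_pgr_student = (
        record.get("kind") == "student"
        and bool(record.get("current"))
        and bool(record.get("postgraduate_researcher"))
    )
    is_research_employee = (
        record.get("kind") == "employee"
        and bool(record.get("research_field"))
    )
    return is_pgr_student or is_research_employee


def write_feed(records, out):
    """Write selected records to a text stream as CSV; return the row count."""
    writer = csv.DictWriter(
        out, fieldnames=FEED_COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    count = 0
    for record in records:
        if is_figshare_user(record):
            writer.writerow(record)
            count += 1
    return count
```

In a nightly job, `out` would be a file opened for writing as `figshare_load.csv`; passing an `io.StringIO()` instead makes the function easy to test without touching disk.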