Awarded

UK Government Web Archive production database (GWDB) replacement project

Supplier(s)

Mobilise Cloud Services Ltd

Value

84,850 GBP

Description

Summary of the work

The UK Government Web Archive requires a supplier to design, develop and deliver a replacement data management system for key website and crawl data, using AWS cloud-based technologies. Data in the existing database must be mapped, merged and migrated into the proposed solution.

Expected Contract Length

4-5 months (to include a support period, estimated 4 weeks)

Latest start date

Friday 1 October 2021

Budget Range

Up to £100,000. Suppliers are requested to provide their rate cards with their submissions. Budget based on rates in the region of £750-£995/day (max.)

Why the Work is Being Done

The National Archives' Web Archiving team (UKGWA) use a proprietary, legacy SQL database (called GWDB) for the data management of critical website and crawl data for the UK Government Web Archive (https://www.nationalarchives.gov.uk/webarchive/). The system is buggy and difficult to use. GWDB is not flexible, and the team have to find workarounds to issues (including holding data in Excel sheets) rather than adapt the system to meet their needs. The existing system runs on old infrastructure that is not well supported, and there is a lack of documentation and institutional knowledge about the system. The Web Archiving team need a new flexible data management system that can be supported and maintained by the team to avoid future technical debt issues.

Problem to Be Solved

We require a multidisciplinary team to develop a replacement database system to manage the website and crawl data. The supplier will need to produce requirements and a data model to support the required processes and reports. The supplier will design, build, test, and implement a cloud-based data system that meets the needs of UKGWA. The UKGWA team use the current system to track all websites included in the web archive. Data is sent via XML files to a third party to initiate crawls of the websites.

The data is well understood by the UKGWA team (although the data model is not). Established processes and data flows must be maintained to ensure continuity of service. The new system must hold most data from the current GWDB system and support many of its functions. The team requires a new system, including a new content model and data quality/validation rules. It must have a user-friendly, intuitive front-end that enables efficient data input and management. A search/reporting facility is required that allows users to query, filter and export data. Reports/results lists must be configurable and exportable to standard formats. The system must be hosted in AWS and will be supported by the UKGWA team.

Who Are the Users

The primary user of the data management system is the Web Archiving team. They use the system to record the websites and social media channels that need to be archived. The information details if, how and when the websites should be crawled. The team query this data, run reports and export information from the database to help with decision making. XML files are produced from the database via scheduled tasks or when they are triggered manually. These files contain all the key configurations for the crawl of each website and are sent from the system to a third party on a daily basis.

The Web Archiving team need to be able to handle user management; add, amend and delete records; and query, filter and export data from the database. The system needs to support the team's workflow (search, view, edit, sign-off). The team supplements the database with data held in Excel files; this creates a fragmented workflow with multiple stages and hand-offs. These datasets and workflows should be incorporated into the new system wherever possible. At the end of the project, the new system must be delivered in such a way that will allow the Web Archiving team to develop it going forward.
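As an illustration of the XML output described above, the following is a minimal Python sketch of how crawl configuration records might be serialised for the third party. The element names (`crawls`, `site`, `homepage`, `frequencyMonths`, `mode`), the record keys and the sample values are all hypothetical; the actual XML specification is held by the UKGWA team and is not reproduced in this notice.

```python
import xml.etree.ElementTree as ET


def build_crawl_xml(records):
    """Serialise crawl configuration records to an XML string.

    Element and key names are illustrative assumptions, not the
    real GWDB output specification.
    """
    root = ET.Element("crawls")  # hypothetical root element
    for rec in records:
        # One <site> element per website record, keyed by its ID.
        site = ET.SubElement(root, "site", id=str(rec["domain_id"]))
        ET.SubElement(site, "homepage").text = rec["homepage_url"]
        ET.SubElement(site, "frequencyMonths").text = str(
            rec["crawl_frequency_months"]
        )
        ET.SubElement(site, "mode").text = rec["crawl_mode"]
    return ET.tostring(root, encoding="unicode")


# Sample record with made-up values.
records = [
    {
        "domain_id": 1,
        "homepage_url": "https://www.example.gov.uk/",
        "crawl_frequency_months": 6,
        "crawl_mode": "full",
    }
]
xml_text = build_crawl_xml(records)
print(xml_text)
```

A function of this shape could be called both by a scheduled task and by a manual trigger in the front-end, which would keep the two routes to the daily file consistent.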
Work Already Done

The team has a good understanding of (1) current workflows that rely on the GWDB data, (2) the specification of the XML output file, (3) the data requirements, (4) many of the functional requirements of the new system. The existing SQL database and reporting services function is hosted in-house. It is neither large nor very complex. The system has search functionality and Reporting Services that allow the UKGWA team to run reports and export data. A copy of the database is available, but with limited documentation. Coding and security standards - see https://www.gov.uk/service-manual/design/services-for-government-users

Existing Team

The UK Government Web Archiving Team is a small team of 9 specialists within the Digital Directorate at The National Archives. The team is highly skilled in web archiving and will be the primary source of information during this project. Technologies that can be supported by the Web Archiving Team:

• Relational databases - PostgreSQL, MySQL or MariaDB hosted on AWS RDS.
• Our primary scripting language is Python.
• CI/CD - GitHub and GitHub Actions with Docker images pushed to our Dockerhub organisation.
• Cloud infrastructure - AWS.

Current Phase

Not started

Skills & Experience

• Strong, demonstrable experience in database design and data modelling.
• Must demonstrate excellent competence in building front-end applications to interrogate and view a database. (Describe a recent project. What, where, when, duration, result.)
• Must have relevant experience of delivering solutions using AWS services. (Describe a recent project. What, where, when, duration, result.)
• Must have experience of deploying a suitably structured team to deliver a database or data management system. (Set out indicative roles and team structure.)
• Must have experience of rapid delivery. (Describe when you have delivered a database solution to a short timescale.)
• Must demonstrate excellent competence in working in an Agile way to deliver capability in an incremental way. (Please can you describe how you have achieved this.)
• Must have experience in designing solutions that include generation of output files to a predefined schedule. (Please can you describe how you have achieved this.)
• Must demonstrate excellent competence in providing handover training and deployment support. (Describe a recent project. What, where, when, duration, result.)
• Must have experience working with subject matter experts. (Describe how you have effectively worked with SMEs.)

Nice to Haves

• Evidence of guaranteeing the design and build of a database system, where the ongoing support of such is provided by another party
• Experience of designing solutions that include an API for accessing data
• Ability to provide innovative ideas whilst delivering the core requirements
• Demonstrate understanding and ability to deliver digital services/products to the Government Digital Service standards

Work Location

The National Archives, Kew, Richmond, Surrey TW9 4DU

Working Arrangements

Flexible

Security Clearance

Baseline security clearance will be required.

Additional T&Cs

Relevant National Archives and Civil Service policies and terms and conditions

No. of Suppliers to Evaluate

5

Proposal Criteria

• Demonstrated understanding of scope of work
• Track record of meeting or exceeding requirements
• Proven skills in developing data management systems based on examples of previous work
• Proven skills in implementing replacement legacy systems, including evaluation of workflows, data requirements, data mapping and migration
• Evidence of creative approaches and ability to design interfaces to meet user needs
• Capacity to perform work within timescale and budget

Cultural Fit Criteria

• Have collaborative and flexible working approach, e.g. working with in-house technical and other digital specialists
• Approach to supporting teams to adopt new technologies
• Examples of delivering transition, knowledge transfer and handover of code
• An appreciation for the importance of technical documentation as a means of ensuring ongoing maintainability of systems
• Demonstrable commitment to a diverse working environment, with a team comprised of experts from a wide variety of backgrounds

Payment Approach

Capped time and materials

Assessment Method

• Case study
• Work history
• Reference
• Presentation

Evaluation Weighting

• Technical competence: 50%
• Cultural fit: 15%
• Price: 35%

Questions from Suppliers

1. Is it outside or inside IR35?
We have checked the requirements (to the best of our knowledge) using the assessment tool found at https://www.gov.uk/guidance/check-employment-status-for-tax and the determination for the role(s) as advertised is that the intermediaries legislation does apply to this engagement.

2. In terms of working arrangements, would primarily remote from the UK with onsite work when required be acceptable?
In principle yes, provided you can work core UK business hours. Meetings will be required, and for some periods (such as testing) we would need someone able to respond to issues/questions as they arise in core business hours. If work was to be off-site in a significantly different time zone then you will need to provide a plan on how you would manage communications.

3. From reading this advert, we get the impression that the team is predisposed to an AWS solution. For our solution to be effective, skill sets will need to be re-learned, which can be a hurdle that is undesirable for clients. Before we respond, can you confirm if a semantic knowledge graph solution would be an option?
We believe that the solution needs to be a relational database and not a semantic knowledge graph system. The dataset is relatively small and is highly structured; it does not contain complex many-to-many relationships. The data model is unlikely to change often and we will need to run queries over whole tables.

4. Can you provide an estimate of the number of workflows that will need to be supported in the front-end?
The system will need to support searching, adding, amending and deletion of records. We are looking for the supplier to work with us to establish the workflows needed but we would expect there to be a small number.

5. Do you have any high level architectural diagrams of the system that you can share?
The current system is poorly documented. Part of the scoping section of the project will be uncovering this type of documentation.

6. How large is the existing database?
The current database is approx. 2GB.

7. Can you provide any more details on the functional requirements of the relational database?
The system will need to support searching, adding, amending and deletion of records. It will need to produce configurable output files in XML format according to a schedule and on demand. Users must be able to run reports to analyse the data (search/filter/sort functionality) and export these reports. Users need to enter and manage lists of URLs; the system must support a bulk import of this data.

8. Are you storing the raw archived data in the relational database or is it metadata linked to the archived pages in some backing store?
The database will store the metadata related to the parameters needed to run a web crawl. Data such as (but not limited to): Domain ID; Parent Domain ID; Domain Type; Government Dep; Domain Type; Domain Name; Homepage URL; Status; Crawl Frequency (months); Scheduled Crawl Date; Exceptional Crawl Date; Crawl Mode; Special Instructions; Archivist Notes; Date added; Closure date; Crawl ID. The system will not store the WARC files (archive data) created by the crawlers.

9. Is there a requirement to develop APIs to feed the public portal?
We will need a simple JSON API that allows us to create tools to facilitate internal processes or allow external suppliers to access the data without the need of the current XML output files.

10. Do you need a new front-end web portal for public users or are you reusing the existing Public Search facility?
The system is not for public users but it will be web-based, accessible via a browser, behind a login screen for only National Archives staff to access. The system is a back-end tool for the Web Archiving team to manage data.

11. Can you provide an example of what you would export from the database for review?
An example of one of the XML files we need the system to produce, and an example of a report the current system produces (it is very likely that we will require changes to the report but it's a typical example) can be found here: https://tna-ukgwa-sharing.s3.eu-west-2.amazonaws.com/gwdb2project/documents.zip
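To make the answers to questions 8 and 9 more concrete, the sketch below shows one way a subset of the listed metadata fields could be held in a relational table and returned as JSON. It is an assumption-laden illustration, not the delivered design: the table name `domain`, the column names and types, the sample values and the helper `domain_as_json` are hypothetical, and in-memory SQLite stands in for the PostgreSQL, MySQL or MariaDB instance on AWS RDS that the team says it can support.

```python
import json
import sqlite3

# Hypothetical table covering a subset of the fields named in question 8.
SCHEMA = """
CREATE TABLE domain (
    domain_id              INTEGER PRIMARY KEY,
    parent_domain_id       INTEGER REFERENCES domain(domain_id),
    domain_name            TEXT NOT NULL,
    homepage_url           TEXT NOT NULL,
    status                 TEXT NOT NULL,
    crawl_frequency_months INTEGER,
    scheduled_crawl_date   TEXT,  -- ISO 8601 date string
    archivist_notes        TEXT
);
"""


def domain_as_json(conn, domain_id):
    """Return one domain record as a JSON string, or None if absent."""
    row = conn.execute(
        "SELECT * FROM domain WHERE domain_id = ?", (domain_id,)
    ).fetchone()
    return json.dumps(dict(row)) if row is not None else None


# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO domain (domain_id, domain_name, homepage_url, status, "
    "crawl_frequency_months) VALUES (?, ?, ?, ?, ?)",
    (1, "example.gov.uk", "https://www.example.gov.uk/", "live", 6),
)
payload = domain_as_json(conn, 1)
print(payload)
```

A web framework endpoint could return `payload` directly; the self-referencing `parent_domain_id` column is one simple way to model the parent/child relationship implied by the field list without introducing many-to-many tables.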

Timeline

Publish date

3 years ago

Award date

3 years ago

Buyer information

The National Archives