Closed

Enriching court judgments and legislation documents, adding hyperlinks and creating Linked Data

Published

Value

150,000 GBP

Description

Summary of the Work

Turn the references in court judgments to other cases and legislation into hyperlinks. Enrich the documents, identifying citations and references to other named entities, and extract the enriched data into a knowledge graph.

Expected Contract Length

4 months, with an option to extend by up to 2 additional months.

Latest Start Date

Saturday 1 January 2022

Budget Range

Up to £150,000

Why the Work Is Being Done

The National Archives manages legislation.gov.uk and is developing a new service to provide public access to Court Judgments and Tribunal Decisions. We want to turn the references in these important legal documents into hyperlinks, so that users can move seamlessly between judgments, legislation and other official documents, such as guidance documents (for example, those held in the UK Government Web Archive).

The main priority is to enrich Court Judgments, which we store in the Legal Document Mark-up Language (LegalDocML) XML, also known as Akoma Ntoso. These documents contain a variety of textual references to other sources, including other cases, legislation and other official documents. We want to turn those textual references into hyperlinks for users by enriching the data we hold in LegalDocML. We also want to identify various named entities in the documents to improve our intellectual control over the collection.

Once the documents are enriched, we want to extract all the additional information into a Linked Data knowledge graph that can support searching and browsing features in the new public service, and also enable data analysis of the collection.

Problem to Be Solved

Develop and configure an automated Natural Language Processing (NLP) capability that processes the texts of court judgments to identify references to legislation, to cases (both UK and foreign) and to significant named entities.
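As a rough illustration of the identification step, the Python sketch below finds two common reference types in plain judgment text: neutral citations such as “[2017] UKSC 5” and Act titles such as “European Communities Act 1972”. The regular expressions, the short list of courts and the `Reference` type are assumptions made for this example only; a real pipeline (for instance, the existing GATE rules) would need to cover many more citation formats and would match Act titles against legislation.gov.uk data rather than a capitalisation heuristic.

```python
import re
from dataclasses import dataclass

# Simplified, illustrative patterns (assumptions, not a complete grammar).
# Neutral citations: "[year] COURT number", for a handful of courts only.
NEUTRAL_CITATION = re.compile(
    r"\[(?P<year>\d{4})\]\s+"
    r"(?P<court>UKSC|UKPC|UKHL|EWCA\s+(?:Civ|Crim)|EWHC)\s+"
    r"(?P<num>\d+)"
)
# Act titles: one or more capitalised words followed by "Act" and a year.
ACT_TITLE = re.compile(r"\b(?:[A-Z][a-z]+\s+)+Act\s+\d{4}")


@dataclass
class Reference:
    kind: str   # "case" or "legislation"
    text: str   # surface text as it appears in the judgment
    start: int  # character offsets into the source text
    end: int


def find_references(text: str) -> list:
    """Return case and legislation references in document order."""
    refs = [Reference("case", m.group(0), m.start(), m.end())
            for m in NEUTRAL_CITATION.finditer(text)]
    refs += [Reference("legislation", m.group(0), m.start(), m.end())
             for m in ACT_TITLE.finditer(text)]
    return sorted(refs, key=lambda r: r.start)
```

The capitalisation heuristic will also absorb a preceding capitalised word (for example, “The” at the start of a sentence), which is one reason the confidence of each match matters downstream.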
The results need to be added to the source documents, creating enriched documents with hyperlinks, and extracted into a Linked Data knowledge graph. We prefer hyperlinks to sources managed by The National Archives but will need to link to external sources too. There are 50,000 documents stored in LegalDocML XML to enrich. We will also use the NLP component in a publishing pipeline for new documents.

Where there is a public source and high confidence in the reference, we will create a hyperlink. Each hyperlink should be as specific as the reference itself: a reference to a particular section should link to that section rather than to the Act as a whole. The NLP capability should identify a wider set of references than just those that can become hyperlinks.

One challenge is identifying an initial full reference and then tracking its subsequent use through indirect references (“this Act”), abbreviations or acronyms. Another part of the problem is modelling the enriched data so that it reflects different confidence levels: false positives and erroneous hyperlinks mislead users, but may be acceptable in a knowledge graph.

Who Are the Users

The main users of the whole service are likely to be legal professionals, law students and academics. We expect users to start their journey from a web search, for example using Google or Bing. By linking the documents together, we can improve search engines’ ability to rank the documents and ultimately help improve the search results.

Once users have arrived at the service, we know from user research that they value hyperlinks between documents, as these save time and aid research. However, a wrong or broken link is frustrating, creates confusion and undermines confidence.

Users have sophisticated research questions, for example: how has a specific provision in legislation been interpreted by the courts? Which later judgments build on a particular precedent?
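One way to approach the “this Act” problem and the confidence question together is sketched below in Python: each indirect mention is resolved to the most recent full Act title earlier in the text, and only mentions at or above a confidence threshold are proposed as hyperlinks, while all of them can still be recorded in the knowledge graph. The nearest-preceding-title rule, the fixed scores and the 0.9 threshold are hypothetical placeholders, not values taken from this specification.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold and scores, for illustration only.
HYPERLINK_THRESHOLD = 0.9
FULL_REFERENCE_SCORE = 0.95
INDIRECT_REFERENCE_SCORE = 0.7

FULL_ACT = re.compile(r"\b(?:[A-Z][a-z]+\s+)+Act\s+\d{4}")
THIS_ACT = re.compile(r"\b[Tt]his Act\b")


@dataclass
class Mention:
    text: str          # surface text, e.g. "this Act"
    resolved_to: str   # full title the mention is taken to refer to
    confidence: float


def resolve_mentions(text: str) -> list:
    """Resolve each "this Act" to the nearest preceding full Act title."""
    events = [(m.start(), "full", m.group(0)) for m in FULL_ACT.finditer(text)]
    events += [(m.start(), "indirect", m.group(0)) for m in THIS_ACT.finditer(text)]
    mentions, last_full = [], None
    for _, kind, surface in sorted(events):
        if kind == "full":
            last_full = surface
            mentions.append(Mention(surface, surface, FULL_REFERENCE_SCORE))
        elif last_full is not None:
            # Indirect mentions are less certain, so they score lower.
            mentions.append(Mention(surface, last_full, INDIRECT_REFERENCE_SCORE))
        # An indirect mention with no earlier full title is left unresolved.
    return mentions


def hyperlink_candidates(mentions: list) -> list:
    """Only high-confidence mentions become hyperlinks; all may go to the graph."""
    return [m for m in mentions if m.confidence >= HYPERLINK_THRESHOLD]
```

Keeping the score on every mention, rather than discarding low-confidence ones, is what allows the same extraction run to feed both the cautious hyperlinking and the more permissive knowledge graph.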
Our service can’t directly answer these questions, but by creating a knowledge graph of extracted data we can begin to support a more sophisticated user interface for searching and browsing, so that users can more easily research questions like these for themselves.

Early Market Engagement

We have regular conversations with legal publishers, who share our need to enrich and link court judgments and legislation. Through those conversations we have developed a good appreciation of what is possible using automated tools, as opposed to manual editorial work. Our experience using GATE (the General Architecture for Text Engineering: https://gate.ac.uk/) with legislation.gov.uk gives us confidence in the feasibility of this work.

Work Already Done

We have developed and deployed a data enrichment pipeline for legislation documents in legislation.gov.uk using GATE. This uses legislation.gov.uk data for the titles of legislation and has rules that identify various types of legislation reference. We think this solution can be extended to support references in court judgments.

We have developed a parser that turns court judgments into LegalDocML and stores the documents in a MarkLogic database. We anticipate storing the documents and the knowledge graph together in the database, which will also provide search for end users.

Existing Team

The supplier’s team will deliver the work. The National Archives team will include a Data Scientist, a Product Manager, a Delivery Manager and a User Researcher.
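To make the knowledge-graph idea more concrete, the Python sketch below turns extracted (citing judgment, cited document) pairs into N-Triples, and answers a simple version of the “which later judgments build on a particular precedent?” question by listing the judgments that cite a given document. The `example.org` judgment URIs are hypothetical; the legislation URI in the usage follows the legislation.gov.uk pattern; the predicate is Dublin Core’s `dcterms:references`, chosen purely for illustration. In the planned service this data would be held in MarkLogic rather than in Python lists.

```python
from collections import defaultdict

# Hypothetical base URI for judgments; real identifiers would come from the service.
JUDGMENT_BASE = "https://example.org/judgment/"
# Dublin Core Terms "references" property, used here purely as an example predicate.
REFERENCES = "http://purl.org/dc/terms/references"


def to_ntriples(citations):
    """Turn (citing judgment id, cited URI) pairs into N-Triples lines."""
    return [f"<{JUDGMENT_BASE}{citing}> <{REFERENCES}> <{cited}> ."
            for citing, cited in citations]


def cited_by(citations, target_uri):
    """List the judgments (by id) that cite target_uri."""
    index = defaultdict(set)
    for citing, cited in citations:
        index[cited].add(citing)
    return sorted(index[target_uri])
```

For example, a pair such as `("ewhc/2021/7", "http://www.legislation.gov.uk/ukpga/1998/42")` records that a (hypothetical) judgment cites the Human Rights Act 1998, and `cited_by` over all pairs returns every judgment citing that Act.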
Current Phase

Beta

Skills & Experience

• Experience of natural language processing and enrichment of texts in XML
• Experience of modelling linked data
• Experience of generating linked data from NLP pipelines
• Experience of cloud deployments, in particular AWS
• Experience of documenting technical solutions so they can be maintained by others

Nice to Haves

• Experience of developing pipelines for the General Architecture for Text Engineering (GATE)
• Experience of the Legal Document Mark-up Language
• Experience of working with legislation documents
• Experience of working with court judgments or tribunal decision documents

Work Location

Mostly remote, but with some meetings onsite as necessary at The National Archives, Kew, Surrey TW9 4AD.

Working Arrangements

The supplier will work in accordance with Agile methodologies to scope, plan and deliver the work incrementally, with daily stand-ups and active communication, and will hold regular ‘show and tell’ sessions to demonstrate progress. Online meetings will take place via Microsoft Teams, with Slack available for quick communication. The National Archives’ staff will be available during UK core hours (10am-4pm) each working day. The supplier will provide their own equipment and technology but will be given access to our organisational tracking app and Slack resources as appropriate.

Security Clearance

Baseline Personnel Security Standard (BPSS) clearance will be required.

No. of Suppliers to Evaluate

5

Proposal Criteria

• Evidence of delivering natural language processing solutions
• Evidence of delivering linked data solutions
• Evidence of familiarity with the GDS Service Standard
• Team structure, including the relevance of the team members' skills and experience

Cultural Fit Criteria

• Work in an open and transparent way, sharing work in progress and involving others as you go
• Explain what methods you propose to use to engage, communicate, constructively challenge and work effectively with our team and other suppliers
• Describe how you propose to support positive working relationships throughout the life of the contract

Payment Approach

Capped time and materials

Assessment Method

• Work history
• Presentation

Evaluation Weighting

• Technical competence: 60%
• Cultural fit: 10%
• Price: 30%

Questions from Suppliers

1. May I ask if there is an incumbent supplier?
There is no incumbent supplier.

2. Within the 50,000 documents, how many pages are there in the average document?
Whilst the documents are not paginated in the target format for data enrichment (LegalDocML XML), we estimate that, in terms of size, they are 8-10 A4 pages long on average when printed.

3. Within the 50,000 documents, what is the predominant content structure: text, tables, or pictures/diagrams?
There is a header portion of the document setting out the main information (neutral citation, date, court, parties, judge/s and representatives). The rest of the document consists largely of headings, sub-headings and numbered paragraphs of text. There are some blocks of quoted content, most often from a section of legislation. Occasionally there are tables and images.

4. Within the 50,000 documents, what is the predominant content structure: text, tables, or pictures/diagrams?
We store the documents in LegalDocML, which provides us with a data model for court judgments in XML.

5. Within the 50,000 documents, will there be any handwritten text?
No.

6. You mention use of AWS. Will you be provisioning your own cloud, or would you prefer the supplier to provide a managed cloud service?
We expect suppliers to use TNA’s provisioning of AWS cloud services.

7. You mention the requirement to document the technical solution so that it can be maintained by others. Are you intending to provide your own support and ongoing maintenance of the NLP automations, or would you like the service provider to provide support as a managed service?
We would like suppliers to document their solution so that it can be supported and maintained, either in-house or by a third party under a support contract.

8. What organisations (if any) did you work with to conduct and complete the discovery and alpha work?
Work to date has been largely delivered in-house.

9. At what stage in the bidding process will the discovery and alpha outputs be made available to prospective bidders?
These will be shared with the appointed supplier post award.

10. Can you provide details of the current technology and approach that the parser uses to turn court judgments into the Legal Document Mark-up Language?
The parser is a C# application developed using the Microsoft Office Open XML SDK. The styling information for the whole document is extracted into a <presentation> <style> element block in LegalDocML. Further styling information (font size, weight, decoration etc.) is then included in style attributes as needed, using inline <span>s as in HTML. The body of the document is in the <judgmentBody> element and is marked up using <level>, <paragraph>, <num>, <content> and <p> elements. Specific parts of the document <header> are marked up with semantic elements such as <neutralCitation>, <docDate>, <judge>, <party> etc.

11. Does the legislation reference solution successfully identify both direct and indirect forms of reference, and to what level of accuracy?
Yes, the current legislation reference solution identifies both direct and indirect forms of reference to other pieces of legislation, within and for legislation documents. It does this to a good level of accuracy, in part benefitting from the formal structure the documents provide (Parts, Chapters, Sections etc.). We have not tried the current solution for identifying legislation references in Court Judgments, so we do not know how successful it will be.

12. Does the £150,000 budget include all expected costs for all services and any potential software licences needed to achieve the desired outcomes?
Software licences will be procured separately. Suppliers will need to justify their proposed solution. Our preference is to use open source tools, such as GATE, for data enrichment. We have made the decision to store the documents using MarkLogic, and we envisage using it to store the linked data too. Hosting is also procured separately; we use AWS.

13. How will the 2-month extension to the initial 4 months work?
The optional extension gives The National Archives flexibility if the supplier is able to achieve the outcome but needs more time.
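The answer to question 10 indicates where hyperlinks would sit in the enriched XML. The Python sketch below uses the standard library’s ElementTree to wrap each citation found in a LegalDocML `<p>` element in an Akoma Ntoso `<ref>` element with an `href` attribute. It assumes the Akoma Ntoso 3.0 namespace, handles only paragraphs that contain plain text (no existing `<span>` children), and relies on a hypothetical `href_for` callback to build link targets; resolving real URIs is out of scope here.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable

# Assumed namespace: Akoma Ntoso / LegalDocML 3.0.
AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"
REF_TAG = f"{{{AKN_NS}}}ref"


def link_in_paragraph(p: ET.Element, pattern: re.Pattern,
                      href_for: Callable[[re.Match], str]) -> int:
    """Wrap each match of `pattern` in p's text in a <ref href="..."> element.

    Simplification: only handles a <p> whose content is plain text,
    i.e. one with no inline children such as <span>. Returns the number
    of references added."""
    if len(p) > 0 or not p.text:
        return 0
    text = p.text
    matches = list(pattern.finditer(text))
    if not matches:
        return 0
    # Text before the first match stays as the paragraph's leading text.
    p.text = text[:matches[0].start()]
    for i, m in enumerate(matches):
        ref = ET.SubElement(p, REF_TAG, {"href": href_for(m)})
        ref.text = m.group(0)
        # Text between this match and the next becomes the <ref>'s tail.
        next_start = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        ref.tail = text[m.end():next_start]
    return len(matches)
```

Extending this to paragraphs that already carry inline `<span>` styling would mean applying the same split to each child’s `text` and `tail` in turn.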

Timeline

Publish date

2 years ago

Close date

2 years ago

Buyer information

The National Archives