In 20th Panhellenic Conference on Informatics (PCI’16), Patras, Greece, 10–12 November 2016
Abstract. In recent years the number of freely available legal resources has increased greatly. Portals that allow users to search for legislation using keywords are now commonplace. However, in the vast majority of those portals, legal documents are not stored in a structured format with a rich set of metadata, but in presentation-oriented manifestations. This makes it impossible for end users to query the semantics of the documents, such as date of enactment, date of repeal, or jurisdiction, or to reuse information and establish interconnections with similar repositories. In this paper, we present an approach for extracting a machine-readable semantic representation of legislation from unstructured document formats. Our method exploits common formats of legal documents to identify blocks of structural and semantic information and models them according to a popular legal meta-schema. The proposed method is highly extensible and achieves high accuracy for a variety of legal and paralegal documents, especially legislation. Our evaluation results indicate that the methodology can be of great assistance for the automatic structuring and semantic indexing of legal resources.