Automatic Identification of Precedent Legal Documents

Natural language processing, like artificial intelligence, is a field that gains speed as the amount of data grows. The legal documents addressed in this study are also increasing in number over time. Detecting another case that is cited as precedent in a case is very important, because it can completely change the course of that case. In this study on precedent case detection, a data set was built from the knowledge bank of the Ulusal Yargı Ağı Projesi (UYAP). Using case templates obtained by examining the cases, three different systems were built with an LSTM model that generates text from text, with different sections of the documents serving as input and output. Representation vectors were obtained from different BERT models for the text outputs of the systems, and the FAISS library was used to quickly retrieve the nearest documents for the test data. For test legal documents in 5 different case-type categories, two lawyers separately marked the 10 most similar documents among the documents in the category clusters. The results obtained from the systems and those marked by the lawyers were compared, and the similarities are presented with examples.

Keywords:

Precedent cases, Sequence-to-sequence neural networks, Text similarity

Automatic Precedent Legal Document Detection

Natural language processing, like artificial intelligence more broadly, is a field that gains momentum as the amount of available data increases. The number of legal documents examined in this study is also increasing over time. Identifying another case that can be cited as a precedent is very important, because it can completely change the direction of a case. In this study, which deals with the detection of precedent cases, a data set was created from the National Judiciary Informatics System (UYAP) data bank. Using case templates obtained by examining the documents, three different systems were built with a sequence-to-sequence LSTM model, which generates text from text, with different parts of the documents serving as input and output. Text representation vectors were then created from the outputs of the systems using different BERT models, and these vectors were used to retrieve the most similar documents quickly with the FAISS library. The test legal documents were selected from 5 different case-type categories. Two lawyers independently marked the 10 most similar documents within the category cluster of each test document. The 10 most similar documents were also retrieved by each system. The results of all systems are compared with the lawyers' annotations, and the similarities are explained with examples.

Keywords:

Precedent documents, Sequence-to-sequence neural networks, Text similarity
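
The retrieval and comparison steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each document has already been mapped to a fixed-length vector (in the study, by BERT models), and it replaces the FAISS index with a brute-force cosine-similarity search in pure Python so that it runs without extra dependencies. The function names `top_k_similar` and `overlap_at_k`, the toy vectors, and the document ids are all hypothetical, and the overlap measure is only one plausible way to compare a system's output with a lawyer's selection.

```python
import math
from typing import Dict, List, Sequence, Set


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_similar(query: Sequence[float],
                  corpus: Dict[str, Sequence[float]],
                  k: int = 10) -> List[str]:
    """Return the ids of the k corpus vectors most similar to the query.

    Brute-force stand-in for an exact nearest-neighbour index:
    every corpus vector is scored against the query.
    """
    q = normalize(query)
    scored = []
    for doc_id, vec in corpus.items():
        v = normalize(vec)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, doc_id))
    # Highest similarity first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def overlap_at_k(system_ids: Sequence[str],
                 annotated_ids: Set[str],
                 k: int = 10) -> float:
    """Fraction of the system's top-k documents that an annotator also marked."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in list(system_ids)[:k] if doc_id in annotated_ids)
    return hits / k


# Toy data: hypothetical 3-dimensional "embeddings" for four documents.
corpus = {
    "case_a": [1.0, 0.0, 0.0],
    "case_b": [0.9, 0.1, 0.0],
    "case_c": [0.0, 1.0, 0.0],
    "case_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

ranked = top_k_similar(query, corpus, k=2)
lawyer_picks = {"case_a", "case_c"}  # hypothetical annotation
score = overlap_at_k(ranked, lawyer_picks, k=2)
print(ranked, score)
```

With real data, the brute-force loop would be replaced by a vector index such as FAISS, which is what makes retrieval fast on large collections; the ranking logic and the comparison against annotated lists would stay the same in spirit.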

