Analyzing the Language of Legislation Using Natural Language Processing

Xiao Lu, Moritz Osnabrügge, Gerrit Quaremba

September 2025 Political methodology

Abstract

Leveraging recent advances in natural language processing, this chapter introduces techniques for analyzing legislative texts. It covers regular expressions, dictionary-based methods, supervised and unsupervised text classification methods, part-of-speech (POS) tagging, and dependency parsing. To illustrate the usefulness of these techniques, the chapter applies POS tagging and dependency parsing to a corpus of European Union (EU) legislation from the legislative period between 2014 and 2019, which includes 106 directives, 274 regulations and 41 decisions. The chapter analyzes how often EU legislation employs passive voice and negations, both of which drafting experts and guidelines advise against. It also explores variation across types of legislation, across procedures, and over time. The chapter finds that, on average, 41.0% of the sentences in a legislative act include passive voice and 9.7% use negations. While neither language feature varies over time, directives use, on average, more passive voice and more negations than regulations and decisions. The analysis also reveals variation across areas. For example, tax-related legislation uses more passive voice than texts on communication networks, whereas legislation on the internal market uses more negations than employment-related legislation.
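
The sentence-level shares reported in the abstract can be illustrated with a minimal sketch. It assumes each sentence has already been dependency-parsed and is given as a list of (token, dependency label) pairs. The label names `nsubjpass`, `auxpass` and `neg` are an assumption borrowed from the scheme used by spaCy's English models; the chapter's own pipeline and label set may differ. The function name `feature_shares` and the toy sentences are hypothetical.

```python
from typing import List, Tuple

# Assumed label sets (spaCy English-style labels); other parsers, such as
# those following Universal Dependencies v2, encode these features differently.
PASSIVE_DEPS = {"nsubjpass", "auxpass"}
NEGATION_DEPS = {"neg"}

Sentence = List[Tuple[str, str]]  # (token text, dependency label)


def feature_shares(sentences: List[Sentence]) -> Tuple[float, float]:
    """Return the share of sentences with passive voice and with negation."""
    if not sentences:
        return 0.0, 0.0
    # A sentence counts once per feature, however many matching tokens it has.
    passive = sum(any(dep in PASSIVE_DEPS for _, dep in s) for s in sentences)
    negated = sum(any(dep in NEGATION_DEPS for _, dep in s) for s in sentences)
    n = len(sentences)
    return passive / n, negated / n


# Hypothetical hand-labelled toy input, not taken from the EU corpus.
toy_act = [
    [("Measures", "nsubjpass"), ("shall", "aux"), ("be", "auxpass"), ("adopted", "ROOT")],
    [("States", "nsubj"), ("shall", "aux"), ("not", "neg"), ("derogate", "ROOT")],
    [("The", "det"), ("Commission", "nsubj"), ("adopts", "ROOT"), ("rules", "dobj")],
    [("This", "det"), ("Regulation", "nsubj"), ("applies", "ROOT")],
]
passive_share, negation_share = feature_shares(toy_act)
```

Counting at the sentence level, rather than per token, mirrors how the abstract states its results: as the percentage of sentences in an act that contain each feature.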

Type

Journal article

Publication

Routledge

Xiao Lu

Assistant Professor