Prof. Dr. Stephanie Evert

Department Germanistik und Komparatistik
Lehrstuhl für Korpus- und Computerlinguistik

Room 4.006
Bismarckstraße 6
91054 Erlangen

Phone: +49 9131 85-22426
Mobile: +491734959785
Fax: +49 9131 85-29251
Email: stephanie.evert@fau.de
Website: https://www.stephanie-evert.de/

Office hours

Every week, Wed 10:00 - 12:00, room Bismarckstr. 6, 4.006 / Zoom, during the lecture period only (WS 2025/26), by prior appointment only

Contributions to edited volumes

Evert, Stephanie, Christine Ganslmayer, and Christian Rink. "KI-generierte Wörterbuchartikel bewerten. Ein Beitrag zur Methodik der Wörterbuchkritik." Lexikographisch-grammatische Perspektiven. Tradition, Veränderung und Vielfalt in Lexikographie und Wörterbuchforschung. Ed. Wiebke Blanck, Rufus H. Gouws, Anja Lobenstein-Reichmann, Berlin/Boston: De Gruyter Brill, 2025.
Adrian, Axel, et al. "Auslegung des KI-VO-E zur Evaluation von Verfahren der Künstlichen Intelligenz am Beispiel der automatischen Anonymisierung von Gerichtsentscheidungen." Sprachmodelle: Juristische Papageien oder mehr? – Tagungsband des 27. Internationalen Rechtsinformatik Symposions IRIS 2024. Ed. Erich Schweighofer / Stefan Eder / Federico Costantini / Felix Schmautzer / Jonas Pfister, 2024. 205 - 215.
Adrian, Axel, et al. "Automatische Anonymisierung von Gerichtsurteilen – Eine Vision scheint realisierbar." Rechtsinformatik als Methodenwissenschaft des Rechts – Tagungsband des 26. Internationalen Rechtsinformatik Symposions IRIS 2023. Ed. Erich Schweighofer / Jakob Zanol / Stefan Eder, Editions Weblaw, 2023. 211 - 220.
Adrian, Axel, et al. "Manuelle und automatische Anonymisierung von Urteilen." Digitalisierung von Zivilprozess und Rechtsdurchsetzung. Ed. Adrian, Axel/Kohlhase, Michael/Evert, Stephanie/Zwickel, Martin, 2022. 173-197.
Dykes, Nathan, Philipp Heinrich, and Stephanie Evert. "Retrieving Twitter argumentation with corpus queries and discourse analysis." Broadening the Spectrum of Corpus Linguistics: New approaches to variability and change. Ed. Susanne Flach, Martin Hilpert, John Benjamins Publishing Company, 2022. 229-256.
DOI: 10.1075/scl.105.08dyk
Keuchen, Michael, et al. "Anonymisierung von Gerichtsurteilen – Eine wesentliche Voraussetzung für E-Justice –." Cybergovernance - Tagungsband des 24. Internationalen Rechtsinformatik Symposions IRIS 2021. Ed. Schweighofer E, Eder S, Hanke P, Kummer F, Saarenpää A, Editions Weblaw, 2021. 137 - 149.
DOI: 10.38023/8a6f3e93-06e9-4655-84ec-ecf2c55db3e1
Griebel, Tim, Stephanie Evert, and Philipp Heinrich. "Possibilities and Challenges of Corpus-Assisted Discourse Analyses of Austerity in the United Kingdom." Multimodal Approaches to Media Discourses: Reconstructing the Age of Austerity in the United Kingdom. Ed. Griebel T, Evert S, Heinrich P, London: Routledge, 2020. 1 - 10.
DOI: 10.4324/9780367332907-1
Uhrig, Peter, Stephanie Evert, and Thomas Proisl. "Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes." Lexical Collocation Analysis: Advances and Applications. Ed. Cantos-Gómez P, Almela-Sánchez M, Cham: Springer International Publishing, 2018. 111–140.
DOI: 10.1007/978-3-319-92582-0_6
Evert, Stephanie, and Stella Neumann. "The impact of translation direction on characteristics of translated texts. A multivariate analysis for English and German." Empirical Translation Studies. New Theoretical and Methodological Traditions. Ed. De Sutter G, Lefer M, Delaere I, Berlin: Mouton de Gruyter, 2017. 47-80.
URL: http://www.stefan-evert.de/PUB/EvertNeumann2017/
Diwersy, Sascha, Stephanie Evert, and Stella Neumann. "A weakly supervised multivariate approach to the study of language variation." Aggregating Dialectology, Typology, and Register Analysis. Linguistic Variation in Text and Speech. Ed. Szmrecsanyi B, Wälchli B, Berlin, Boston: De Gruyter, 2014. 174–204.
URL: http://www.degruyter.com/viewbooktoc/product/207699
Bartsch, Sabine, and Stephanie Evert. "Towards a Firthian Notion of Collocation." Vernetzungsstrategien, Zugriffsstrukturen und automatisch ermittelte Angaben in Internetwörterbüchern. Ed. Abel A, Lemnitzer L, Mannheim: Institut für Deutsche Sprache, 2014. 48–61.
Evert, Stephanie. "Tools for the acquisition of lexical combinatorics." Dictionaries. An International Encyclopedia of Lexicography. Supplementary volume: Recent Developments with Focus on Electronic and Computational Lexicography (HSK 5.4). Ed. Gouws RH, Heid U, Schweickard W, Wiegand HE, Berlin, New York: Mouton de Gruyter, 2013. 1415–1432.
Boleda, Gemma, et al. "Adjectives as Saturators vs. Modifiers: Statistical Evidence." Logic, Language and Meaning. Proceedings of the 18th Amsterdam Colloquium. Ed. Aloni M, Kimmelman V, Roelofsen F, Sassoon GW, Schulz K, Westera M, Berlin, Heidelberg: Springer, 2012. 112–121.
DOI: 10.1007/978-3-642-31482-7_12
Ebert, Christian, et al. "Semantik." Computerlinguistik und Sprachtechnologie: Eine Einführung. Ed. Carstensen K, Ebert C, Ebert C, Jekat S, Klabunde R, Langer H, Heidelberg: Spektrum Akademischer Verlag, 2009. 330-393.
Evert, Stephanie, Bernhard Frötschl, and Wolf Lindstrot. "Statistische Grundlagen." Computerlinguistik und Sprachtechnologie: Eine Einführung. Ed. Carstensen K, Ebert C, Ebert C, Jekat S, Klabunde R, Langer H, Heidelberg: Spektrum Akademischer Verlag, 2009. 114-158.
URL: http://www.cl.uzh.ch/CL/CLBuch/
Evert, Stephanie. "Corpora and collocations." Corpus Linguistics. An International Handbook. Ed. Lüdeling A, Kytö M, Berlin, New York: Mouton de Gruyter, 2008. 1212-1248.
Baroni, Marco, and Stephanie Evert. "Statistical methods for corpus exploitation." Corpus Linguistics. An International Handbook. Ed. Lüdeling A, Kytö M, Berlin, New York: Mouton de Gruyter, 2008. 777-803.
Evert, Stephanie, and Anke Lüdeling. "The emergence of productive non-medical -itis. Corpus evidence and qualitative analysis." Linguistic Evidence. Empirical, Theoretical, and Computational Perspectives. Ed. Kepser S, Reis M, Berlin: Mouton de Gruyter, 2005. 351-370.
URL: http://purl.org/stefan.evert/PUB/LuedelingEvert2005.pdf

Conference contributions

Daunicht, Tina-Myrica, et al. "[Lehren | Lernen] [mit | aus | über] KI: Evaluation des Projekts „Prompt Higher Learning“." Presented at 89. Jahrestagung der Arbeitsgruppe Empirisch-Pädagogische Forschung (AEPF), Essen 2025.
Adrian, Axel, et al. "Führen Unterschiede in den Sprachfassungen der KI-VO zu unterschiedlichen technischen und juristischen Interpretationen? Eine Untersuchung anhand ausgewählter Tatbestandsmerkmale." Tagungsband 28. Internationalen Rechtsinformatik Symposions IRIS, Wien 2025.
Adrian, Axel, et al. "DIREGA – Building Decision Support for German Register Law." Presented at JURIX 2024, Brno Ed. Jaromir Savelka, Jakub Harasta, Tereza Novotna, Jakub Misek, IOS Press, 2024.
DOI: 10.3233/FAIA241269
URL: https://ebooks.iospress.nl/volumearticle/71034
Evert, Stephanie, Christine Ganslmayer, and Christian Rink. "Multi-level analysis as a systematic Approach to evaluating the quality of AI-generated dictionary entries." Proceedings of the EURALEX 2024, Cavtat/Dubrovnik Ed. Kristina Š. Despot, Ana Ostroški Anić, Ivana Brač, 2024. 298–315.
URL: https://euralex.jezik.hr/wp-content/uploads/2021/09/Euralex-XXI-proceedings_1st.pdf
Rink, Christian, Christine Ganslmayer, and Stephanie Evert. "Towards a comprehensive method for evaluating and utilizing AI-generated bilingual lexicographic data in language learning using the example of Chinese as a foreign language." Proceedings of the AsiaLex 2024, Toyo University, Tokyo Ed. Ai Inoue, Naho Kawamoto, Makoto Sumiyoshi, Tokyo: 東洋大学 (Toyo University), 2024. 133–142.
URL: https://www.asialex.org/pdf/Asialex-Proceedings-2024.pdf
Heinrich, Philipp, et al. "Automatic Identification of COVID-19-Related Conspiracy Narratives in German Telegram Channels and Chats." Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin Ed. Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue, 2024. 1932-1943.
URL: https://aclanthology.org/2024.lrec-main.173
Dykes, Nathan, et al. "Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models." Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024, Torino, Italy Ed. Hautli-Janisz A, Lapesa G, Anastasiou L, Gold V, Liddo AD, Reed C, Torino, Italy: ELRA and ICCL, 2024. 52–57.
URL: https://aclanthology.org/2024.delite-1.7
Adrian, Axel, et al. "Auslegung des KI-VO-E zur Evaluation von Verfahren der Künstlichen Intelligenz am Beispiel der automatischen Anonymisierung von Gerichtsentscheidungen." Proceedings of the 27. Internationalen Rechtsinformatik Symposions IRIS 2024, Salzburg, Österreich Ed. Erich Schweighofer, Stefan Eder, Federico Costantini, Felix Schmautzer, Jonas Pfister, Salzburg, Austria, 2024. 205–215.
Heinrich, Philipp, et al. "Automatic Identification of COVID-19-related Narratives in German Telegram Channels and Chats." Proceedings of the Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Torino Ed. Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue, European Language Resources Association (ELRA), 2024. 1932-1943.
Dykes, Nathan, et al. "Finding Argument Fragments on Social Media with Corpus Queries and LLMs." Proceedings of the 1st International Conference on Robust Argumentation Machines, RATIO 2024, Bielefeld, DEU Ed. Philipp Cimiano, Anette Frank, Michael Kohlhase, Benno Stein, Cham: Springer Science and Business Media Deutschland GmbH, 2024. 163-181.
DOI: 10.1007/978-3-031-63536-6_10
Heinrich, Philipp, and Stephanie Evert. "Operationalising the Hermeneutic Grouping Process in Corpus-assisted Discourse Studies." Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences, CPSS 2024, Vienna, AUT Ed. Christopher Klamm, Gabriella Lapesa, Simone Paolo Ponzetto, Ines Rehbein, Indira Sen, Association for Computational Linguistics (ACL), 2024. 33-44.
Blombach, Andreas, et al. "Exploring Lexical Diversities." Proceedings of the Digital Humanities 2022, Tokyo 2022. 130-134.
URL: https://dh2022.dhii.asia/dh2022bookofabsts.pdf
Diwersy, Sascha, et al. "Eine korpuslinguistische Analyse der Corona-Berichterstattung in der deutschen und französischen Presse." Tagungsband Mots et Discours de la Pandémie, Heidelberg 2022.
Tayebi Arasteh, Soroosh, et al. "How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies." Proceedings of the IEEE 15th International Conference on Semantic Computing (ICSC), Laguna Hills, CA Ed. IEEE, 2021. 370-373.
DOI: 10.1109/ICSC50631.2021.00068
URL: https://ieeexplore.ieee.org/document/9364527
Evert, Stephanie, and Gabriella Lapesa. "FAST: A carefully sampled and cognitively motivated dataset for distributional semantic evaluation." Proceedings of the 25th Conference on Computational Natural Language Learning, CoNLL 2021, Virtual, Online Ed. Arianna Bisazza, Omri Abend, Association for Computational Linguistics (ACL), 2021. 588-595.
Blombach, Andreas, et al. "A new German Reddit corpus." Proceedings of the 15th Conference on Natural Language Processing, KONVENS 2019, Erlangen-Nurnberg German Society for Computational Linguistics and Language Technology, 2020. 278-279.
Evert, Stephanie, et al. "Corpus query lingua franca part II: Ontology." Proceedings of the 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille Ed. Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, European Language Resources Association (ELRA), 2020. 3346-3352.
Proisl, Thomas, et al. "EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus." Proceedings of the 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille Ed. Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, European Language Resources Association (ELRA), 2020. 6142-6148.
URL: https://www.aclweb.org/anthology/2020.lrec-1.754
Dykes, Nathan, Philipp Heinrich, and Stephanie Evert. "Arguing Brexit on Twitter. A corpus linguistic study." Presented at European Conference on Argumentation 2019, Groningen 2019.
Dykes, Nathan, Philipp Heinrich, and Stephanie Evert. "Reconstructing Twitter arguments with corpus linguistics." Presented at ICAME40: Language in Time, Time in Language, Neuchâtel 2019.
Proisl, Thomas, et al. "EmotiKLUE at IEST 2018: Topic-Informed Classification of Implicit Emotions." Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brüssel Ed. Balahur A, Mohammad SM, Hoste V, Klinger R, Brussels: Association for Computational Linguistics, 2018. 235–242.
DOI: 10.18653/v1/w18-6234
URL: http://aclweb.org/anthology/W18-6234
Heinrich, Philipp, et al. "A Transnational Analysis of News and Tweets about Nuclear Phase-Out in the Aftermath of the Fukushima Incident." Proceedings of the Workshop on Computational Impact Detection from Text Data, Miyazaki Ed. Andreas Witt, Jana Diesner, Georg Rehm, Paris: ELRA, 2018. 8 - 16.
Proisl, Thomas, et al. "Delta vs. N-Gram Tracing: Evaluating the Robustness of Authorship Attribution Methods." Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki Ed. Calzolari N, Choukri K, Cieri C, Declerck T, Goggi S, Hasida K, Isahara H, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J, Piperidis S, Tokunaga T, Miyazaki: European Language Resources Association, 2018. 3309–3314.
URL: http://www.lrec-conf.org/proceedings/lrec2018/pdf/835.pdf
Evert, Stephanie, Nathan Dykes, and Joachim Peters. "A quantitative evaluation of keyword measures for corpus-based discourse analysis." 2018.
URL: http://www.stefan-evert.de/PUB/EvertEtc2018_CAD_slides.pdf
Evert, Stephanie, et al. "Combining Machine Learning and Semantic Features in the Classification of Corporate Disclosures." Proceedings of the Logic and Algorithms in Computational Linguistics 2017 (LACompLing2017), Stockholm Ed. Loukanova R, Liefke K, Stockholm: Stockholm University, 2017. 47 - 62.
URL: http://su.diva-portal.org/smash/get/diva2:1140018/FULLTEXT03.pdf
Proisl, Thomas, et al. "Translation Inference across Dictionaries via a Combination of Graph-based Methods and Co-occurrence Statistics." Proceedings of the Shared Task on Translation Inference Across Dictionaries, Galway Ed. McCrae J, Bond F, Buitelaar P, Cimiano P, Declerck T, Gracia J, Kernerman I, Ponsoda E, Ordan N, Piasecki M, CEUR, 2017. 94–102.
URL: http://ceur-ws.org/Vol-1899/TIAD17_paper_1.pdf
Evert, Stephanie, et al. "E-VIEW-Alation – a Large-Scale Evaluation Study of Association Measures for Collocation Identification." Proceedings of the eLex 2017, Leiden Ed. Iztok K, Carole T, Miloš J, Jelena K, Simon K, and Vít B, Brno: Lexical Computing, 2017. 531–549.
URL: https://elex.link/elex2017/wp-content/uploads/2017/09/paper32.pdf
Lapesa, Gabriella, and Stephanie Evert. "Large-scale evaluation of dependency-based DSMs: Are they worth the effort?" Proceedings of the 15th Annual Meeting of the European Association for Computational Linguistics (EACL 2017): Volume 2, Short Papers Valencia, Spain, 2017. 394-400.
URL: http://www.linguistik.fau.de/dsmeval/
Evert, Stephanie, Sebastian Wankerl, and Elmar Nöth. "Reliable measures of syntactic and lexical complexity: The case of Iris Murdoch." Presented at the Corpus Linguistics 2017 Conference, Birmingham, UK, 2017.
URL: http://purl.org/stefan.evert/PUB/EvertWankerlNoeth2017.pdf
Wankerl, Sebastian, Elmar Nöth, and Stephanie Evert. "An Analysis of Perplexity to Reveal the Effects of Alzheimer's Disease on Language." Proceedings of the ITG-Fachbericht 267: Speech Communication Paderborn, Germany, 2016. 254-259.
Evert, Stephanie. "CogALex-V Shared Task: Mach5 – A traditional DSM approach to semantic relatedness." Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V) Osaka, Japan, 2016. 92-97.
URL: http://www.collocations.de/data/#mach5
Evert, Stephanie, et al. "„Delta“ in der stilometrischen Autorschaftsattribution." Presented at DHd 2016, Leipzig Leipzig: Nisaba, 2016.
URL: http://www.dhd2016.de/abstracts/sektionen-002.html
Evert, Stephanie, et al. "EmpiriST 2015: A Shared Task on the Automatic Linguistic Annotation of Computer-Mediated Communication and Web Corpora." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Berlin Berlin, Germany, 2016. 44-56.
URL: https://sites.google.com/site/empirist2015/
Santus, Enrico, et al. "The CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations." Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V) Osaka, Japan, 2016. 69-79.
URL: https://sites.google.com/site/cogalex2016/home/shared-task
Plotnikova, Nataliia, et al. "KLUEless: Polarity Classification and Association." Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) Denver, Colorado, 2015. 619–625.
URL: http://www.aclweb.org/anthology/S15-2103
Plotnikova, Nataliia, et al. "SemantiKLUE: Semantic Textual Similarity with Maximum Weight Matching." Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015) Denver, Colorado, 2015. 111–116.
URL: http://www.aclweb.org/anthology/S15-2020
Evert, Stephanie, and Antti Arppe. "Some theoretical and experimental observations on naïve discriminative learning." Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics (QITL-6) Tübingen, Germany, 2015.
Evert, Stephanie, et al. "Towards a better understanding of Burrows's Delta in literary authorship attribution." Proceedings of the Fourth Workshop on Computational Linguistics for Literature Denver, CO, 2015. 79–88.
URL: http://www.aclweb.org/anthology/W15-0709
Evert, Stephanie, and Andrew Hardie. "Ziggurat: A new data model and indexing format for large annotated text corpora." Proceedings of the 3rd Workshop on the Challenges in the Management of Large Corpora (CMLC-3) Lancaster, UK, 2015. 21–27.
Lapesa, Gabriella, Stephanie Evert, and Sabine Schulte im Walde. "Contrasting Syntagmatic and Paradigmatic Relations: Insights from Distributional Semantic Models." Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014) Dublin, Ireland, 2014. 160–170.
URL: http://www.aclweb.org/anthology/S14-1020
Evert, Stephanie. "Distributional Semantics in R with the wordspace Package." Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations Dublin, Ireland, 2014. 110–114.
URL: http://wordspace.r-forge.r-project.org
Lapesa, Gabriella, and Stephanie Evert. "NaDiR: Naive Distributional Response Generation." Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex) Dublin, Ireland, 2014. 50–59.
URL: http://www.aclweb.org/anthology/W14-4707
Proisl, Thomas, et al. "SemantiKLUE: Robust semantic similarity at multiple levels using maximum weight matching." Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014) Dublin, Ireland, 2014. 532–540.
URL: http://www.aclweb.org/anthology/S14-2093
Evert, Stephanie, et al. "SentiKLUE: Updating a polarity classifier in 48 hours." Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014) Dublin, Ireland, 2014. 551–555.
URL: http://www.aclweb.org/anthology/S14-2096
Schulze Wettendorf, Clemens, et al. "SNAP: A Multi-Stage XML-Pipeline for Aspect Based Sentiment Analysis." Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) Dublin, Ireland, 2014. 578-584.
URL: http://www.aclweb.org/anthology/S14-2101
Lapesa, Gabriella, and Stephanie Evert. "Evaluating Neighbor Rank and Distance Measures as Predictors of Semantic Priming." Proceedings of the ACL Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2013) Sofia, Bulgaria, 2013. 66–74.
Greiner, Paul, et al. "KLUE-CORE: A regression model of semantic textual similarity." Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity Atlanta, Georgia, USA: Association for Computational Linguistics, 2013. 181–186.
URL: http://aclweb.org/anthology/S13-1026
Proisl, Thomas, et al. "KLUE: Simple and robust methods for polarity classification." Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) Atlanta, GA: Association for Computational Linguistics, 2013. 395–401.
URL: http://aclweb.org/anthology/S13-2065
Ebert, Cornelia, Stephanie Evert, and Katharina Wilmes. "Focus Marking via Gestures." Proceedings of Sinn & Bedeutung 15 Ed. Reich I, Saarbrücken, Germany: Universaar – Saarland University Press, 2011.
Evert, Stephanie, and Andrew Hardie. "Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium." Proceedings of the Corpus Linguistics 2011 Conference Birmingham, UK, 2011.
Evert, Stephanie. "Google Web 1T5 N-Grams Made Easy (but not for the computer)." Proceedings of the 6th Web as Corpus Workshop (WAC-6) Los Angeles, CA, 2010. 32–40.
Giesbrecht, Eugenie, and Stephanie Evert. "Part-of-Speech Tagging – A Solved Task? An evaluation of POS taggers for the Web as corpus." Proceedings of the 5th Web as Corpus Workshop (WAC5) Ed. Alegria I, Leturia I, Sharoff S, San Sebastian, Spain, 2009. 27-35.
URL: http://purl.org/stefan.evert/PUB/GiesbrechtEvert2009_Tagging.pdf
Evert, Stephanie. "A lightweight and efficient tool for cleaning Web pages." Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) Marrakech, Morocco, 2008.
URL: http://purl.org/stefan.evert/PUB/Evert2008_NCleaner.pdf
Baroni, Marco, and Stephanie Evert. "Words and Echoes: Assessing and Mitigating the Non-Randomness Problem in Word Frequency Distribution Modeling." Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Prague, Czech Republic, 2007. 904-911.
Evert, Stephanie, and Marco Baroni. "zipfR: Word Frequency Distributions in R." Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Posters and Demonstrations Sessions Prague, Czech Republic, 2007. 29-32.
Evert, Stephanie. "A Simple LNRE Model for Random Character Sequences." Proceedings of the 7èmes Journées Internationales d'Analyse Statistique des Données Textuelles (JADT 2004) Louvain-la-Neuve, Belgium, 2004. 411-422.
URL: http://purl.org/stefan.evert/PUB/Evert2004a.pdf
Krenn, Brigitte, Stephanie Evert, and Heike Zinsmeister. "Determining Intercoder Agreement for a Collocation Identification Task." Proceedings of KONVENS 2004 Vienna, Austria, 2004. 89-96.
URL: http://purl.org/stefan.evert/PUB/KrennEvertZinsmeister2004.pdf
Evert, Stephanie. "Significance tests for the evaluation of ranking methods." Proceedings of the 20th International Conference on Computational Linguistics (Coling 2004) Geneva, Switzerland, 2004. 945-951.
Heid, Ulrich, et al. "A data collection for semi-automatic corpus-based updating of dictionaries." Proceedings of the 9th EURALEX International Congress Ed. Heid U, Evert S, Lehmann E, Rohrer C, Stuttgart, Germany, 2000. 183–195.
Evert, Stephanie, Ulrich Heid, and Wolfgang Lezius. "Methoden zum Vergleich von Signifikanzmaßen zur Kollokationsidentifikation." Proceedings of the KONVENS-2000 Sprachkommunikation Ed. Zühlke W, Schukat-Talamazzini EG, Ilmenau, Germany: VDE-Verlag, 2000. 215–220.
Evert, Stephanie, Ulrich Heid, and Anke Lüdeling. "On Measuring Morphological Productivity." Proceedings of the KONVENS-2000 Sprachkommunikation Ed. Zühlke W, Schukat-Talamazzini EG, Ilmenau, Germany: VDE-Verlag, 2000. 57–61.
Evert, Stephanie, Ulrich Heid, and Steve Berman. "Searchable Metaspaces." Proceedings of the EAGLES/ISLE Workshop on Metadata Athens, Greece, 2000.

Theses

Evert, Stephanie. The Statistics of Word Cooccurrences: Word Pairs and Collocations. Dissertation, 2004.
URL: http://www.collocations.de/phd.html

For more, see CRIS

Prompt Higher Learning – Mit KI-gestützten Writing Tools (Hochschul-)Bildung verbessern?!

(Third-party funded individual grant)

Duration: 1 April 2024 - 31 March 2026
Funding source: Stiftung Innovation in der Hochschullehre

Digitaler Registerassistent

(Third-party funded individual grant)

Duration: 1 March 2024 - 28 February 2027
Funding source: other funding organisation
URL: https://www.direga.fau.de

Reading concordances in the 21st century (RC21)

(Third-party funded individual grant)

Duration: 1 March 2023 - 31 March 2026
Funding source: DFG individual grant / research grant (EIN-SBH)

Abstract

In today's digital world, the amount of text communicated in electronic form is ever-increasing and there is a growing need for approaches and methods to extract meanings from texts at scale. Corpus linguists have long been studying digitised texts and have established that much of language is characterised by recurring patterns. For example, the word 'eye' can appear together with words like 'cream' and 'test', or words like 'closed' and 'fixed'. In corpus linguistics, such patterns are identified with the help of concordances, i.e. displays that show many occurrences of a word, phrase or construction across a range of contexts in a compact format. However, lacking a well-established and clear-cut methodology, the art of reading concordances has not yet realised its full potential. At the same time, there has been very little innovation in algorithms in the concordance software packages available to corpus linguists.
This project proposes an innovative approach to reading concordances in the 21st century. Through the collaboration between the University of Birmingham and FAU Erlangen-Nürnberg we combine strengths in theoretical work in corpus linguistics with expertise in computational algorithms in order to develop a systematic methodology for reading concordances. We will develop tool-independent strategies and corresponding algorithms for the semi-automatic organisation of concordance lines, and implement them in the software FlexiConc. To develop and test our approach, we will conduct two case studies. The first will focus on body language in fiction compared to non-fiction texts. The second will focus on political argumentation in social media, formalising its findings as corpus queries that can be used for automatic argumentation mining. Both case studies include a comparative dimension between English and German. Hence, they broaden out approaches to concordance reading, which have been very focused on the English language so far.
Through these case studies, we will establish an approach that not only provides innovation in corpus linguistics, but also has wider implications for the analysis of textual data at scale, while still retaining a humanities perspective.
We will develop FlexiConc as open-source software, so that other researchers can use it as an off-the-shelf tool or integrate it into existing concordance tools or their own software environment. Both FlexiConc and our tool-independent approach to concordance analysis will have relevance beyond corpus linguistics, providing innovative approaches and algorithms for disciplines such as digital humanities and computational social science. We will raise awareness of the new possibilities in a variety of forms, for instance, through a project blog where users of our software can share their experience, and with the help of an advisory board of leading international experts. We will run training sessions at summer schools and conferences and make educational materials available online.

Automatische Anonymisierung von Gerichtsentscheidungen für E-Justice und Legal-Tech

(Third-party funded group grant – overall project)

Duration: 15 December 2022 - 14 December 2025
Funding source: Bundesministerium für Forschung, Technologie und Raumfahrt (BMFTR)
URL: https://www.linguistik.phil.fau.de/projects/leak-anger/

GRK 2839: Die Konstruktionsgrammatische Galaxis

(Third-party funded group grant – overall project)

Duration: 1 October 2022 - 30 September 2027
Funding source: DFG / Research Training Group (GRK)
URL: https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group-dimensions-of-constructional-space/

Abstract

The subject of the proposed Research Training Group (GRK) is a relatively new paradigm in linguistics, namely Construction Grammar (CxG). Its fundamental premise is that a person's entire linguistic knowledge is represented in a network of form-meaning pairings, so-called constructions. Constructions differ in size (from morphemes through argument structure to discourse conventions), in degree of abstraction (fully concrete/elaborated, partially schematic, fully schematic), in their entrenchment in the mental constructicon, and in the nature of their links to other constructions in the network. These properties form a multidimensional space that we call "constructional space". The GRK addresses core questions of CxG (e.g. the identification of constructions, constructional networks) as well as its application to different languages (including Arabic and Haitian Creole), language stages and language contact situations. In addition, central CxG hypotheses are to be tested with a wide range of methods (including traditional analyses, "big data" corpus methods, behavioural experiments and neuroimaging). A central goal is the creation of a research constructicon (Forschungskonstruktikon, FK) that brings together descriptions of the constructions under investigation in a database and links them to all related research results of the subprojects. The freely accessible FK thus provides an ideal basis and a model for further interdisciplinary research worldwide. The GRK is highly interdisciplinary and combines insights from theoretical linguistics, computational linguistics, the neurosciences and psycholinguistics. Its unifying elements are the shared theoretical framework of CxG (which is nevertheless to be critically examined), shared research questions and a consistent focus on empirical research.
A central component of the GRK is a structured training programme for the doctoral researchers which, in addition to a winter school, three boot camps, and regular seminars and research group meetings, also includes numerous elective courses, optional stays abroad and individual coaching. The GRK recruits highly qualified doctoral researchers and provides a solid grounding in linguistic theory building and empirical research methods. Each doctoral researcher has two supervisors who represent different disciplines or methodological approaches, and is advised by further persons (including international advisors). In this way, the doctoral researchers gain experience in interdisciplinary cooperation, build a network of international contacts and acquire extensive transfer skills. Overall, the doctoral researchers in the GRK acquire a wide range of subject-specific and general competencies that are essential qualifications for a further career in academia or in other areas of society.

→More information
The Normalisation of Right-Wing Populist and New Right Discourses in Japan and Germany

(Third-party funded individual grant)

Duration: 1 October 2022 - 30 September 2025
Funding source: DFG individual grant / research grant (EIN-SBH)

→More information
Multimodal Constructional Space

(Third-party funded group grant – subproject)

Title of the overall project: GRK 2839: The Construction Grammar Galaxy
Duration: 1 October 2022 - 30 September 2027
Funding source: DFG / Research Training Group (GRK)
URL: https://www.cxg.phil.fau.eu/about-the-rtg/about-the-rtg-projects/project-2/

Abstract

In human face-to-face communication, speakers make use of various modalities to deliver and interpret messages. This complex operation involves not only the verbal exchange of linguistic forms, but also the use of facial expressions, gestures, and prosody. In fact, a number of studies have shown that many gestures and linguistic forms systematically co-occur with one another (Cienki, 2015; Ningelgen & Auer, 2017; Ziem, 2017; Zima, 2017b). Following a cognitive/usage-based model, we know that language learners and users keep track of usage events, and that knowledge of language is constantly shaped and re-shaped with each instance of use (Bybee, 2010). One of the challenges of linguistic theory is therefore to account for these multimodal phenomena.
The main focus of this project is on modeling multimodality in a Construction Grammar (CxG) framework (Goldberg, 1995, 2006, 2019). Over the past few years, there have been several proposals that address these types of phenomena (Cienki, 2017; Herbst, 2020; Hoffmann, 2017; Mittelberg, 2017; Schoonjans, 2017; Turner, 2018; 2020a; 2020b; Uhrig, 2021; Ziem, 2017; Zima, 2017; Zima and Bergs 2017). Still, there are various theoretical and practical aspects yet to be addressed. Some of the discussion points raised concern the theoretical status of multimodal constructions, whether the constructicon is multimodal, and whether a Multimodal Construction Grammar is needed.
In detail, this project investigates three kinds of constructions in which multimodal phenomena are observed, and uses them as case studies to understand and suggest ways of modeling multimodality within a CxG framework. These are: a linguistic form that systematically and frequently co-occurs with a gesture, as in “I came this close to 🤏 winning the lottery”; a gesture that seemingly has a syntactic role in the utterance; and a gesture that is not associated with particular linguistic forms but combines freely, such as air quotes (Uhrig, 2020). All of this will be done in the form of corpus-based studies. The data will be extracted from the NewsScape English Corpus, a large repository of audio-visual data, and analyzed using a variety of tools such as CQPweb, Red Hen Rapid Annotator, Elan, and Praat (Uhrig, 2021).
By the end of this project, we aim to suggest ways of delineating and modeling multimodal constructions in a CxG framework; to account for the type of information that should be considered when describing constructions, which is not a trivial matter; and to apply data science methods to multimodal communication research, identifying and extracting gestures from multimodal corpora and analyzing them with statistical methods.

→More information
Corpus Evidence for Delineating Constructions

(Third-party funded group grant – subproject)

Title of the overall project: GRK 2839: The Construction Grammar Galaxy
Duration: 1 October 2022 - 30 September 2027
Funding source: DFG / Research Training Group (GRK)

Abstract

CxG and many other usage-based approaches agree that language consists of pre-fabricated form-meaning pairings of varying sizes (e.g. Goldberg 1995, Hunston & Francis 2000, Sinclair & Mauranen 2006, Wray 2008), which are called constructions in CxG. In contrast to approaches that understand language as a probabilistic system, such as lexical priming theory (Hoey 2005) or the EC-Model (Schmid 2020), constructions are usually conceptualised as discrete symbolic units or the “nodes of a symbolic network” (Diessel 2019: 249), possibly emerging from the generalisation of associational patterns or clusters of memory traces (e.g. Goldberg 2019). Prior research has typically focused on extensive linguistic analysis and discussion of a relatively small set of specific constructions (such as the English ditransitive or the let alone construction). Such studies have not been able to establish clear-cut criteria and diagnostics for determining at scale, i.e. with broad coverage, which form-meaning pairings should be considered as constructions and which elements (lexical items, restricted or open slots, and grammatical features) should be included in a given construction. While it is evident in a usage-based approach that there can be no dichotomous distinction of constructions vs. non-constructions and that “constructionhood” is a matter of degree, binary decisions on an inventory of constructions still have to be made for the purposes of linguistic analysis and the systematic compilation of a broad-coverage reference constructicon.
First efforts to build such a reference constructicon have been started for different languages, including English (Perek & Patten 2019) and German (Ziem et al. 2019). They build on existing lexical resources such as FrameNet (Perek & Patten 2019) and/or manual in-depth analysis of selected constructions (Ziem et al. 2019). Automatic identification of constructions has only been attempted by a small number of exploratory studies, based on word n-grams (Shibuya & Jensen 2015), hybrid n-grams of words and POS tags (Forsberg et al. 2014), or a combination of dependency-based co-occurrence with distributional clustering (Martí et al. 2019). All three studies focus on extracting and ranking construction candidates for manual inspection, but do not discuss identifying criteria or generate additional quantitative evidence for human annotators. Gries (2003) carries out a small feasibility study on finding prototypical instances of a given construction, but does not address the issue of construction identification.
This project explores how and to what extent quantitative data from large corpora can contribute to the task of delineating constructions, i.e. help researchers to assess the degree of “constructionhood” of a candidate construction (CxCand), develop systematic defining criteria for this assessment, and lay the groundwork for (semi-)automatic identification of constructions at scale. The project combines computational big data analysis of English and German corpora with constructicographic work (Lyngfelt et al. 2018), extending the collo-profile approach proposed by Herbst & Uhrig (2019: 177ff) for argument structure constructions. It addresses three central research questions:
Q1: Does quantitative evidence from large corpora improve the manual identification of constructions and the development of defining criteria?
Q2: What statistical measures are suitable as an operationalisation of such quantitative data, providing a basis for computing an index of “constructionhood” and for the automatic identification of constructions?
Q3: Can context-sensitive neural word and phrase embeddings be used as a corpus-based approximation of construction meaning?
The project starts by extracting large databases of CxCand from English and German Web corpora of more than 10 billion words, based on pre-defined syntactic patterns such as verb argument structure. The extraction relies on an existing HPC infrastructure for parsing large corpora at FAU. Widely-used criteria for determining “constructionhood” such as productivity, compositionality / idiomaticity and schematicity / lexical specificity (Ziem et al. 2019: 69f) are operationalised in terms of corpus frequency, productivity of slots, statistical association between lexical elements, morpho-syntactic preferences, context entropy, etc. They are computed from the CxCand database using state-of-the-art measures from methodological research carried out at FAU, which provide the basis for answering Q2. Following Herbst & Uhrig (2019), the meaning aspect of a CxCand is initially approximated by the collo-profiles of its open slots. A thorough constructicographic analysis of different sets of CxCand sheds light on Q1 (whether constructions can clearly be identified) and Q2 (which quantitative measures are most useful for this purpose). These sets include well-studied examples of constructions from the literature (used for validation of the approach), sets based on a syntactic pattern (such as mono-transitive verb argument structure), and sets based on a lexical item (in particular various prepositions, in collaboration with project #9). The most challenging and open-ended aspect of the project explores the use of context-sensitive word and phrase embeddings (e.g. Devlin et al. 2019) to operationalise the semantics of a CxCand, following the distributional hypothesis (Harris 1954) and recent proposals for a distributional CxG (DisCxG: Rambelli et al. 2019). If successful, i.e., if there is a positive answer to Q3, not only the form of a construction but also its meaning can be studied based on corpus evidence.
Research questions Q1 and Q2 directly address GRQ CON1 (How do we identify constructions? Can they be seen as discrete units?) and GRQ CON2 (To what extent is constructional knowledge determined by collo-profiles? How can we measure the lexical specificity vs. productivity of construction slots?). An important part of the constructicographic analysis is to delineate between a CxCand and related constructions, such as a generalisation of the CxCand or an overlapping combination of two constructions. In this way, the project also addresses GRQ NET1 (How can computational methods help reveal the network character of constructional space?).
The project will contribute a substantial number of entries to the RCnn, combining constructicographic descriptions with rich quantitative evidence. A suitable representation format for these entries will be developed in close collaboration with the PDR. The CxCand database constitutes a valuable resource for other projects working on English or German constructions; an extension to other languages is envisaged for the second phase of the RTG.
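Two of the quantitative indicators named in this abstract, statistical association between lexical elements and the entropy of an open slot, can be illustrated with a minimal Python sketch. The function names, the choice of pointwise mutual information (PMI) as the association measure, and the input format are assumptions made for illustration only; the abstract does not state which measures the project actually uses.

```python
import math
from collections import Counter


def slot_entropy(fillers):
    """Shannon entropy (in bits) of the fillers observed in one open slot.

    `fillers` is a non-empty list of lexical items, one per corpus instance.
    Low entropy suggests a lexically specific slot, high entropy a more
    productive one (an illustrative reading, not the project's definition).
    """
    counts = Counter(fillers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pmi(pair_count, first_count, second_count, sample_size):
    """Pointwise mutual information (in bits) between two co-occurring items.

    Compares the observed co-occurrence frequency with the frequency
    expected if the two items were statistically independent.
    """
    expected = first_count * second_count / sample_size
    return math.log2(pair_count / expected)
```

A slot filled equally often by four different items has an entropy of two bits, while a slot that always contains the same item has an entropy of zero. PMI is known to overrate rare combinations, so a real study would likely compare several association measures.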

→More information
DFG Project: The Normalisation of Right-Wing Populist and New Right Discourses in Japan and Germany

(Third-party funded group grant – overall project)

Duration: 1 April 2022 - 31 March 2025
Funding source: Deutsche Forschungsgemeinschaft (DFG)

Abstract

The Chair of Japanese Studies with a focus on modern and contemporary Japan is part of the DFG-funded project “The Normalisation of Right-Wing Populist and New Right Discourses in Japan and Germany”, which is carried out by the Philosophische Fakultät of FAU as an interdisciplinary cooperation with the Chair of Corpus and Computational Linguistics. From a discourse-analytical perspective, this comparative research project examines various instances of political populism as a “thin ideology” (Mudde/Kaltwasser) in their respective ideological proximity to New Right discourses in Japan and Germany. In particular, it analyses the long-term effects of New Right discursive strategies and right-wing populist politics on everyday language and the field of political discourse, using the methods of corpus and computational linguistics as well as corpus-based critical discourse analysis.

→More information
Tracking the infodemic: Conspiracy theories in the corona crisis

(Third-party funded individual grant)

Duration: 1 April 2021 - 30 September 2022
Funding source: Volkswagen Stiftung

Abstract

How far do conspiracy theories spread? The project investigates this question by using corpus-linguistic methods to analyse the use and spread of conspiracy theories on the basis of typical language patterns. We also examine the discursive strategies that conspiracy theories share with right-wing populist and extremist discourses. The aim of the project is not only to gain important insights into the discourse on the coronavirus pandemic, but also to automate the methods used as far as possible, so that they can be applied to study the spread of other conspiracy theories and misinformation, for instance by automatically identifying particular argumentation patterns.

→More information
Argument Reconstruction from Political Debates

(Third-party funded individual grant)

Duration: 1 January 2021 - 31 December 2023
Funding source: DFG / Priority Programme (SPP)
URL: https://www.linguistik.phil.fau.de/projects/rant/

Abstract

Large parts of political debate are nowadays available in machine-readable form, in the formal public sphere of parliamentary debates as well as in the semi-public sphere of social media. This opens up the possibility of using automatic text analysis methods to obtain a broad overview of the arguments put forward. To this end, the project RANT/RAND, part of the priority programme SPP RATIO (Robust Argumentation Machines), develops a combined approach that draws on methods from logic and corpus linguistics. Because the huge amount of available data makes it reasonable to assume that all important arguments will be found even at relatively low recall, our methods aim for high precision (at the expense of recall). For this purpose we compile a list of logical patterns that correspond to common argumentation schemes (e.g. argumentum ad verecundiam) and can essentially be regarded as formulas with placeholders in special modal logics. Each logical pattern is linked to several linguistic realisations, which are worked out in corpus-linguistic studies and at the same time operationalised as search queries. Our approach thus combines the development of automatic methods for argument extraction with new insights into linguistic aspects of political argumentation, colloquial argumentation in particular. The first phase of the project, currently under way, concentrates on developing and evaluating logical patterns and corpus-linguistic queries for individual arguments in a case study on a large English-language Twitter corpus. In the second project phase we will test the robustness of our approach by including further text types and, in particular, by analysing longer coherent texts such as newspaper articles and parliamentary debates.
In the second phase we will also work with German-language texts, which are considerably harder to capture with corpus-linguistic queries (owing, among other things, to discontinuous constituents and a much smaller supply of high-quality NLP tools). A further decisive step is the use of similarity-based methods in order to draw complex inferences from the extracted arguments. To this end, placeholders in the extracted formulas are filled with embedding vectors tailored specifically to our requirements. We will furthermore extend our approach to the extraction of argumentation structures, i.e. explicit and implicit references between arguments. In addition, we will investigate the logical structure of argumentation about planning and establish links between argumentation and interpersonal relationships (e.g. in ad hominem arguments).

→More information
Automatic Anonymisation and Pseudonymisation of Court Decisions

(Third-party funded individual grant)

Duration: 1 April 2020 - 31 March 2022
Funding source: Bayerisches Staatsministerium der Justiz (StMJ)

→More information
Intercultural Corpus and Computational Linguistics

(Third-party funded individual grant)

Duration: 1 March 2020 - 31 August 2021
Funding source: Bayerische Forschungsallianz (BayFOR)

→More information
Palliative Medicine as Discourse - Interdisciplinary (PalaDin)

(Third-party funded individual grant)

Duration: 15 May 2019 - 14 May 2020
Funding source: Foundations
URL: https://www.palliativmedizin.uk-erlangen.de/forschung/forschungsschwerpunkte/

→More information
Reconstructing Arguments from Noisy Text (SPP 1999: RATIO)

(Third-party funded individual grant)

Duration: 1 January 2018 - 31 December 2020
Funding source: Deutsche Forschungsgemeinschaft (DFG)

Abstract

Social media play a growing role in shaping public opinion. RANT is concerned with developing methods and formalisms for the extraction, representation and processing of arguments from texts of low linguistic quality, as found in discussions on social media, based on an ongoing case study on a large corpus of Twitter messages on Brexit that were circulated before the referendum. We will conduct a corpus-linguistic study to identify recurring linguistic argumentation schemes and, on the basis of these schemes, design corpus queries for extracting arguments in the spirit of a high-precision, low-recall approach. Indeed, we expect that argumentation schemes can be directly related to logical schemes in a dedicated formalism, so that individual arguments can be parsed directly as logical formulas. The formalism used for argument representation will include a broad spectrum of modalities reflecting linguistic-semantic phenomena that occur in real texts, such as uncertainty, effect, preference, sentiment, vagueness and default implication. We will present such a formalism as a family of instance logics in coalgebraic logic, which as a generic logical framework provides uniform semantic, deductive and algorithmic methods for modalities beyond the usual relational semantics; in particular, we will build deduction tools for argumentation logics on existing generic coalgebraic tools. The resulting logical language for representing individual arguments is complemented by a flexible framework for representing relationships between arguments.
These include relations widely studied in argumentation theory, such as attack and support relations; relationships obtained from the corpus metadata, such as citation, hashtags or direct address (by mentioning user names); and relationships that only emerge through logical inference from the content of the arguments. The latter in particular are often not relations in the strict sense semantically, but involve, for example, continuous truth values, preference orderings or probabilities, and thus benefit from a uniform coalgebraic modelling, which also forms the semantic foundation of coalgebraic logic. Accordingly, we will develop suitable generalisations of the extension semantics defined for Dung's argumentation frameworks and thereby ultimately capture notions such as “coherent standpoint” or “widely held view” formally; combined with corresponding algorithmic methods, this will allow the automated extraction of comprehensive argumentative positions from the corpus.
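As background to the extension semantics mentioned in this abstract, the following minimal Python sketch computes the grounded extension of a plain Dung-style argumentation framework with a binary attack relation. It shows only the classical, non-generalised case; the function name and the data representation (a set of arguments and a set of attacker/target pairs) are assumptions for illustration and are not taken from the project.

```python
def grounded_extension(arguments, attacks):
    """Return the grounded extension of a finite argumentation framework.

    `arguments` is a set of argument labels and `attacks` a set of
    (attacker, target) pairs. The grounded extension is the least fixed
    point of the characteristic function, reached here by iterating
    from the empty set.
    """
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable if every one of its attackers is
        # itself attacked by some argument already in the extension.
        acceptable = {
            a
            for a in arguments
            if all(
                any((defender, b) in attacks for defender in extension)
                for b in attackers[a]
            )
        }
        if acceptable == extension:
            return extension
        extension = acceptable
```

For a chain in which a attacks b and b attacks c, the function returns a and c: a is unattacked and defends c against b. For two arguments that attack each other it returns the empty set, since neither is defended by an unattacked argument.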

→More information
Complexity of Literary Works from a Stylometric Perspective in the Digital Humanities Centre KALLIMACHOS

(Third-party funded group grant – subproject)

Title of the overall project: KALLIMACHOS – Centre for Digital Edition and Quantitative Analysis at the University of Würzburg
Duration: 1 October 2017 - 30 September 2019
Funding source: BMBF / collaborative project

Abstract

In this subproject, the Chair of Corpus and Computational Linguistics develops robust measures of lexical complexity, extends the notion of complexity beyond the usual vocabulary richness, and implements the results in a freely available stylometric toolbox.
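To make the notion of vocabulary richness concrete, here is a minimal Python sketch of the type-token ratio (TTR) and a windowed variant that is less sensitive to text length. Both are textbook measures chosen for illustration; the function names and the window size are assumptions, and the sketch is not the toolbox developed in the project.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens.

    `tokens` must be a non-empty list of word forms.
    """
    return len(set(tokens)) / len(tokens)


def standardised_ttr(tokens, window=1000):
    """Mean TTR over consecutive, non-overlapping windows of equal size.

    Plain TTR falls as a text gets longer, so averaging over fixed-size
    windows makes texts of different lengths more comparable. Trailing
    tokens that do not fill a whole window are ignored; a text shorter
    than one window falls back to the plain TTR.
    """
    chunks = [
        tokens[start:start + window]
        for start in range(0, len(tokens) - window + 1, window)
    ]
    if not chunks:
        return type_token_ratio(tokens)
    return sum(type_token_ratio(chunk) for chunk in chunks) / len(chunks)
```

The windowed variant trades some information for robustness, which is one simple way of addressing the length dependence that makes raw vocabulary richness hard to compare across texts.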

→More information
Exploring the “Fukushima Effect”: Opinion Networks and Political Will Formation in the Transnational Algorithmic Public Sphere

(FAU Funds)

Duration: 1 January 2017 - 31 December 2019
URL: https://www.linguistik.phil.fau.de/projects/efe/

Abstract

The digitalisation of society and of media systems has an immense impact on (political) opinion formation and discourse. This project investigates a complex phenomenon that has emerged in the age of globalised mass media and of social media connectivity that crosses national borders, and that we refer to as the transnational algorithmic public sphere. An interdisciplinary combination of computational linguistic methods, network visualisation, intercultural hermeneutics and content analysis from communication studies enables us to analyse and map the processes underlying this phenomenon. Thematically, the project deals with the politically topical debate on nuclear energy and the energy transition (Energiewende) after Fukushima in Germany and Japan.

→More information
Efficient Simulation Experiments for Parameter Optimisation of Memory-Intensive Machine Learning Methods in Computational Linguistics

(Third-party funded individual grant)

Duration: 1 October 2016 - 30 September 2017
Funding source: Bayerisches Staatsministerium für Bildung und Kultus, Wissenschaft und Kunst (from 10/2013)

Abstract

The aim of the project is to optimise memory-intensive machine learning methods for use on HPC clusters, so that simulation experiments for the systematic parameter optimisation of these methods can be carried out. Matrix factorisations and deep learning models in distributional semantics serve as the prototypical use case.

→More information
Travel Grant for a Conference Trip to Portorož

(Third-party funded individual grant)

Duration: 1 August 2016 - 31 October 2016
Funding source: Foundations

→More information
Computer-Based Measurement of Motives

(Project funded from own resources)

Duration: since 1 January 2016

Abstract

The standardised measurement of implicit motivational needs through content coding of imaginative stories is a time-consuming and labour-intensive procedure. In collaboration with colleagues from corpus linguistics and computer science, the chair is therefore working on computer-based methods for the automatic measurement of motivational needs. These draw on psycholinguistic methods such as Linguistic Inquiry and Word Count as well as on pattern recognition methods.

→More information
Multilingualism and Migration

(Project funded from own resources)

Duration: since 1 January 2016

→More information
English Constructicon

(Project funded from own resources)

Duration: since 1 January 2016

→More information
Corpus-Linguistic Methods and Statistical Analyses in the Digital Humanities Centre KALLIMACHOS

(Third-party funded group grant – subproject)

Title of the overall project: KALLIMACHOS – Centre for Digital Edition and Quantitative Analysis at the University of Würzburg
Duration: 1 October 2014 - 30 September 2017
Funding source: BMBF / collaborative project
URL: http://www.kallimachos.de/

Abstract

This subproject aims to improve the understanding of the mathematical properties of literary authorship attribution with stylometric distance measures. In addition, the separation of author, genre and period signals in stylometric analyses is of great interest, since it could in turn benefit the reliability of automatic genre classification. Furthermore, reliable statistical methods for testing the significance of the observed developments are to be worked out, implemented and tested.
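A widely used example of a stylometric distance measure is Burrows's Delta, which compares two texts by the z-scores of their most frequent words. The minimal Python sketch below assumes that each text is given as a dictionary of relative word frequencies; the function names, the input format and the use of the population standard deviation are illustrative assumptions, and the abstract does not say which measures the subproject studies.

```python
from statistics import mean, pstdev


def z_scores(profile, corpus, words):
    """Standardise the relative frequencies of `words` in one text.

    `profile` and every element of `corpus` map words to relative
    frequencies; mean and standard deviation are taken over `corpus`.
    Words with zero variance across the corpus get a z-score of 0.
    """
    scores = {}
    for word in words:
        values = [text.get(word, 0.0) for text in corpus]
        mu, sigma = mean(values), pstdev(values)
        scores[word] = (profile.get(word, 0.0) - mu) / sigma if sigma > 0 else 0.0
    return scores


def burrows_delta(text_a, text_b, corpus, words):
    """Mean absolute difference between the z-scores of two texts."""
    z_a = z_scores(text_a, corpus, words)
    z_b = z_scores(text_b, corpus, words)
    return mean(abs(z_a[word] - z_b[word]) for word in words)
```

In authorship attribution, a disputed text is typically assigned to the candidate author whose texts yield the smallest Delta. In practice `words` would be the few hundred most frequent words of the reference corpus.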

→More information
Development of Text Clustering Software for the Analysis of Opinion Surveys with RogTCS

(Third-party funded individual grant)

Duration: since 3 June 2013
Funding source: Industry
URL: https://www.rogator.de/software/textanalysesoftware/

Abstract

The project tests various computational linguistic methods for the semi-automatic analysis of open-ended questions in opinion surveys. It focuses on identifying important topics (topic analysis), detecting positive, negative and neutral evaluations (polarity detection), and visualising the automatic analyses. The methods used are largely language-independent and are applied to German and English text data within the project.

→More information

Lecture

Lecture: Foundations of Computational Linguistics 3

Advanced seminar

Advanced seminar

Research seminar

Research seminar: Computational Linguistics

Introductory seminar

Introductory seminar: Computational Linguistics

Exercise

Exercise: Foundations of Computational Linguistics 3

Committee memberships / functions within FAU

Faculty Council (member)
1 October 2015 - 30 September 2019, Philosophische Fakultät und Fachbereich Theologie
Interdisciplinary Center - member
since 18 November 2014, Interdisziplinäres Zentrum für Lexikografie, Valenz- und Kollokationsforschung
Interdisciplinary Center - member
since 24 April 2014, Interdisziplinäres Zentrum für Digitale Geistes- und Sozialwissenschaften

Organisation of conferences

Text Mining and Generation (TMG)
19 September 2022 - 19 September 2022, URL: https://recap.uni-trier.de/2022-tmg-workshop/
CogALex-V Shared Task on the Corpus-Based Identification of Semantic Relations
12 December 2016 - 12 December 2016, Osaka
EmpiriST Shared Task on Automatic Linguistic Annotation of CMC & Web Corpora
12 August 2016 - 12 August 2016, Berlin

Reviewing for academic journals

since 8 July 2015
International Journal of Corpus Linguistics
since 1 January 2011

Other activities outside FAU

Member of committee NA 105-00-06 AA “Sprachressourcen” (language resources) of the Normenausschuss Terminologie (NAT), DIN e.V.
since 21 May 2014
Member of the Challenge and Innovation Panel of the ESRC Centre for Corpus Approaches to Social Science (CASS)
1 April 2013 - 31 March 2018, Lancaster University
Chair of the Special Interest Group on the Web as Corpus (SIGWAC) of the Association for Computational Linguistics (ACL)
1 August 2012 - 31 July 2015

Stephanie Evert: Membership of the Bayerische Akademie der Wissenschaften – 2025
