Piotr Przybyła

I'm a tenure-track professor at the Universitat Pompeu Fabra in Barcelona, Spain, and a researcher (as a Ramón y Cajal fellow) in the TALN (Natural Language Processing) Research Group there. I am also affiliated with the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS) in Warsaw, Poland. Previously, I obtained my PhD in Computer Science from ICS PAS and worked as a research fellow at the National Centre for Text Mining (NaCTeM) at the University of Manchester.

My most recent research project, ERINIA (Evaluating the Robustness of Non-Credible Text Identification by Anticipating Adversarial Actions), was funded as a Marie Skłodowska-Curie Postdoctoral Fellowship by the European Union.

In case you're wondering: my surname is pronounced /pʂɨbɨwa/, as in: Powerful sheikh is bringing in wonderful art.

News

(05.2026) I presented a corpus for training and evaluating machine-generated text detection in Polish at LREC 2026.
(02.2026) The results of the ŚMIGIEL shared task on detecting machine-generated text in Polish, which I organised, have been published in the ACL Anthology.
(01.2026) I will be presenting my recent work at the seminars of the Digital Media SIG of the Alan Turing Institute and the COLT group of UPF.
(01.2026) I started my Ramón y Cajal fellowship in the TALN group.
(11.2025) At this year's EMNLP in Suzhou, China, I presented three articles: on attacks on misinformation detection, morphology-aware tokenisation and readability-adjusted text simplification.
(08.2025) I started my tenure-track professorship at the UPF.
(08.2025) Our article on "Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models" will be presented at EMNLP 2025.
(07.2025) Our article on building a corpus and classifiers of anthropomorphic language in NLP reporting was presented at ACL 2025.
(04.2025) At this year's RANLP conference, I will co-organise OMMM 2025: the first Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models.
(03.2025) I am co-organising a shared task on Spotting Machine-Generated Text from Language Models for Polish (ŚMIGIEL) at PolEval 2025.
(12.2024) PLOS ONE published our study on the (dubious) consciousness of LLMs and anthropomorphic language in AI reporting.
(11.2024) An article describing our work on verifying the robustness of misinformation detection against adversarial attacks, performed in the ERINIA project, was published in the Natural Language Processing journal.
(10.2024) We participated in the WMT shared task on translating low-resource Iberian languages, including Aragonese, Aranese and Asturian: see our results.
(09.2024) I was in Grenoble for the CLEF 2024 conference, summarising the results of the InCrediblAE shared task and chairing the session with presentations by participants in this and other CheckThat! 2024 tasks.
(08.2024) At this year's ACL, I'm presenting our work on the adaptive search for adversarial examples (at the WASSA workshop) and processing affiliations in scholarly literature (at the SDProc workshop).
(08.2024) On the 20th of August, I'll be presenting our work in the ERINIA project at a seminar at Chulalongkorn University in Bangkok.
(08.2024) The results of our InCrediblAE Task 6 at the CheckThat! 2024 lab have been published in the overview article.
(05.2024) Our work on a Polish QA corpus has been presented at LREC-COLING 2024 in Torino, Italy.
(03.2024) An article describing the CheckThat! 2024 lab, including our Task 6, has been published in the proceedings of ECIR 2024.
(03.2024) I am presenting BODEGA at the Natural Language Processing Seminar at the Polish Academy of Sciences. The recording (in Polish) will be available through the website.
(03.2024) My multi-level segmenter, LAMBO, has been updated to better deal with foreign characters and work with more languages: it supports 67 now!
(02.2024) A publication introducing a dataset for Polish QA has been accepted for LREC-COLING 2024.
(01.2024) I am coordinating a shared task on Robustness of Credibility Assessment with Adversarial Examples (InCrediblAE): Task 6 at the CheckThat! 2024 Evaluation Lab.
(01.2024) I have been awarded a computing grant of 10,000 hours on the Athena supercomputer at AGH University of Kraków to accelerate work in the ERINIA project.
(10.2023) An article describing our (winning!) solution to the AuTexTification shared task has been published in the IberLEF 2023 proceedings.
(10.2023) An article describing a model for propaganda detection created within the HOMADOS project has been published in the SEPLN 2023 proceedings.

Publications

Detecting machine-generated text

J. Strebeyko, A. Wróblewska, P. Przybyła, “Śmigiel Dataset: Laying Foundations for Investigating Machine-Generated Text Detection in Polish,” in Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), pp. 10556-10568, Palma, Mallorca, Spain, 2026. [bib][paper][corpus][code]
P. Przybyła, J. Strebeyko, A. Wróblewska, “PolEval 2025 Task 1 Śmigiel: Spotting Machine-Generated Text from LLMs for Polish,” in Proceedings of the PolEval 2025 Workshop, Warsaw, Poland, 2025. [bib][paper]
P. Przybyła, N. Duran-Silva, S. Egea-Gómez, “I've Seen Things You Machines Wouldn't Believe: Measuring Content Predictability to Identify Automatically-Generated Text,” in Proceedings of the 5th Workshop on Iberian Languages Evaluation Forum (IberLEF 2023), Jaén, Spain, 2023. [bib][paper][code]
P. Przybyła, “Detecting Bot Accounts on Twitter by Measuring Message Predictability,” in Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, 2019. [bib][paper][code]

Credibility and misinformation

P. Przybyła, E. McGill, H. Saggion, “Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), Suzhou, China, 2025. [bib][paper][code]
P. Przybyła, A. Shvets, H. Saggion, “Verifying the Robustness of Automatic Credibility Assessment,” Natural Language Processing, vol. 31, issue 5, pp. 1134-1162, 2024. [bib][paper][code]
A. Barrón-Cedeño, F. Alam, J. M. Struß, P. Nakov, T. Chakraborty, T. Elsayed, P. Przybyła, T. Caselli, G. Da San Martino, F. Haouari, M. Hasanain, C. Li, J. Piskorski, F. Ruggeri, X. Song, R. Suwaileh, “Overview of the CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness,” in Proceedings of the Fifteenth International Conference of the CLEF Association (CLEF 2024), Grenoble, France, 2024. [bib][paper][preprint][event]
P. Przybyła, E. McGill, H. Saggion, “Know Thine Enemy: Adaptive Attacks on Misinformation Detection Using Reinforcement Learning,” in Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Bangkok, Thailand, 2024. [bib][paper][code]
P. Przybyła, B. Wu, A. Shvets, Y. Mu, K. C. Sheang, X. Song, H. Saggion, “Overview of the CLEF-2024 CheckThat! Lab Task 6 on Robustness of Credibility Assessment with Adversarial Examples (InCrediblAE),” in Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, Grenoble, France, 2024. [bib][paper][event][code]
A. Barrón-Cedeño, F. Alam, T. Chakraborty, T. Elsayed, P. Nakov, P. Przybyła, J. M. Struß, F. Haouari, M. Hasanain, F. Ruggeri, X. Song, R. Suwaileh, “The CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness,” in Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024), Glasgow, UK, 2024. [bib][paper][preprint][event]
P. Przybyła, K. Kaczyński, “Where Does It End? Long Named Entity Recognition for Propaganda Detection and Beyond,” in Proceedings of the Workshop on NLP applied to Misinformation co-located with 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, 2023. [bib][paper][code]
P. Przybyła, H. Saggion, “ERINIA: Evaluating the Robustness of Non-Credible Text Identification by Anticipating Adversarial Actions,” in Proceedings of the Workshop on NLP applied to Misinformation co-located with 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), Jaén, Spain, 2023. [bib][paper]
P. Przybyła, P. Borkowski, K. Kaczyński, “Countering Disinformation by Finding Reliable Sources: a Citation-Based Approach,” in Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 2022. [bib][paper][preprint][data][corpus][code]
P. Przybyła, A. J. Soto, “When classification accuracy is not enough: Explaining news credibility assessment,” Information Processing & Management, vol. 58, issue 5, 2021. [bib][paper][data,code]
K. Kaczyński, P. Przybyła, “HOMADOS at SemEval-2021 Task 6: Multi-Task Learning for Propaganda Detection,” in Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), Bangkok, Thailand, 2021. [bib][paper]
P. Przybyła, “Capturing the Style of Fake News,” in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, USA, 2020. [bib][paper][corpus][code]
J. Gąsior and P. Przybyła, “The IPIPAN Team Participation in the Check-Worthiness Task of the CLEF2019 CheckThat! Lab,” in Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum, Lugano, Switzerland, 2019. [bib][paper]

NLP meta-research

M. Shardlow, A. Williams, C. Roadhouse, F. Ventirozos, P. Przybyła, “Learn, Achieve, Predict, Propose, Forget, Suffer: Analysing and Classifying Anthropomorphisms of LLMs,” in Proceedings of Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models, Varna, Bulgaria, 2025. [bib][paper]
M. Shardlow, A. Williams, C. Roadhouse, F. Ventirozos, P. Przybyła, “Exploring Supervised Approaches to the Detection of Anthropomorphic Language in the Reporting of NLP Venues,” in Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 2025. [bib][paper][corpus]
M. Shardlow, P. Przybyła, “Deanthropomorphising NLP: Can a language model be conscious?,” PLOS ONE, vol. 19, issue 12, 2024. [bib][paper]
P. Przybyła, M. Shardlow, “Using NLP to quantify the environmental cost and diversity benefits of in-person NLP conferences,” in Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 2022. [bib][paper][data][code]

NLP applications for biomedical and scholarly text

N. Duran-Silva, P. Accuosto, P. Przybyła, H. Saggion, “AffilGood: Building reliable institution name disambiguation tools to improve scientific literature analysis,” in Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), Bangkok, Thailand, 2024. [bib][paper][code+data]
A. J. Brockmeier, M. Ju, P. Przybyła, S. Ananiadou, “Improving reference prioritisation with PICO recognition,” BMC Medical Informatics and Decision Making, vol. 19, p. 256, 2019. [bib][paper]
P. Przybyła, A. J. Brockmeier, S. Ananiadou, “Quantifying risk factors in medical reports with a context-aware linear model,” Journal of the American Medical Informatics Association, vol. 26, issue 6, pp. 537-546, 2019. [bib][paper]
A. Bannach-Brown, P. Przybyła, J. Thomas, A. S. C. Rice, S. Ananiadou, J. Liao, M. R. Macleod, “Machine learning algorithms for systematic review: reducing workload in a preclinical review of animal studies and reducing human screening error,” Systematic Reviews, vol. 8, issue 1, p. 23, 2019. [bib][paper][data][software]
A. J. Soto, P. Przybyła, S. Ananiadou, “Thalia: Semantic search engine for biomedical abstracts,” Bioinformatics, vol. 35, issue 10, pp. 1799-1801, 2018. [bib][paper][web service]
P. Przybyła, A. J. Brockmeier, G. Kontonatsios, M. Le Pogam, J. McNaught, E. von Elm, K. Nolan, S. Ananiadou, “Prioritising references for systematic reviews with RobotAnalyst: A user study,” Research Synthesis Methods, vol. 9, no. 3, pp. 470-488, 2018. [bib][paper][web service]
G. Kontonatsios, A. J. Brockmeier, P. Przybyła, J. McNaught, T. Mu, J. Y. Goulermas, S. Ananiadou, “A semi-supervised approach using label propagation to support citation screening,” Journal of Biomedical Informatics, vol. 72, 2017. [bib][paper]
P. Przybyła, A. J. Soto and S. Ananiadou, “Identifying Personalised Treatments and Clinical Trials for Precision Medicine using Semantic Search with Thalia,” in Proceedings of the Twenty-Fifth Text REtrieval Conference (TREC 2017), Gaithersburg, Maryland, USA, 2017. [bib][paper]
P. Przybyła, M. Shardlow, S. Aubin, R. Bossy, R. Eckart de Castilho, S. Piperidis, J. McNaught, S. Ananiadou, “Text Mining Resources for the Life Sciences,” Database: The Journal of Biological Databases and Curation, vol. 2016, 2016. [bib][paper]

NLP for Polish

P. Rybak, P. Przybyła, M. Ogrodniczuk, “PolQA: Polish Question Answering Dataset,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 2024. [bib][paper][corpus]
Ł. Kobyliński, M. Ogrodniczuk, P. Rybak, P. Przybyła, P. Pęzik, A. Mikołajczyk, W. Janowski, M. Marcińczuk, A. Smywiński-Pohl, “PolEval 2022/23 Challenge Tasks and Results,” in Proceedings of the 18th Conference on Computer Science and Intelligence Systems (FedCSIS 2023), Warsaw, Poland, 2023. [bib][paper]
M. Ogrodniczuk, P. Przybyła, “PolEval 2021 Task 4: Question Answering Challenge,” in Proceedings of the PolEval 2021 Workshop, Online, 2021. [bib][paper][data]
P. Przybyła, “How big is big enough? Unsupervised word sense disambiguation using a very large corpus,” Manuscript arXiv:1710.07960 [cs.CL], 2017. [bib][paper]
P. Przybyła, “Boosting Question Answering by Deep Entity Recognition,” Manuscript arXiv:1605.08675 [cs.CL], 2016. [bib][paper][data][corpus]
P. Przybyła, “Odpowiadanie na pytania w języku polskim z użyciem głębokiego rozpoznawania nazw,” (Question Answering in Polish using Deep Entity Recognition), PhD thesis, Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland, 2015. [bib][paper][data][corpus]
P. Przybyła, “Gathering Knowledge for Question Answering Beyond Named Entities,” in Proceedings of the 20th International Conference on Applications of Natural Language to Information Systems (NLDB 2015), Passau, Germany, 2015. [bib][paper][data][corpus]
P. Przybyła and P. Teisseyre, “Analysing Utterances in Polish Parliament to Predict Speaker’s Background,” Journal of Quantitative Linguistics, vol. 21, no. 4, pp. 350–376, 2014. [bib][paper]
P. Przybyła, “Question Analysis for Polish Question Answering,” in 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop, Sofia, Bulgaria, 2013. [bib][paper]
P. Przybyła, “Question Classification for Polish Question Answering,” in Proceedings of the 20th International Conference on Language Processing and Intelligent Information Systems (LP&IIS 2013), Warsaw, Poland, 2013. [bib][paper]
P. Przybyła, “Issues of Polish Question Answering,” in Proceedings of the first conference “Information Technologies: Research and their Interdisciplinary Applications” (ITRIA 2012), Warsaw, Poland, 2012. [bib][paper]

Text simplification

P. Przybyła, “STARLING at TSAR 2025 Shared Task: Leveraging Alternative Generations for Readability Level Adjustment in Text Simplification,” in Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), Suzhou, China, 2025. [bib][paper]
M. Shardlow, P. Przybyła, “Simplification by Lexical Deletion,” in Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability (TSAR 2023), Varna, Bulgaria, 2023. [bib][paper][code]
L. Vásquez-Rodríguez, M. Shardlow, P. Przybyła, S. Ananiadou, “Document-level Text Simplification with Coherence Evaluation,” in Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability (TSAR 2023), Varna, Bulgaria, 2023. [bib][paper][code]
L. Vásquez-Rodríguez, M. Shardlow, P. Przybyła, S. Ananiadou, “The Role of Text Simplification Operations in Evaluation,” in Proceedings of the First Workshop on Current Trends in Text Simplification (CTTS 2021), Online, 2021. [bib][paper][code]
L. Vásquez-Rodríguez, M. Shardlow, P. Przybyła, S. Ananiadou, “Investigating Text Simplification Evaluation,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Bangkok, Thailand, 2021. [bib][paper][code]
P. Przybyła, M. Shardlow, “Multi-Word Lexical Simplification,” in Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020), Barcelona, Spain, 2020. [bib][paper][data][model][code]

Other NLP

A. Táboas García, P. Przybyła, L. Wanner, “Exploring morphology-aware tokenization: A case study on Spanish language modeling,” in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), Suzhou, China, 2025. [bib][paper][code]
I. Kuzmin, P. Przybyła, E. McGill, and H. Saggion, “TRIBBLE - TRanslating IBerian languages Based on Limited E-resources,” in Proceedings of the Ninth Conference on Machine Translation, Miami, USA, 2024. [bib][paper][code]
P. Przybyła, N. T. H. Nguyen, M. Shardlow, G. Kontonatsios, and S. Ananiadou, “NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features,” in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016), San Diego, USA, 2016. [bib][paper]
P. Przybyła and P. Teisseyre, “What do your look-alikes say about you? Exploiting strong and weak similarities for author profiling - Notebook for PAN at CLEF 2015,” in CLEF 2015 Labs and Workshops, Notebook Papers, Toulouse, France, 2015. [bib][paper]

Computations in physics

M. Maćkowiak-Pawłowska, P. Przybyła, “Generalisation of the identity method for determination of high-order moments of multiplicity distributions with a software implementation,” European Physical Journal C, vol. 78, issue 5, 2018. [bib][paper][software]
P. Przybyła, “A pattern recognition method for lattice distortion measurement from HRTEM images,” Journal of Microscopy, vol. 245, no. 2, pp. 200–209, 2011. [bib][paper]

Past projects

HOMADOS

MOIRA

OpenMinTeD

Big Mechanism

Mining4EBPH

SLiM