PoliPrompt | Menglin Liu

Resources

Explore the code, documentation, and tutorials for the PoliPrompt package.

Python Package

Install PoliPrompt directly from PyPI to integrate LLM-based text classification into your research workflow with automatic prompt optimization and dynamic exemplar selection.

View on PyPI

GitHub Repository

Access the full source code, contribute to development, and explore the implementation of the three-stage in-context learning approach and MMR-based exemplar retrieval.

View on GitHub

Tutorial & Documentation

Follow step-by-step guides for setting up PoliPrompt, running classification tasks, and customizing prompts for your specific political text analysis needs.

Read the Docs

About the Research

PoliPrompt: A High-Performance Cost-Effective LLM-Based Text Classification Framework for Political Science

with Ge Shi

Abstract

Recent advancements in large language models (LLMs) have opened new possibilities for text classification in political science, but their effectiveness and accuracy depend on prompt quality and exemplar selection in few-shot learning. To address these challenges, we introduce a three-stage in-context learning approach that improves classification accuracy through automatic prompt optimization and dynamic exemplar selection. Our method automates the generation of optimized, task-specific prompts by analyzing a pool of diverse exemplars, summarizing labeling rules, and refining prompts based on the LLM's interpretation. We also use the Maximal Marginal Relevance (MMR) algorithm to dynamically select the most relevant exemplars during inference, balancing relevance to the query text with diversity among the selected examples. Additionally, we incorporate a consensus mechanism that refines outputs from two weaker LLMs using a more advanced model to improve accuracy, reliability, and speed while reducing computational costs. We extensively tested our method across diverse classification tasks, including categorizing lengthy BBC news reports by topic, analyzing public opinion on Brett Kavanaugh's Supreme Court confirmation using Twitter data, and assessing the tone of campaign ads from the 2018 election. The experimental results show how our approach effectively overcomes key limitations of using LLMs for text classification. A free Python software package implementing this method is available on GitHub.
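To make the MMR-based exemplar selection described in the abstract more concrete, here is a minimal sketch of the general Maximal Marginal Relevance technique. It is not taken from the PoliPrompt source code: the function names `cosine` and `mmr_select`, the trade-off parameter `lam`, and the toy two-dimensional "embeddings" are all assumptions made for illustration. In practice the vectors would come from a text embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, exemplar_vecs, k, lam=0.5):
    """Return indices of up to k exemplars chosen by Maximal Marginal Relevance.

    lam (hypothetical parameter name) trades off relevance to the query
    (lam -> 1) against diversity among the selected exemplars (lam -> 0).
    """
    relevance = [cosine(query_vec, v) for v in exemplar_vecs]
    selected = []
    candidates = list(range(len(exemplar_vecs)))

    while candidates and len(selected) < k:

        def score(i):
            # Redundancy: highest similarity to any exemplar already chosen.
            redundancy = max(
                (cosine(exemplar_vecs[i], exemplar_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy data (assumed): exemplars 0 and 1 are near-duplicates close to the
# query; exemplar 2 is less relevant but points in a different direction.
query = [1.0, 0.0]
pool = [[1.0, 0.1], [1.0, 0.12], [0.7, 0.7]]

relevance_only = mmr_select(query, pool, k=2, lam=1.0)
diversity_weighted = mmr_select(query, pool, k=2, lam=0.3)
print(relevance_only, diversity_weighted)
```

With `lam=1.0` the selection reduces to plain nearest-neighbor retrieval and picks the two near-duplicates; lowering `lam` swaps the second near-duplicate for the more distinct exemplar. The selected exemplars would then be inserted into the few-shot prompt for the query text.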