CV | Menglin Liu

Professional Summary

Applied research engineer and computational social scientist with Ph.D.-level training and 5+ years of experience building end-to-end NLP and survey analytics systems. Specialized in LLM-based text classification, interpretable modeling, and decision-oriented analytics for non-technical stakeholders. Strong track record translating ambiguous research and policy questions into scalable, production-ready pipelines and user-facing analytical tools.

Experience

Assistant Professor of Computational Social Science; Assistant Professor of AI (by courtesy) 2025 – Present

CUHK-Shenzhen

Led applied research and system development at the intersection of NLP, survey analytics, and social data.
Designed and deployed LLM-based pipelines for text and survey analysis used in active research and product prototyping.
Supervised applied research translating real-world questions into scalable analytical systems.

Founder & Lead Research Scientist 2025 – Present

SurveyFluency

Founded and led development of an AI-driven survey intelligence platform for rapid, interpretable analysis of complex survey data.
Designed system architecture supporting SPSS and longitudinal survey ingestion, demographic crosstabs, toplines, and visual analytics.
Built natural-language interfaces for structured survey exploration, enabling users to identify subgroup differences and temporal patterns without code.
Applied the platform to large-scale public datasets (e.g., General Social Survey cumulative file) to support early-stage exploratory analysis.
Focused on interpretability, transparency, and researcher-facing workflows for applied social science and policy analysis.

Data & Policy Analyst Jun – Sep 2023

Committee of 100 (Remote)

Created interactive geospatial dashboards visualizing property restriction laws and policy changes across U.S. states and cities.
Built an end-to-end data pipeline from legislative data ingestion and cleaning to mapping-ready outputs and stakeholder-ready visual deliverables.

Selected Applied Research Projects

SurveyFluency: AI System for Survey Intelligence 2024 – Present

LLMs, NLP, Structured Data Reasoning

Developed LLM-based pipelines for analyzing structured survey data, including topline generation, demographic crosstabs, and pattern discovery.
Implemented natural-language querying over tabular survey data to support hypothesis-free exploration and rapid insight generation.
Evaluated system outputs using real-world datasets (e.g., General Social Survey) to assess robustness across survey waves and subpopulations.
Designed outputs to prioritize human interpretation, transparency, and researcher trust rather than automated decision-making.

PoliPrompt: LLM-Based NLP Classification Toolkit Jun – Sep 2024

Prompt Optimization, Transformers, PyTorch, Open Source

Developed a modular classification pipeline with prompt optimization and exemplar retrieval, achieving substantial accuracy gains with lower inference cost.
Integrated Hugging Face Transformers and PyTorch for scalable training and evaluation across large text corpora.
Released as an open-source PyPI package.

LLM-Based Analysis of Survey Open-Ended Responses May 2025 – Present

Prompting, Topic Modeling, Interpretability

Built pipelines analyzing 50,000+ open-ended responses for effort, emotion, moral content, and ideological stance using multiple prompting strategies.
Applied topic modeling and clustering to extract latent dimensions related to ambivalence, cross-pressure, and cognitive complexity.
Validated outputs against human-coded benchmarks to ensure measurement reliability and interpretability.

Data Processing for Housing Policy Debates Jun 2025 – Present

ASR, Diarization, NLP Pipelines, CI/CD

Automated transcription and speaker diarization for 500+ hours of local government meeting audio; built reproducible pipelines for extraction and analysis.
Developed CI/CD-ready processing workflows for cleaning, feature extraction, and downstream NLP tasks, substantially reducing manual processing time.
Produced structured quantitative measures of policy discourse for non-technical research partners and decision-makers.

Technical Skills

Programming

Python, R, SQL, Stata

NLP / ML Libraries

Hugging Face Transformers, PyTorch, scikit-learn, spaCy, NLTK, Gensim, TensorFlow

Data & Model Management

DVC, Git, Weights & Biases, MLflow

Cloud / Deployment

AWS (EC2/S3), CI/CD workflows

Methods

Causal inference, clustering, regression modeling, survey analysis
