GenAI For Good
About the project
The proliferation of fake news and misinformation, often amplified by large language models (LLMs), poses a significant threat to societal trust and stability. This paper introduces a hybrid veracity detection and scoring framework that leverages both generative AI and traditional machine learning to detect, rank, and mitigate misinformation and disinformation across diverse media formats. Our approach decomposes content into structured analytical components, using an ensemble of factuality factors such as frequency heuristics, malicious account indicators, and psychological manipulation cues to identify and assess deceptive patterns. By employing advanced techniques such as Retrieval-Augmented Generation (RAG), fractal chain-of-thought prompting, and function calling, our system dynamically refines predictions, enhancing transparency and reducing hallucinations. This hybridized LLM-based veracity machine not only facilitates precise misinformation detection but also provides a scalable and interpretable solution for managing the complexities of content veracity in an evolving digital landscape.
Codebase Report 📑 Poster 📈 Demo ▶️
Introduction
In today’s digital era, the rapid spread of misinformation and disinformation poses a significant societal challenge. Amplified by advanced technologies such as large language models and other artificial intelligence tools, these phenomena undermine mutual trust and can have serious consequences for democratic processes and public safety. Individuals and entities can now easily create and disseminate unchecked information, reaching vast audiences at an alarming rate. This ease of spreading falsehoods not only threatens social harmony but also creates an urgent need for effective detection, evaluation, and mitigation strategies. This project explores the growing impact of digital misinformation and disinformation, highlighting how emerging technologies facilitate their spread. It also proposes new solutions to enhance the resilience of information ecosystems against the onslaught of digital falsehoods.
Why is our project unique?
Our veracity engine uses a suite of the latest tools and techniques to power its analysis, including:
- Factuality Factors: a detailed breakdown of the various directions misinformation can take, from overt political tendencies to covert agendas intended to misguide readers toward a specific argument.
- Fractal Chain of Thought (FCoT): an iterative Chain of Thought (CoT) prompting method that lets an LLM score veracity against a rigorously defined objective function, self-reflecting and updating its score at each workflow step to reach a more conclusive result.
- Mesop UI Package: a framework launched by Google in 2024 for easy chatbot/LLM application setup in Python.
- Gemini 2.0 Flash: the most advanced LLM on multiple benchmarks at the time of the showcase.
- Hybrid between Predictive and Generative AI: predictive classification with our trained metrics, based on the LiarPlus dataset, provides a concrete, empirical reference for the generative model to build upon, reducing hallucination (see the sketch after this list).
- Retrieval-Augmented Generation (RAG) + Google Search + Function Calling: a suite of proven methods that enhance reasoning and fact-checking capability for specific domains.
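As a rough illustration of the hybrid flow, the sketch below feeds the predictive classifier's label into the generative prompt as an empirical anchor. It assumes the google-generativeai SDK; `rf_model` and `vectorize` are hypothetical stand-ins for the trained Random Forest and its feature pipeline, not the project's exact code.

```python
# Minimal sketch of the hybrid predictive + generative flow (assumptions noted above).
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")
llm = genai.GenerativeModel("gemini-2.0-flash")

def hybrid_veracity(article: str, rf_model, vectorize) -> str:
    # Predictive pass: six-way label from the LiarPlus-trained classifier.
    predicted_label = rf_model.predict(vectorize([article]))[0]

    # Generative pass: the LLM reasons over the article with the predictive
    # label supplied as a concrete, empirical reference point.
    prompt = (
        "Assess the veracity of the following news article.\n"
        f"A classifier trained on LiarPlus predicts the label: {predicted_label}.\n"
        "Weigh this prediction against your own analysis and return one label "
        "from [true, mostly-true, half-true, barely-true, false, pants-fire].\n\n"
        f"Article:\n{article}"
    )
    return llm.generate_content(prompt).text
```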
Roadmap

Datasets
Liar PLUS: Integration with Predictive AI
- Feature Extraction: The predictive AI can analyze the text data from LIAR-PLUS to extract features that are relevant to determining the truthfulness of statements. This could include linguistic features (like the complexity of language used), semantic features (like the presence of certain key phrases or concepts), and metadata (like the speaker’s profile).
- Contextual Analysis: The detailed justifications provided in LIAR-PLUS allow the predictive AI to learn not just whether a statement is true or false, but why it was categorized as such. This deepens the model’s understanding, enabling it to better handle nuanced or borderline cases.
- Training Data: LIAR-PLUS serves as training data for the predictive AI (random forest model). The rich, annotated dataset helps in building robust models that are trained on both the statement and its contextual backing, improving the accuracy of predictions.
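A minimal sketch of loading and featurizing LIAR-PLUS along these lines is shown below. The file name and column layout follow the common LIAR-PLUS TSV release but should be verified against the actual data, and the feature choices are illustrative rather than the project's exact pipeline.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Column layout of the common LIAR-PLUS TSV release (assumed; verify against the file).
COLUMNS = [
    "id", "label", "statement", "subject", "speaker", "job", "state", "party",
    "barely_true_ct", "false_ct", "half_true_ct", "mostly_true_ct",
    "pants_fire_ct", "context", "justification",
]

train = pd.read_csv("train2.tsv", sep="\t", names=COLUMNS)

# Linguistic/semantic features: TF-IDF over the statement plus its annotated
# justification, so the model also sees *why* a claim earned its label.
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_text = vectorizer.fit_transform(
    train["statement"].fillna("") + " " + train["justification"].fillna("")
)

# Metadata features: simple one-hot encoding of speaker attributes.
X_meta = pd.get_dummies(train[["party", "state"]].fillna("unknown"))
y = train["label"]
```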
Politifact & Snopes: Integration with Generative AI
- Use of External Datasets: We enhance the performance and reliability of our veracity machine by leveraging external datasets extracted through web scraping techniques. We obtain data from platforms like PolitiFact and Snopes.com, which provide extensive information on the latest news stories and their truthfulness, as assessed by expert fact-checkers.
- Integration of Ground Truth Labels: These datasets offer crucial ground truth labels, including “True,” “Half-True,” and “False,” along with detailed explanations that elucidate the rationale behind these assessments. By integrating these expert-verified annotations, we ensure that our model is trained and evaluated against high-quality, up-to-date information.
- Expansion of Data Extraction: We are currently expanding our approach to include not only the truthfulness labels but also the accompanying explanations for each verdict. These explanations provide essential context for understanding why content was classified in a particular way, delivering valuable insights into the underlying reasoning.
- Incorporation into System Prompts: Our plan is to incorporate these explanations into the system’s prompts, allowing the AI to generate more informed and contextually relevant outputs. This enhancement will enable the veracity machine to provide users not only with accurate classifications but also the reasoning behind these classifications.
- Enhancing Transparency and Trust: By iteratively refining this approach, we aim to foster greater transparency and user trust. Our goal is to create a system capable of addressing the complexities of misinformation with both accuracy and depth.
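For illustration, a scraping pass over PolitiFact could look like the sketch below. The URL pattern and CSS selectors are assumptions about the page structure and must be checked against the live site; Snopes would need its own selectors.

```python
import requests
from bs4 import BeautifulSoup

def scrape_politifact(page: int = 1) -> list[dict]:
    """Collect claim/verdict pairs from one PolitiFact listing page (selectors assumed)."""
    url = f"https://www.politifact.com/factchecks/?page={page}"
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    items = []
    for card in soup.select("li.o-listicle__item"):            # assumed selector
        claim = card.select_one(".m-statement__quote")          # assumed selector
        verdict = card.select_one(".m-statement__meter img")    # assumed selector
        if claim and verdict:
            items.append({
                "claim": claim.get_text(strip=True),
                "verdict": verdict.get("alt", ""),               # e.g. "true", "half-true"
            })
    return items
```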
Methodology
Predictive AI
The system combines traditional predictive models for statistical rigor with generative AI for nuanced content analysis, anchoring the analysis in structured factuality scoring.
- Dataset: LiarPlus, a fact-checking and fake-news detection dataset
- Factuality features: Location, Education, Event coverage, Echo chamber, News coverage, Malicious account
- Trained model: Random forest classifier
- Output labels: true, mostly-true, half-true, barely-true, false, pants-fire
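Continuing the feature-extraction sketch above, a minimal Random Forest training pass might look like the following; `X_text`, `X_meta`, and `y` are the assumed variables from that sketch, and the hyperparameters are illustrative.

```python
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Combine the sparse text features with the dense metadata features.
X = hstack([X_text, X_meta.values])
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging ensemble over the combined features; hyperparameters are illustrative.
rf_model = RandomForestClassifier(n_estimators=300, random_state=42)
rf_model.fit(X_train, y_train)

# Accuracy across the six labels: true, mostly-true, half-true, barely-true, false, pants-fire.
print(rf_model.score(X_val, y_val))
```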
Generative AI
- Factuality Factors: Content veracity is assessed through multi-dimensional factors, enabling precise and transparent decomposition of misinformation, including:
- Frequency Heuristics (repetition patterns, origin tracing).
- Malicious Accounts (bot-like behavior, false content analysis).
- Psychological Manipulation (emotional cues, echo chamber effects).
- Retrieval-Augmented Generation (RAG)
- Fuses semantic retrieval and LLM generation.
- Ensures grounding of outputs with verified facts and metadata.
- Dynamically adjusts based on source credibility and temporal relevance.
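A minimal RAG sketch using ChromaDB (the vector store mentioned under Ethical Consideration) is shown below: LiarPlus statements are indexed with their labels as metadata, and the top matches ground the generative prompt, roughly what a retrieval helper such as the `get_top_100_statements` referenced in the prompts might do. The collection name and metadata fields are assumptions, and `train` refers to the DataFrame from the earlier loading sketch.

```python
import chromadb

# Persistent vector store; collection and metadata fields are illustrative.
client = chromadb.PersistentClient(path="./veracity_db")
collection = client.get_or_create_collection("liarplus_statements")

# Index fact-checked statements once, keeping the label and speaker as
# metadata so retrieval can later be filtered or reranked by credibility.
collection.add(
    ids=[str(i) for i in train.index],
    documents=train["statement"].fillna("").tolist(),
    metadatas=[
        {"label": label, "speaker": speaker}
        for label, speaker in zip(train["label"], train["speaker"].fillna("unknown"))
    ],
)

def retrieve_evidence(article: str, k: int = 100) -> list[str]:
    """Return the top-k most similar fact-checked statements with their labels."""
    hits = collection.query(query_texts=[article], n_results=k)
    return [
        f"{doc} (label: {meta['label']})"
        for doc, meta in zip(hits["documents"][0], hits["metadatas"][0])
    ]
```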
- Fractal Chain of Thought (FCoT) Prompting: Advances traditional chain-of-thought prompting with iterative, layered analysis:
- Evaluates factuality factors in multiple iterations.
- Incorporates feedback loops for refined insights and improved veracity scoring.
- Comparison to traditional prompting:
- Traditional: one-off evaluations, limited depth.
- FCoT: recursive, multi-factor, transparent reasoning.
- Uses factuality factors as objective functions and updates the score at each iteration via function calling.
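A simplified sketch of the FCoT loop under these assumptions: each iteration re-scores the article against the factuality factors and reflects on what the previous pass missed. `llm` is the Gemini handle from the hybrid sketch above; the prompt wording and iteration count are illustrative.

```python
# Factuality factors used as the objective function for each pass.
FACTORS = ["Frequency Heuristic", "Malicious Account", "Psychological Manipulation"]

def fcot_score(article: str, iterations: int = 3) -> str:
    """Iteratively re-score the article, feeding each pass the previous one."""
    previous = "None yet."
    for i in range(1, iterations + 1):
        prompt = (
            f"Iteration {i} of {iterations}.\n"
            f"Score this article on each factor in {FACTORS} (0-100), state what "
            "the previous iteration missed, then give an updated overall veracity "
            "score and one label from "
            "[true, mostly-true, half-true, barely-true, false, pants-fire].\n"
            f"Previous iteration:\n{previous}\n\n"
            f"Article:\n{article}"
        )
        previous = llm.generate_content(prompt).text
    return previous  # the final pass carries the refined score and label
```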
- SerpAPI Web Search
By embedding these real-time search results into the prompt, the GenAI gains access to a broader and more dynamic set of data, enabling it to cross-reference claims made in the inputted news article with credible external sources.
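A sketch of this search step, assuming the serpapi (google-search-results) Python package; the fields read from `organic_results` follow SerpAPI's documented response format.

```python
from serpapi import GoogleSearch

def search_snippets(claim: str, api_key: str, k: int = 5) -> list[str]:
    """Fetch top web results for a claim and format them for the prompt."""
    results = GoogleSearch({"q": claim, "api_key": api_key, "num": k}).get_dict()
    return [
        f"{r.get('title', '')}: {r.get('snippet', '')} ({r.get('link', '')})"
        for r in results.get("organic_results", [])[:k]
    ]

# The formatted snippets are appended to the FCoT prompt so the model can
# cross-reference the article's claims against current external sources.
```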
- Function Calling
Function calls are strategically used to dynamically adjust analysis parameters based on real-time feedback. This adaptability is essential for calculating the effectiveness of various thought patterns generated by our algorithm, ensuring that the most logical and factually consistent chains are prioritized.
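As a hedged sketch of how such a call could be wired up with the google-generativeai SDK's automatic function calling: the model is given a scoring helper it can invoke after each pass; the helper name and blending weights are purely illustrative, not the project's tuning.

```python
import google.generativeai as genai

def update_veracity_score(factor: str, factor_score: float, current_score: float) -> float:
    """Blend one factuality factor's score into the running veracity score.

    The 0.7/0.3 weighting is purely illustrative.
    """
    return round(0.7 * current_score + 0.3 * factor_score, 2)

# Expose the scorer as a tool; the model can call it after each FCoT pass.
model = genai.GenerativeModel("gemini-2.0-flash", tools=[update_veracity_score])
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message(
    "After the Frequency Heuristic pass, update the running veracity score accordingly."
)
print(response.text)
```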
Results
Predictive / Generative / Hybrid
Predictive Performance on Liar PLUS Dataset:
| Model Description | Score (%) |
| --- | --- |
| BERT Embedding Model | 43.7 |
| XGBoost/LightGBM (boosting algorithm) | 33.1 |
| Random Forest Classifier (bagging algorithm) | 67.8 |
| Sentiment Analysis (TF-IDF) | 45.9 |
| Word2Vec | 55.2 |
Table 1. Predictive Performance on Liar PLUS dataset
Overall Model Performance:
Model Description | Score (%) |
---|---|
Baseline (Feeding straight into Gemini Flash 2.0) | 19 |
Hybrid (Random Forest + Gemini) | 34.3 |
Hybrid + RAG | 40 |
Hybrid + RAG + Web Search | 56.9 |
Hybrid + RAG + Web Search + FCOT Prompting | 67.2 |
Hybrid (50/50) + RAG + Web Search + FCOT Prompting + Function Calling | 65.3 |
Hybrid (70/30) + RAG + Web Search + FCOT Prompting + Function Calling | 85.1 |
Table 2. Overall Model Performance
Precision & Recall Result:

Table 3. Predictive vs. generative vs. hybrid accuracy
Prompting Comparison Result: Prompting Comparison Link
Prompts Constructed
Normal Prompting
Use 3 iterations to check the veracity score of this news article. Factors to consider are Frequency Heuristic and Misleading Intentions. In each, determine what you missed in the previous iteration. Also put the result from RAG into consideration/rerank. RAG: Here, out of six potential labels (true, mostly-true, half-true, barely-true, false, pants-fire), this is the truthfulness label predicted using a classifier model: {predict_score}. These are the top 100 related statement in LiarPLUS dataset that related to this news article: {get_top_100_statements(input_text)}. Provide a percentage score and explanation for each iteration and its microfactors. Final Evaluation: Return an exact numeric veracity score for the text, and provide a matching label out of these six [true, mostly-true, half-true, barely-true, false, pants-fire]
Chain of Thought: Chain of Thought Full Prompt Link
Fractal Chain of Thought: Fractal Chain of Thought Full Prompt Link
Further Discussion
- Chunking the input news into smaller paragraphs. Chunking lets the GenAI examine each paragraph in detail and produce a more accurate interpretation of the results (a minimal chunking sketch follows this list).
- Adding a LangChain agent and more factuality factors to our model. A LangChain agent can help structure the current model to learn from and adapt to incoming news, taking the time to analyze it fully against our factuality factors. It focuses on the data-centric aspects, such as embeddings, chain logic, and integration with language models, to transform raw data into contextualized, conversational experiences; its flexible architecture also fits into the end-to-end workflow, from defining user requirements to deploying robust AI solutions, across diverse product strategies.
- Adding more factuality factors can help the model analyze news from multiple perspectives and produce more accurate scores to counter misinformation.
- Expanding our dataset and refining our algorithms to better handle the dynamic and evolving nature of online information. Future work will focus on automating the integration of real-time data feeds and enhancing the system’s adaptability to new and emerging types of misinformation.
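As a minimal illustration of the chunking idea from the first point above, an article could be split on blank lines into paragraph-sized chunks, each of which is then scored separately; the length cap and splitting rule are assumptions.

```python
def chunk_article(article: str, max_chars: int = 1500) -> list[str]:
    """Split an article into paragraph-sized chunks for per-chunk scoring."""
    chunks = []
    for para in article.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Break overly long paragraphs at the last sentence boundary that fits.
        while len(para) > max_chars:
            cut = para.rfind(". ", 0, max_chars) + 1 or max_chars
            chunks.append(para[:cut].strip())
            para = para[cut:].strip()
        if para:
            chunks.append(para)
    return chunks
```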
Ethical Consideration
- Bias and fairness in AI decision-making: AI models often inherit biases from their training data, which can lead to skewed misinformation classifications, particularly for politically or ideologically sensitive content. Addressed by ensuring the collected data are public and reliable, and by using the RAG process and metadata in ChromaDB for further classification.
- User awareness: Users should always be aware of how their data is being used and have the choice to opt out when possible. To protect privacy, these systems must follow strict security measures such as encryption and anonymization, ensuring that personal information is not misused. Addressed by having each user supply their own personal Google Gemini API key.
- AI transparency: The process by which the AI detects misinformation must be transparent. Users should understand why content is flagged as false or misleading and be given clear reasons for those conclusions, allowing them to see and verify the logic behind content classification. Addressed by sharing the full prompt engineering and the use of FCoT.