
Veritas: News Credibility Analyzer

Can we trust what we read online?
Can AI help spot fake news and explain why?

These questions inspired the creation of Veritas, a full-stack, ML-powered platform that combines risk assessment, pattern analysis, and explainable AI to verify news using text and metadata.


Overview

Veritas isn’t just a classifier; it’s a decision-support tool that helps users think critically rather than simply react, by showing why a news item is flagged. It reveals how language can persuade, mislead, or inform.


Key Features

Intelligent Analysis

  • ML-Powered Detection: Advanced classification using trained models to identify fake vs. real news
  • SHAP Explainability: Transparent decision-making with feature importance rankings
  • Pattern Recognition: Analyzes writing style, linguistic patterns, and content structure

Defensive Programming

  • Lorem Ipsum Detection: Catches placeholder text before expensive ML processing
  • Character Repetition Filtering: Identifies suspicious repetitive patterns
  • Special Character Validation: Monitors unusual character usage ratios
  • Early Issue Detection: Prevents obvious problems from reaching the ML model
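
A minimal sketch of what these pre-checks could look like in Python (the thresholds and the helper name defensive_checks are illustrative assumptions, not the project's actual implementation):

import re

def defensive_checks(text: str) -> list:
    """Cheap sanity checks run before the ML model; thresholds are illustrative."""
    issues = []
    # Placeholder text such as "lorem ipsum"
    if re.search(r"\blorem\s+ipsum\b", text, flags=re.IGNORECASE):
        issues.append("placeholder_text")
    # Suspicious repetition of a single character, e.g. "aaaaaaa" or "!!!!!!!"
    if re.search(r"(.)\1{6,}", text):
        issues.append("character_repetition")
    # Unusually high ratio of special (non-alphanumeric, non-space) characters
    specials = sum(1 for c in text if not c.isalnum() and not c.isspace())
    if text and specials / len(text) > 0.3:
        issues.append("special_character_ratio")
    return issues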

Risk Assessment

  • Severity Levels: High, Medium, Low risk categorization
  • Confidence Scoring: Numerical confidence ratings (0-100%)
  • Reliability Metrics: Comprehensive credibility scoring system
  • Risk Indicators: Identifies specific patterns that raise concerns
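
For illustration, the classifier's probability output could be mapped to these severity levels and a 0-100% confidence score roughly as follows (the cut-offs are assumed, not taken from the project):

def risk_level(fake_probability: float):
    """Map the predicted probability of 'fake' to a severity level and a confidence score."""
    confidence = round(max(fake_probability, 1 - fake_probability) * 100, 1)
    if fake_probability >= 0.75:
        return "High", confidence
    if fake_probability >= 0.5:
        return "Medium", confidence
    return "Low", confidence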

Interactive Visualizations

  • SHAP Summary Plots: Visual feature impact analysis
  • Text Pattern Charts: Graphical representation of writing patterns
  • Decision Factors: Clear visualization of prediction reasoning
  • Feature Analysis: Detailed breakdown of model decision factors

Professional Features

  • Text Statistics: Word count, sentence analysis, readability scores
  • Content Metrics: ALL CAPS usage, punctuation patterns, URL detection
  • Export Capabilities: Save analysis results for further review
  • History Tracking: Monitor analysis patterns over time
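
A rough sketch of how these content metrics could be computed with NLTK and the standard library (the exact metrics in the app may differ; sent_tokenize needs the NLTK "punkt" data package):

import re
from nltk.tokenize import sent_tokenize

def text_statistics(text: str) -> dict:
    words = text.split()
    return {
        "word_count": len(words),
        "sentence_count": len(sent_tokenize(text)),
        "all_caps_ratio": sum(w.isupper() and len(w) > 1 for w in words) / max(len(words), 1),
        "exclamation_marks": text.count("!"),
        "contains_url": bool(re.search(r"https?://\S+", text)),
    }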

Live Frontend & API Access

Streamlit Frontend

Access the interactive web application here: https://veritas-news-credibility-analyzer.streamlit.app

Paste or upload a news article and receive:

  • Credibility prediction (Real or Fake)
  • Risk level and confidence score
  • SHAP-based explanation and key features

Demo

App Demo

FastAPI Backend

Access the API at: https://veritas-news-credibility-analyzer.onrender.com

Method   Endpoint    Description
POST     /predict    Returns prediction, confidence, SHAP values
GET      /health     Check if the API service is running
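
For example, the /predict endpoint can be called from Python as sketched below; the request field name "text" and the response layout are assumptions, so check the interactive FastAPI docs at /docs for the authoritative schema.

import requests

API_URL = "https://veritas-news-credibility-analyzer.onrender.com"

response = requests.post(f"{API_URL}/predict",
                         json={"text": "Paste the news article text here..."},
                         timeout=60)
response.raise_for_status()
print(response.json())  # prediction, confidence, SHAP values per the table above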

Demo

App Demo


How It Works

  1. Input Processing: Text is analyzed for basic patterns and defensive checks
  2. Feature Extraction: Advanced linguistic and structural features are computed
  3. ML Prediction: Trained model classifies content as real or fake news
  4. SHAP Analysis: Explains which features influenced the prediction
  5. Risk Assessment: Evaluates overall credibility and assigns risk levels
  6. Visualization: Presents results through interactive charts and summaries
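
As a sketch of steps 2-3, the saved artifact can be loaded and queried directly, assuming model/model_pipeline.pkl is a scikit-learn Pipeline serialized with joblib (the label encoding shown is also an assumption):

import joblib

pipeline = joblib.load("model/model_pipeline.pkl")  # vectorizer + classifier in one Pipeline (assumed)

article = "Paste the news article text here..."
prediction = pipeline.predict([article])[0]           # e.g. 0 = real, 1 = fake (encoding assumed)
probabilities = pipeline.predict_proba([article])[0]  # class probabilities feed the risk assessment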

Technical Stack

  • Frontend: Streamlit for interactive web interface
  • Backend: FastAPI for API and documentation
  • ML Framework: Scikit-learn for model training and prediction
  • Explainability: SHAP for transparent AI decision explanations
  • Visualization: Matplotlib, Seaborn, and Plotly for charts and interactive plots
  • Text Processing: NLTK for linguistic analysis
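
To illustrate the explainability layer, SHAP can explain a linear text classifier roughly like this; the TF-IDF + logistic regression setup and the variable names (train_texts, train_labels, article_text) are assumptions for the sketch, not confirmed details of the deployed model.

import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(train_texts)
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

# Explain one article against a small dense background sample
explainer = shap.LinearExplainer(clf, X_train[:100].toarray())
shap_values = explainer.shap_values(vectorizer.transform([article_text]).toarray())

# Print the ten most influential tokens for this prediction
feature_names = vectorizer.get_feature_names_out()
contrib = shap_values[0]
for i in abs(contrib).argsort()[::-1][:10]:
    print(feature_names[i], round(float(contrib[i]), 4))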

Use Cases

  • Journalists: Verify source credibility and fact-check articles
  • Researchers: Study misinformation patterns and linguistic indicators
  • Educators: Teach media literacy and critical thinking skills
  • General Users: Evaluate news authenticity before sharing

Evaluation Results

To assess model performance, we evaluated it on a held-out test set using several classification metrics.

  • Confusion Matrix: Shows the distribution of true vs. predicted classes

  • Precision-Recall Curve: Highlights model performance on imbalanced data

These plots help assess the model’s ability to distinguish fake from real news, especially under class imbalance. For fair evaluation, we prioritized metrics like F1-score and AUC-PR over accuracy alone.
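
These metrics can be reproduced on the held-out split with scikit-learn; the variable names below (y_test, y_pred, y_scores) are placeholders for the project's own test data and model outputs.

from sklearn.metrics import confusion_matrix, f1_score, average_precision_score, PrecisionRecallDisplay

print(confusion_matrix(y_test, y_pred))                       # true vs. predicted classes
print("F1:", f1_score(y_test, y_pred))
print("AUC-PR:", average_precision_score(y_test, y_scores))   # y_scores = predicted probability of "fake"
PrecisionRecallDisplay.from_predictions(y_test, y_scores)     # precision-recall curve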


Exploratory Data Analysis (EDA) Summary

A comprehensive EDA was conducted on the misinformation dataset to uncover patterns and insights critical to model development. The key findings include:

  1. Class Distribution: Dataset imbalance with more real news than fake, requiring stratified sampling and careful metric selection.
  2. Subject Distribution: Perfect correlation with target label presents data leakage risk; excluded from model training.
  3. Title Length: Fake news titles are longer and more variable, often sensational or verbose.
  4. Text Length: Fake news articles tend to be longer and highly variable; real news is more concise.
  5. Punctuation Usage: Exclamation and question marks occur more frequently in fake news.
  6. Uppercase Words: Fake news contains more uppercase words for emphasis or sensationalism.
  7. Temporal Patterns: Fake news spikes on weekends; real news is more evenly distributed.
  8. Word Clouds: Real news uses institutional and factual language; fake news uses emotional and subjective terms.
  9. Top Unigrams & Bigrams: Distinct vocabulary reflecting formal reporting in real news and sensationalism in fake news.

These insights informed feature engineering and helped improve model interpretability and fairness.
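
Several of these statistics can be recomputed with a few lines of pandas from the raw CSVs; the column names title and text are assumptions about the dataset layout.

import pandas as pd

real = pd.read_csv("data/raw/True.csv").assign(label="real")
fake = pd.read_csv("data/raw/Fake.csv").assign(label="fake")
df = pd.concat([real, fake], ignore_index=True)

df["title_len"] = df["title"].str.len()
df["exclamations"] = df["text"].str.count("!")

print(df["label"].value_counts())                                     # class distribution
print(df.groupby("label")[["title_len", "exclamations"]].describe())  # title length and punctuation by class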

For the full detailed report, see: EDA_Report.md


Important Disclaimer

Veritas is designed as a supplementary tool for news analysis. It provides guidance based on writing patterns and linguistic features, not absolute truth determination. Users should:

  • Always verify information through multiple reliable sources
  • Consider context and domain expertise
  • Use critical thinking alongside automated analysis
  • Understand that no AI system is infallible

Repo Structure

misinfo-detector/
├── .gitignore                     # Git ignore file
├── app.py                         # Main Streamlit application
├── Dockerfile                     # Docker configuration
├── .dockerignore                  # Docker ignore files
├── .env_example                   # Example environment variables
├── LICENSE                        # MIT License
├── output.png                     # Output visualization
├── README.md                      # Project documentation
├── requirements.txt               # Python dependencies
├── api/                           # API-related files
│   ├── dependencies.py            # API dependencies
│   ├── main.py                    # API main entry point
│   └── schema/                    # API schema definitions
├── data/                          # Data directory
│   ├── processed/                 # Processed datasets
│   └── raw/                       # Raw datasets (True.csv, Fake.csv)
├── model/                         # Model artifacts
│   ├── __init__.py                # Package initialization
│   ├── best_params.pkl            # Best hyperparameters
│   └── model_pipeline.pkl         # Model pipeline
├── notebooks/                     # Jupyter notebooks
│   ├── eda.ipynb                  # Exploratory Data Analysis
│   └── modeling.ipynb             # Model training and evaluation
├── reports/                       # Analysis reports and figures
│   ├── figures/                   # EDA visualizations
│   ├── evaluatin_metrics/         # Evaluation metric figures
│   └── eda_report.md              # EDA report
├── utils/                         # Utility functions
│   ├── model_utils.py             # Model handling utilities
│   ├── predict_output.py          # Prediction utilities
│   ├── nltk_config.py             # NLTK resource configuration
│   ├── nltk_setup.py              # NLTK data download setup
│   └── preprocessing.py           # Text preprocessing utilities

Getting Started

To run the Veritas News Credibility Analyzer locally, follow these steps:

  1. Clone the repository:
# Clone the repository
git clone https://github.com/kushalregmi61/veritas-news-analyzer.git

# Navigate into the project directory
cd veritas-news-analyzer
  2. Set up a virtual environment:
# Create a virtual environment (using venv)

# mac/linux
python3 -m venv .venv
source .venv/bin/activate

# windows
python -m venv .venv
.venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set up environment variables:
# Copy the .env_example to .env file

# mac/linux
cp .env_example .env

# windows
copy .env_example .env

# Then, edit the .env file to set the API_URL if needed

# Example URL for local FastAPI Model Inference Service   
API_URL=http://127.0.0.1:8000/predict 
  5. Run the application locally:
# 5.1 Start the FastAPI backend (in a separate terminal)
uvicorn api.main:app --host 127.0.0.1 --port 8000
# 5.2 Run the Streamlit app (in another terminal)
streamlit run app.py
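
Once both services are running, the backend can be sanity-checked from Python (or simply open http://127.0.0.1:8000/docs in a browser); the request field name "text" is an assumption, as noted earlier.

import requests

print(requests.get("http://127.0.0.1:8000/health").json())    # confirms the API service is up
print(requests.post("http://127.0.0.1:8000/predict",
                    json={"text": "Example article text..."},
                    timeout=60).json())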

Docker Deployment

The app is containerized for reproducible deployment. Run the FastAPI backend in Docker by either pulling the published image or building locally.

Option A: Pull and run (quick start)

  1. Pull the published image:
docker pull kushalregmi61/veritas-news-analyzer:latest
  2. Run the container (maps container port 8000 to host port 8000):
docker run -p 8000:8000 kushalregmi61/veritas-news-analyzer:latest

Option B: Build and run locally

  1. Build the image from the project root:
docker build -t veritas-news-analyzer .
  2. Run the container:
docker run -p 8000:8000 veritas-news-analyzer

Accessing the services

  • With the backend running in Docker (or via uvicorn), start the Streamlit frontend locally:
streamlit run app.py
  • Ensure the Streamlit app’s API_URL (in .env or config) points to the running backend, e.g.:
API_URL=http://localhost:8000/predict
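
If app.py reads this value with python-dotenv (an assumption about the frontend's internals and dependencies), the lookup might look like:

import os
from dotenv import load_dotenv  # python-dotenv, assumed to be listed in requirements.txt

load_dotenv()
API_URL = os.getenv("API_URL", "http://localhost:8000/predict")  # falls back to the local backend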

Contributing

Contributions are welcome! Please refer to the contributing guidelines for:

  • Bug reports and feature requests
  • Code improvements and optimizations
  • Documentation enhancements
  • Model performance improvements

License

This project is licensed under the MIT License - see the LICENSE file for details.