Can we trust what we read online?
Can AI help spot fake news and explain why?
These questions inspired the creation of Veritas, a full-stack ML-powered platform with risk assessment, pattern analysis, and explainable AI for news verification using text and metadata.
- Overview
- Key Features
- Live Frontend & API Access
- How It Works
- Technical Stack
- Use Cases
- Model Evaluation Results
- Exploratory Data Analysis (EDA) Summary
- Important Disclaimer
- Repo Structure
- Getting Started
- Docker Deployment
- Contributing
- License
Veritas isn’t just a classifier; it’s a decision-support tool. By showing why a news item is flagged, it helps users think critically rather than simply react, and it reveals how language can persuade, mislead, or inform.
- ML-Powered Detection: Advanced classification using trained models to identify fake vs. real news
- SHAP Explainability: Transparent decision-making with feature importance rankings
- Pattern Recognition: Analyzes writing style, linguistic patterns, and content structure
- Lorem Ipsum Detection: Catches placeholder text before expensive ML processing
- Character Repetition Filtering: Identifies suspicious repetitive patterns
- Special Character Validation: Monitors unusual character usage ratios
- Early Issue Detection: Prevents obvious problems from reaching the ML model (a minimal sketch of these checks follows the feature list below)
- Severity Levels: High, Medium, Low risk categorization
- Confidence Scoring: Numerical confidence ratings (0-100%)
- Reliability Metrics: Comprehensive credibility scoring system
- Risk Indicators: Identifies specific patterns that raise concerns
- SHAP Summary Plots: Visual feature impact analysis
- Text Pattern Charts: Graphical representation of writing patterns
- Decision Factors: Clear visualization of prediction reasoning
- Feature Analysis: Detailed breakdown of model decision factors
- Text Statistics: Word count, sentence analysis, readability scores
- Content Metrics: ALL CAPS usage, punctuation patterns, URL detection
- Export Capabilities: Save analysis results for further review
- History Tracking: Monitor analysis patterns over time
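To make the defensive checks above concrete, here is a minimal, self-contained sketch of what such pre-ML filters can look like. The function name, thresholds, and patterns are illustrative assumptions, not the actual implementation in utils/preprocessing.py.

```python
import re

# Illustrative pre-ML defensive checks; thresholds are assumptions,
# not the values Veritas actually uses.
def defensive_checks(text: str) -> list[str]:
    issues = []
    # Lorem Ipsum detection: catch placeholder text before ML processing
    if "lorem ipsum" in text.lower():
        issues.append("placeholder text detected")
    # Character repetition: any single character repeated 10+ times in a row
    if re.search(r"(.)\1{9,}", text):
        issues.append("suspicious character repetition")
    # Special-character ratio: flag if over 30% of characters are neither
    # alphanumeric nor whitespace
    specials = sum(1 for c in text if not (c.isalnum() or c.isspace()))
    if text and specials / len(text) > 0.30:
        issues.append("unusual special-character ratio")
    return issues

print(defensive_checks("Lorem ipsum dolor sit amet!!!!!!!!!!"))
# ['placeholder text detected', 'suspicious character repetition']
```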
Access the interactive web application here: https://veritas-news-credibility-analyzer.streamlit.app
Paste or upload a news article and receive:
- Credibility prediction (Real or Fake)
- Risk level and confidence score
- SHAP-based explanation and key features
Access the API at: https://veritas-news-credibility-analyzer.onrender.com
| Method | Endpoint | Description |
|---|---|---|
| POST | /predict | Returns prediction, confidence, and SHAP values |
| GET | /health | Check whether the API service is running |
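As an example, both endpoints can be exercised from Python with requests. The exact request schema lives in api/schema/ (and in the interactive docs at /docs); the `text` payload field below is an assumption.

```python
import requests

BASE_URL = "https://veritas-news-credibility-analyzer.onrender.com"

# Check that the service is up
print(requests.get(f"{BASE_URL}/health", timeout=30).json())

# Request a prediction; the payload field name is an assumption --
# see /docs or api/schema/ for the exact schema.
response = requests.post(
    f"{BASE_URL}/predict",
    json={"text": "BREAKING: Scientists stunned by this one weird trick!"},
    timeout=30,
)
response.raise_for_status()
print(response.json())  # prediction, confidence, SHAP values
```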
- Input Processing: Text is analyzed for basic patterns and defensive checks
- Feature Extraction: Advanced linguistic and structural features are computed
- ML Prediction: Trained model classifies content as real or fake news
- SHAP Analysis: Explains which features influenced the prediction
- Risk Assessment: Evaluates overall credibility and assigns risk levels
- Visualization: Presents results through interactive charts and summaries
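A compact, self-contained sketch of the feature extraction, ML prediction, and SHAP analysis steps above is shown below. The toy corpus, TF-IDF features, and logistic regression are stand-ins for the actual pipeline stored in model/model_pipeline.pkl; only the shape of the flow is meant to match.

```python
import numpy as np
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in corpus (1 = fake, 0 = real); not the project's data
texts = [
    "SHOCKING!!! Celebrity cure they don't want you to know about",
    "The central bank raised interest rates by 0.25 percentage points",
    "You WON'T BELIEVE what happened next, share before it's deleted",
    "The ministry published its quarterly employment figures on Tuesday",
]
labels = np.array([1, 0, 1, 0])

# Feature extraction + ML prediction
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()
model = LogisticRegression().fit(X, labels)

# SHAP analysis: which features pushed the prediction toward fake or real
explainer = shap.LinearExplainer(model, X)
shap_values = explainer.shap_values(X)

# Top influential tokens for the first article
names = vectorizer.get_feature_names_out()
for i in np.argsort(np.abs(shap_values[0]))[::-1][:5]:
    print(f"{names[i]:>12}  shap={shap_values[0][i]:+.3f}")
```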
- Frontend: Streamlit for interactive web interface
- Backend: FastAPI for the REST API and interactive docs
- ML Framework: Scikit-learn for model training and prediction
- Explainability: SHAP for transparent AI decision explanations
- Visualization: Matplotlib, Seaborn, and Plotly for static and interactive charts
- Text Processing: NLTK for linguistic analysis
- Journalists: Verify source credibility and fact-check articles
- Researchers: Study misinformation patterns and linguistic indicators
- Educators: Teach media literacy and critical thinking skills
- General Users: Evaluate news authenticity before sharing
To assess model performance, we evaluated it on a held-out test set using several classification metrics.
- Confusion Matrix: Shows the distribution of true vs. predicted classes
- Precision-Recall Curve: Highlights model performance on imbalanced data
These plots help assess the model’s ability to distinguish fake from real news, especially under class imbalance. For a fair evaluation, we prioritized metrics such as F1-score and AUC-PR over accuracy alone.
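A minimal sketch of computing these metrics with scikit-learn follows; the arrays are toy values for illustration, not the project's reported results.

```python
from sklearn.metrics import average_precision_score, confusion_matrix, f1_score

# Toy values for illustration -- not Veritas's actual results
y_true  = [0, 0, 1, 1, 1, 0]               # ground truth (1 = fake)
y_pred  = [0, 1, 1, 1, 0, 0]               # hard predictions
y_score = [0.2, 0.6, 0.9, 0.8, 0.4, 0.1]   # predicted P(fake)

print("F1-score:", f1_score(y_true, y_pred))
print("AUC-PR:  ", average_precision_score(y_true, y_score))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```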
A comprehensive EDA was conducted on the misinformation dataset to uncover patterns and insights critical to model development. The key findings include:
- Class Distribution: Dataset imbalance with more real news than fake, requiring stratified sampling and careful metric selection.
- Subject Distribution: Perfect correlation with target label presents data leakage risk; excluded from model training.
- Title Length: Fake news titles are longer and more variable, often sensational or verbose.
- Text Length: Fake news articles tend to be longer and highly variable; real news is more concise.
- Punctuation Usage: Exclamation and question marks occur more frequently in fake news.
- Uppercase Words: Fake news contains more uppercase words for emphasis or sensationalism.
- Temporal Patterns: Fake news spikes on weekends; real news is more evenly distributed.
- Word Clouds: Real news uses institutional and factual language; fake news uses emotional and subjective terms.
- Top Unigrams & Bigrams: Distinct vocabulary reflecting formal reporting in real news and sensationalism in fake news.
These insights informed feature engineering and helped improve model interpretability and fairness.
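For instance, several of these signals take only a few lines of pandas to compute; the column names below are assumptions about the dataset's layout.

```python
import pandas as pd

# Toy rows; the real data lives in data/raw/ (True.csv, Fake.csv)
df = pd.DataFrame({
    "title": ["You WON'T believe this!!!", "Parliament passes budget bill"],
    "text":  ["SHARE NOW! They are HIDING the truth!!!",
              "The bill passed with 310 votes in favour on Thursday."],
    "label": [1, 0],  # 1 = fake, 0 = real
})

# Signals surfaced by the EDA: title length, punctuation, uppercase words
df["title_len"]   = df["title"].str.len()
df["exclaims"]    = df["text"].str.count("!")
df["upper_words"] = df["text"].str.split().map(
    lambda words: sum(w.isupper() and len(w) > 1 for w in words)
)

print(df.groupby("label")[["title_len", "exclaims", "upper_words"]].mean())
```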
For the full detailed report, see: EDA_Report.md
Veritas is designed as a supplementary tool for news analysis. It provides guidance based on writing patterns and linguistic features, not absolute truth determination. Users should:
- Always verify information through multiple reliable sources
- Consider context and domain expertise
- Use critical thinking alongside automated analysis
- Understand that no AI system is infallible
```text
misinfo-detector/
├── .gitignore             # Git ignore file
├── app.py                 # Main Streamlit application
├── Dockerfile             # Docker configuration
├── .dockerignore          # Docker ignore file
├── .env_example           # Example environment variables
├── LICENSE                # MIT License
├── output.png             # Output visualization
├── README.md              # Project documentation
├── requirements.txt       # Python dependencies
├── api/                   # API-related files
│   ├── dependencies.py    # API dependencies
│   ├── main.py            # API main entry point
│   └── schema/            # API schema definitions
├── data/                  # Data directory
│   ├── processed/         # Processed datasets
│   └── raw/               # Raw datasets (True.csv, Fake.csv)
├── model/                 # Model artifacts
│   ├── __init__.py        # Package initialization
│   ├── best_params.pkl    # Best hyperparameters
│   └── model_pipeline.pkl # Model pipeline
├── notebooks/             # Jupyter notebooks
│   ├── eda.ipynb          # Exploratory Data Analysis
│   └── modeling.ipynb     # Model training and evaluation
├── reports/               # Analysis reports and figures
│   ├── figures/           # EDA visualizations
│   ├── evaluatin_metrics/ # Evaluation metric figures
│   └── eda_report.md      # EDA report
└── utils/                 # Utility functions
    ├── model_utils.py     # Model handling utilities
    ├── predict_output.py  # Prediction utilities
    ├── nltk_config.py     # NLTK resource configuration
    ├── nltk_setup.py      # NLTK resource download setup
    └── preprocessing.py   # Text preprocessing utilities
```
To run the Veritas News Credibility Analyzer locally, follow these steps:
- Clone the repository:

```bash
# Clone the repository
git clone https://github.com/kushalregmi61/veritas-news-analyzer.git

# Navigate into the project directory
cd veritas-news-analyzer
```

- Set up a virtual environment:

```bash
# Create and activate a virtual environment (using venv)
# mac/linux
python3 -m venv .venv
source .venv/bin/activate

# windows
python -m venv .venv
.venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
# Copy .env_example to .env
# mac/linux
cp .env_example .env
# windows
copy .env_example .env

# Then edit the .env file to set API_URL if needed.
# Example URL for the local FastAPI model inference service:
API_URL=http://127.0.0.1:8000/predict
```

- Run the application locally:

```bash
# Start the FastAPI backend (in a separate terminal)
uvicorn api.main:app --host 127.0.0.1 --port 8000

# Run the Streamlit app (in another terminal)
streamlit run app.py
```

The app is containerized for reproducible deployment. Run the FastAPI backend in Docker by either pulling the published image or building locally.
- Pull the published image:

```bash
docker pull kushalregmi61/veritas-news-analyzer:latest
```

- Run the container (maps container port 8000 to host port 8000):

```bash
docker run -p 8000:8000 kushalregmi61/veritas-news-analyzer:latest
```

- Or build the image from the project root:

```bash
docker build -t veritas-news-analyzer .
```

- Run the container:

```bash
docker run -p 8000:8000 veritas-news-analyzer
```

- FastAPI API docs: http://localhost:8000/docs
- If you run the Streamlit frontend locally (in another terminal), start it with:

```bash
streamlit run app.py
```

- Ensure the Streamlit app’s API_URL (in .env or config) points to the running backend, e.g.:

```bash
API_URL=http://localhost:8000/predict
```
Contributions are welcome! Please refer to the contributing guidelines for:
- Bug reports and feature requests
- Code improvements and optimizations
- Documentation enhancements
- Model performance improvements
This project is licensed under the MIT License - see the LICENSE file for details.

