Skip to content

Conversation

@slittyjuice-source
Copy link

Description

Quickstart

  • Computer Use Demo
  • Customer Support Agent
  • Financial Data Analyst
  • N/A

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring
  • Other (please describe):

Testing

  • Added/updated unit tests
  • Tested manually
  • Verified in development environment

Screenshots

Additional Notes

ItsBarryZ and others added 30 commits July 7, 2025 13:52
Implements configurable bash command execution tool with granular permission control.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Implemented Extended Thinking tool with 4x/8x/16x/32x layer architectures
- Added logic prioritization (75% weight to logic layers vs consensus voting)
- Created comprehensive scalability analysis and architecture docs
- Updated Watson Glaser Advanced TIS with learner/developer views
- Added persistence, curriculum learning, and neural evolution
- Created Puppeteer tests and validation tools
- Added .gitignore to exclude node_modules
…e TIS

- Add detailed README with quick start, architecture, features
- Add MIT LICENSE for standalone deployment
- Add CONTRIBUTING guidelines (400+ lines)
- Add SECURITY policy with vulnerability disclosure
- Add CHANGELOG with v1.0.0 release notes
- Add INSTALL guide with deployment options
- Backup old README to README_OLD.md

This creates a self-contained, GitHub-compliant system ready for
independent deployment meeting all repository standards.
…e TIS

- Add detailed README with quick start, architecture, features
- Add MIT LICENSE for standalone deployment
- Add CONTRIBUTING guidelines (400+ lines)
- Add SECURITY policy with vulnerability disclosure
- Add CHANGELOG with v1.0.0 release notes
- Add INSTALL guide with deployment options
- Backup old README to README_OLD.md

This creates a self-contained, GitHub-compliant system ready for
independent deployment meeting all repository standards.
- Add package.json with npm scripts and dependencies
- Add .gitignore for node_modules and temporary files
- Add .gitkeep to preserve test screenshots directory
- Enable npm test, npm start, npm run dev commands
- Add comprehensive deployment verification script
- Update all placeholder URLs to actual GitHub username
- Make verify-deployment.sh executable
- System now passes all deployment readiness checks

The watson-glaser-tis-standalone branch is now fully compliant and
ready for independent deployment to GitHub Pages, Netlify, or Vercel.
- Complete step-by-step deployment instructions
- Troubleshooting for common issues
- Alternative deployment options (Netlify, Vercel)
- Post-deployment verification checklist
- Custom domain configuration guide
- Resolve INSTALL.md conflict (use standalone URLs)
- Add deployment infrastructure
- Add verification script
- Add package.json for npm commands

This brings TIS standalone features and GitHub compliance
documentation into the main branch.
…ks, imports, compiler options, formatting, linting, and testing.
…th only, improve agent tool handling flexibility, and refine history token tracking.
Added troubleshooting documentation for Puppeteer tests on macOS and updated puppeteer_test.js to use additional Chrome flags and support a custom executable path via PUPPETEER_EXECUTABLE_PATH. Also removed test_git_visibility.txt. These changes improve test reliability across different environments, especially on macOS with Rosetta.

Committ toWG test repo
slittyjuice-source and others added 27 commits December 4, 2025 15:08
- pyproject.toml: fix venvPath to . and venv to .venv-1
- pyproject.toml: add .venv-1 and .venv2 to norecursedirs
- .vscode/settings.json: add agents to pytestArgs, set defaultInterpreterPath
- agents/logic/grounding.py: remove unused Tuple import
- agents/logic/reasoning_agent.py: remove unused ValidationResult import
- agents/logic/reasoning_agent.py: rename property -> predicate to avoid shadowing
- agents/logic/reasoning_agent.py: improve modus ponens parsing

All 57 Python tests pass. All 36 WG Test integration tests pass.
…icacy

[WIP] Review code efficacy for improvement opportunities
Resolved a Python f-string formatting error in agents/extended_thinking_integration.ipynb by separating percentage and alignment formatting, ensuring correct output display. Added ISSUES_RESOLVED.md to document fixes and test improvements. Updated watson-glaser-trainer/package.json test scripts to make Puppeteer tests opt-in, and upgraded Puppeteer and related dependencies in package-lock.json for better compatibility.
- semantic_parser.py: Remove unused Tuple and Any imports
- knowledge_base.py: Replace MD5 with SHA256 for fact_id and cache_key

All 74 tests passing. Codacy analysis clean.
- critic_system.py: Remove corrupted text causing syntax errors
- memory_system.py: Rename 'id' parameters to 'entry_id' to avoid shadowing built-in

All 74 tests passing. Codacy analysis clean.
Introduces a comprehensive agent architecture with visual diagrams, a pluggable decision model supporting utility scoring, citation requirements, constraint handling, risk bands, and safe fallback options. Adds core modules for constraint systems, evidence validation, inference engine, memory persistence, retrieval augmentation, safety system, and related tests. Updates documentation to reflect roadmap and future integration plans.
Implemented comprehensive agent enhancements:

Core Modules Created:
- benchmark_suite.py: Performance benchmarking with regression detection
- calibration_system.py: Confidence calibration with Platt/temperature scaling
- clarification_system.py: Ambiguity detection and clarifying questions
- debate_system.py: Structured arguments and multi-agent debate
- feedback_system.py: Decision outcome tracking and weight adjustment
- latency_control.py: Circuit breakers and adaptive timeouts
- ui_hooks.py: Event bus, progress tracking, streaming support

Enhanced Modules:
- constraint_system.py: Hard/soft constraints with relaxation paths
- retrieval_augmentation.py: Semantic chunking, re-ranking, query expansion
- decision_model.py: Utility function U = value - cost - risk
- planning_system.py: Plan verification and effect tracking

Tests:
- 111 tests passing (up from 78)
- New test_phase2_enhancements.py with 29 tests
- Added verification and evidence tests

All modules validated with Codacy CLI - clean results
Add 9 new core modules implementing sophisticated agent capabilities:

- multimodal_pipeline.py: CLIP/VLM-style vision+text fusion with modality
  alignment, contrastive learning hooks, and embedding composition

- fuzzy_inference.py: Complete fuzzy logic inference engine with membership
  functions (triangular, trapezoidal, gaussian, sigmoid), FuzzyVariable,
  FuzzyRule, and defuzzification methods (centroid, bisector, mean of max)

- self_consistency.py: Multi-sample self-consistency with voting methods
  (majority, weighted, unanimous), verification chains, answer normalization,
  and consistency result aggregation

- tool_arbitration.py: Tool selection based on usage statistics, semantic
  fit, cost modeling, and reliability tracking

- retrieval_diversity.py: BM25 + vector + reranking hybrid retrieval with
  MMR diversity, rank fusion (RRF), and result deduplication

- source_trust.py: Source reliability modeling with temporal decay, accuracy
  tracking, trust propagation, and confidence calibration

- hallucination_mitigation.py: Proof-or-flag pattern with citation validation,
  claim verification, and unverifiable claim detection

- adversarial_testing.py: Jailbreak detection, prompt injection detection,
  threat pattern library, input sanitization, and vulnerability assessment

- telemetry_replay.py: Session logging, replay capabilities, performance
  metrics, and debugging utilities

Test coverage: 47 new tests (all passing)
Optimize API token usage across 4 core modules:

self_consistency.py:
- Add result caching with TTL for SelfConsistencyVoter
- Cache key computed from chain fingerprints
- Early termination threshold for high-confidence results
- cache_stats() method for monitoring

tool_arbitration.py:
- Add selection caching for deterministic strategies (GREEDY, UCB)
- Cache key from context + candidates + capabilities
- clear_cache() and cache_stats() for monitoring

retrieval_diversity.py:
- Add query result caching to BM25Retriever
- LRU eviction when cache_size exceeded
- Cache invalidation on index rebuild
- cache_stats() for monitoring

hallucination_mitigation.py:
- Add verification result caching to ClaimVerifier
- Early termination when confidence threshold reached
- Cache by claim text hash

Test coverage: 51 tests (4 new caching tests)
Introduces a deterministic logic foundation with new core_logic modules for propositional and categorical logic (logic_engine, categorical_engine), and updates the planning system with safer calculation, tool scoring, and new reranker and trace logger utilities. Adds comprehensive tests for new logic and agent components, enhances the GitHub Actions workflow for coverage and test reporting, and removes obsolete test files.
…nd memory systems

- Fix sentence extraction in CriticSystem.self_consistency_check to handle trailing periods
- Fix SQLiteBackend._row_to_entry to use direct column access instead of .get()
- Fix JSONFileBackend._load to handle empty files gracefully
- Update test_critic_system.py with correct API usage and relaxed assertions
- Update test_inference_engine.py with correct add_rule signature (name, antecedents, consequent)
- Update test_memory_persistence.py to use 'content' field required by SQLiteBackend

All 246 tests now pass.
…easoning

- Create agents/core/logic_orchestrator.py with:
  - LogicOrchestrator class coordinating all logic engines
  - StructuredArgument dataclass for argument input
  - LogicAnalysisResult dataclass for unified output
  - ArgumentType enum for routing classification
  - Stubbed methods for categorical, propositional, and mixed analysis
  - Fallacy detection integration via FallacyDetector
  - Factory functions for convenience

- Add 20 unit tests in agents/tests/test_logic_orchestrator.py
- Add plan-masterDevelopment.prompt.md for project documentation

Architecture: LogicOrchestrator is the ONLY module that coordinates
multiple logic engines. It remains deterministic (no LLM calls).
…mental environment)

- Created self-contained project clone under experiments/
- Isolated virtual environment with pinned dependencies
- Sandbox configuration with agent constraints (.sandbox_config.yaml)
- Documentation of experiment scope and permissions (EXPERIMENT.md)
- Modified pyproject.toml to remove coverage requirements for sandbox
- Test baseline: 311 passed, 12 failed, 16 skipped

This sandbox is isolated from core codebase and safe for experimental
development by autonomous agents.
- ARCHITECTURE_AUDIT.md, ARCHITECTURE_IMPLEMENTATION_SUMMARY.md, ARCHITECTURE_METAPHYSICS.md
- agents/core/architectural_layer.py
- 7 new test files for architectural compliance, debate, evidence, robustness, rules, uncertainty systems
Introduces minimal classes and facades for core modules (categorical_engine, clarification_system, constraint_system, curriculum_system, debate_system, decision_model, evidence_system, logic_orchestrator, observability_system, role_system, rule_engine, semantic_parser, uncertainty_system) to support unit and integration tests. Also improves logic evaluation in logic_engine and updates notebook formatting for extended thinking integration.
- Remove unused imports across 40+ files (F401)
- Fix f-strings without placeholders (F541)
- Rename ambiguous variable 'l' to 'layer_data'/'dep' (E741)
- Convert lambda to def for comparator function (E731)
- Add bare except -> except Exception (E722)
- Rename duplicate class definitions to avoid F811:
  - EvidenceValidator -> FullEvidenceValidator
  - ConflictResolver -> FullConflictResolver
  - Remove duplicate lightweight classes in clarification/debate systems
- Configure pyproject.toml to ignore E402/F821 for notebooks
- Prefix unused local variables with underscore (F841)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants