jieba-php v0.42 - Enhanced TF-IDF and Memory Management

@fukuball released this 19 Jul 05:55 · 18 commits to master since this release

🚀 jieba-php v0.42 - Major Feature Release

This release introduces significant enhancements to jieba-php with new memory management capabilities, integrated TF-IDF scoring, and improved multi-language support.

✨ New Features

🧠 JiebaMemory Class - Unified Memory Management

  • NEW: JiebaMemory class for centralized memory management across all jieba-php classes
  • Memory Statistics: Comprehensive memory usage monitoring with getMemoryStats() and getAllCacheStats()
  • Batch Operations: initAll(), destroyAll(), and clearAllCaches() for efficient resource management
  • Status Monitoring: getInitializationStatus() and isAllInitialized() for system health checks

📊 Enhanced TF-IDF and POS Integration

  • NEW: Jieba::cut() now supports with_pos and with_scores options
  • NEW: Posseg::cut() now supports with_scores option for TF-IDF integration
  • Modular API: New JiebaAnalyse::calculateTF() and calculateTFIDF() methods
  • Backward Compatibility: All existing APIs remain unchanged
  • Auto-initialization: JiebaAnalyse automatically initializes when scoring features are used
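To make the scoring idea concrete, here is a minimal, self-contained sketch of a TF-IDF calculation over an already segmented token list. It is not the jieba-php implementation: the helper names `termFrequency()` and `tfIdf()`, the tiny IDF table, and the default IDF value are all hypothetical, and the actual signatures of `JiebaAnalyse::calculateTF()` and `calculateTFIDF()` may differ.

```php
<?php
// Illustrative sketch only; these helpers are hypothetical and are not part of jieba-php.

/** Term frequency: occurrences of each token divided by the total token count. */
function termFrequency(array $tokens): array
{
    $total = count($tokens);
    if ($total === 0) {
        return [];
    }
    $tf = [];
    foreach (array_count_values($tokens) as $term => $count) {
        $tf[$term] = $count / $total;
    }
    return $tf;
}

/** TF-IDF: weight each term frequency by its IDF, using a default for unknown terms. */
function tfIdf(array $tokens, array $idf, float $defaultIdf): array
{
    $scores = [];
    foreach (termFrequency($tokens) as $term => $tf) {
        $scores[$term] = $tf * ($idf[$term] ?? $defaultIdf);
    }
    arsort($scores); // highest score first, keys preserved
    return $scores;
}

// Made-up tokens and IDF values, chosen only to keep the arithmetic easy to follow.
$tokens = ['北京', '大學', '北京', '的'];
$idf    = ['北京' => 4.0, '大學' => 6.0, '的' => 0.5];
$scores = tfIdf($tokens, $idf, 10.0);

print_r($scores);
```

With these inputs, 北京 appears in half of the tokens and scores 0.5 × 4.0 = 2.0, ahead of 大學 at 0.25 × 6.0 = 1.5, while the common particle 的 scores only 0.125.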

🌍 Improved Multi-language CJK Support

  • Enhanced: Better handling of mixed Chinese/Japanese/Korean text
  • Complex Scenarios: Improved processing of mixed-language documents
  • Performance: Optimized CJK character recognition and segmentation

🛠️ Demo Scripts & Examples

  • NEW: demo_tf_idf_pos.php - TF-IDF and POS tagging integration examples
  • NEW: demo_mixed_cjk.php - Multi-language CJK text processing examples

🧪 Enhanced Testing

  • NEW: TfIdfPosTest.php - Comprehensive TF-IDF integration testing
  • NEW: MixedCJKTest.php - Multi-language text processing validation
  • Coverage: 70+ tests with 300+ assertions
  • Backward Compatibility: Full validation of existing API compatibility

📚 Documentation Updates

  • Comprehensive: Updated README.md and CLAUDE.md with all new features
  • Best Practices: Memory management guidelines and performance optimization tips
  • API Examples: Detailed usage examples for all new features
  • Multi-language: Complete documentation in both Chinese and English

🔧 Technical Improvements

  • Security: Enhanced input validation and injection prevention
  • Performance: Optimized memory usage and cache management
  • Reliability: Improved error handling and graceful degradation
  • Compatibility: Maintained backward compatibility with all existing code

📦 Installation & Usage

composer require fukuball/jieba-php:^0.42

<?php
require_once __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\JiebaMemory;

// Quick start with unified memory management
JiebaMemory::initAll();

// Enhanced segmentation with POS and TF-IDF
$text = '我來到北京清華大學'; // sample input; replace with your own text
$result = Jieba::cut($text, false, array(
    'with_pos' => true,
    'with_scores' => true
));

// Memory monitoring
$stats = JiebaMemory::getMemoryStats();
echo "Memory usage: " . $stats['current_memory_usage_formatted'];

🙏 Acknowledgments

Special thanks to all contributors and the community for their feedback and suggestions that made this release possible.


Full Changelog: 0.34...0.42