Conversation

@astariul commented Jun 4, 2019

To fix #68, this PR adds the ROUGE-N metric to Rouge.

Taken from: https://github.com/pltrdy/seq2seq/blob/master/seq2seq/metrics/rouge.py

Results might differ slightly from the official ROUGE-1.5.5 script, but at least the code is very simple.
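For context on what the approximation computes: ROUGE-N is clipped n-gram overlap between a candidate and a reference, reported as precision/recall/F1. A minimal illustrative sketch (names and tokenization are my own, not the linked repo's code, and it won't reproduce ROUGE-1.5.5 exactly since there is no stemming or multi-reference handling):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of n-grams (as tuples) in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=2):
    """Approximate ROUGE-N: (precision, recall, F1) over n-gram overlap.

    `candidate` and `reference` are whitespace-tokenized strings.
    Illustrative sketch only: no stemming, no stopword handling,
    single reference, so it won't match the official script exactly.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0, 0.0, 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram match count
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, `rouge_n("the cat sat on the mat", "the cat sat on a mat", n=1)` gives precision and recall of 5/6 (the second "the" in the candidate is clipped to the single "the" in the reference).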


msftclas commented Jun 4, 2019

CLA assistant check
All CLA requirements met.

@temporaer
Member

Hey, thanks for the PR! It looks like there's an issue with the tests still:

self.assertAlmostEqual(0.626149, scores['SkipThoughtCS'], places=5)
self.assertAlmostEqual(0.88469, scores['EmbeddingAverageCosineSimilairty'], places=5)
self.assertAlmostEqual(0.568696, scores['VectorExtremaCosineSimilarity'], places=5)
self.assertAlmostEqual(0.784205, scores['GreedyMatchingScore'], places=5)
Contributor

Thanks for the contribution!
Would you add some tests for the values of the ROUGE metrics?

Author

Sorry, but I will not:

  • The added code is from another repository; it's not my code.
  • The scores are slightly different from pyrouge's.
  • The goal of this PR is to give a quick, approximate way to get a ROUGE-N score. It should not be merged into the main branch, but kept open here.
  • For a real ROUGE-N score, someone needs to add the official Perl script ROUGE-1.5.5... I don't have time for this now :/

Member

The other repo's code seems to be Apache-licensed; I'm not sure we can merge it, particularly without including their license. I'm not too worried about slightly different values as long as we're clear in the docs about the methods used. Do you know where the differences might come from?

Contributor

We could at least test that the values are within some reasonable bounds.
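A bounds-style test along those lines could look like the following sketch. To keep it self-contained, it uses a trivial set-based unigram-overlap stand-in (`rouge_1_f` is hypothetical, not the PR's function); a real test would call whatever entry point the PR exposes:

```python
import unittest

def rouge_1_f(hyp, ref):
    """Trivial unigram-overlap F1 stand-in for the PR's ROUGE-N code,
    used only so this sketch runs; swap in the real metric function."""
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if not h or not r or not overlap:
        return 0.0
    p, rec = overlap / len(h), overlap / len(r)
    return 2 * p * rec / (p + rec)

class TestRougeBounds(unittest.TestCase):
    def test_scores_within_reasonable_bounds(self):
        score = rouge_1_f("the cat sat on the mat", "the cat sat on a mat")
        # loose bounds instead of exact reference values,
        # so small implementation differences don't break CI
        self.assertGreaterEqual(score, 0.5)
        self.assertLessEqual(score, 1.0)

    def test_degenerate_cases(self):
        # identical texts should score 1, disjoint texts 0
        self.assertAlmostEqual(rouge_1_f("a b c", "a b c"), 1.0)
        self.assertEqual(rouge_1_f("a b c", "x y z"), 0.0)
```

Run with `python -m unittest` once wired up to the actual metric.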

Author

I tried comparing the results with those from the rouge package, but the scores are different (and not just in the last few digits...). Not only are the ROUGE-N scores different, but so is the existing ROUGE-L.

That package is also Apache-licensed, so I'm not sure we can just use it without including their license (even without modifying their code).

@ghost mentioned this pull request Jun 30, 2019


Development

Successfully merging this pull request may close these issues: ROUGE 1 / ROUGE 2

4 participants