Overview
We have identified an opportunity to improve the current [audio-to-text](https://github.com/livepeer/go-livepeer/pull/3078/) pipeline on the Livepeer AI Network by enabling [flash-attention](https://arxiv.org/abs/2307.08691/), which will speed up the pipeline significantly and allow faster, near-realtime operation. We are seeking the support of the community and bounty hunters to implement this optimisation quickly so it becomes available to developers working with Livepeer.
Problem
Flash Attention has not yet been enabled for the audio-to-text models in the Livepeer AI Network, so the pipeline runs slower than it could.
Desired Solution
Improved model execution speed for the audio-to-text pipeline.
Bounty Requirements
- Enable memory-efficient Flash Attention on the [existing pipeline](https://github.com/livepeer/ai-worker/blob/main/runner/app/pipelines/audio_to_text.py/).
- Ensure that devices that don't yet support the optimisation safely fall back to the working [Scaled Dot-Product Attention (SDPA)](https://pytorch.org/docs/main/generated/torch.nn.functional.scaled_dot_product_attention/) implementation.
- Create a separate Docker container image, similar to the Segment Anything 2 pipeline image (ai-runner#185), to avoid dependency issues with other pipelines.
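The fallback requirement above could be handled with a small capability check before the model is loaded. The sketch below is an assumption about how the runner might select between the two backends, not code from the pipeline itself; `"flash_attention_2"` and `"sdpa"` are the identifiers that Hugging Face `transformers` accepts for its `attn_implementation` argument, and the compute-capability threshold reflects Flash Attention 2's requirement of an Ampere-class (sm_80) or newer NVIDIA GPU.

```python
def pick_attn_implementation(has_cuda: bool, compute_capability: tuple) -> str:
    """Return the attn_implementation string to pass to from_pretrained().

    Hypothetical helper: checks that the flash-attn package is installed and
    that the device is new enough, otherwise falls back to PyTorch SDPA.
    """
    try:
        import flash_attn  # noqa: F401  # flash-attn package must be installed
    except ImportError:
        return "sdpa"
    # Flash Attention 2 kernels target Ampere (sm_80) and newer GPUs.
    if has_cuda and compute_capability >= (8, 0):
        return "flash_attention_2"
    # Older or non-CUDA devices fall back to scaled_dot_product_attention.
    return "sdpa"
```

In the pipeline this string would then be passed when loading the Whisper model, e.g. `from_pretrained(model_id, attn_implementation=pick_attn_implementation(...))`.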
Applicant Requirements
- Proven experience working with deep learning frameworks such as PyTorch, particularly in implementing attention mechanisms and optimising model performance.
- Strong experience with [Python](https://www.python.org/).
Scope Exclusions
- None. All areas related to the issue are within scope.
Implementation Tips
- Consult the [PyTorch documentation](https://pytorch.org/docs/main/generated/torch.nn.functional.scaled_dot_product_attention/) on flash-attention to better understand how to enable it in the audio-to-text pipeline.
- Validate performance improvements in the Flash Attention-enabled pipeline and ensure proper fallback behaviour on unsupported devices.
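When validating the fallback, it helps to remember that Flash Attention and SDPA compute the same mathematical operation, `softmax(QKᵀ/√d)V`; the fused kernels only avoid materialising the full score matrix. A minimal pure-Python reference of that operation (useful as a correctness oracle for small inputs, not as production code) could look like:

```python
import math

def sdpa(q, k, v):
    """Reference scaled dot-product attention: softmax(q k^T / sqrt(d)) v.

    q, k, v are lists of row vectors. Both the SDPA fallback and the
    flash-attention kernel should produce (numerically close to) this result.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Attention scores for this query against every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output row is the weight-averaged value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

Comparing this reference against both backends on identical inputs is a cheap way to confirm the fallback produces matching outputs before benchmarking speed.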
How to Apply
- Express Your Interest: Fill out [this form](https://www.notion.so/13f0a34856878045ba5be0218bc28d3f?pvs=21), making sure to specify the bounty you are interested in.
- Wait for Review: Our team will review expressions of interest and select the best candidate.
- Get Assigned: If selected, we'll contact you and assign the bounty to you.
- Start Working: Dive into your task! If you need assistance or guidance, join the discussions in the #developer-lounge channel on our [Discord server](https://discord.gg/livepeer).
- Submit Your Work: Create a pull request in the relevant repository and request a review.
- Notify Us: Ping us on Discord when your pull request is ready for review.
- Receive Your Bounty: We'll arrange the bounty payment once your pull request is approved.
- Gain Recognition: Your valuable contributions will be showcased in our project's [changelog](https://livepeer-ai.productlane.com/changelog).
Contact Information
For questions or clarifications, please contact: [[email protected]](mailto:[email protected])