10 changes: 10 additions & 0 deletions docs/docs/core-abilities/fetching_ticket_context.md
@@ -164,6 +164,16 @@ jira_api_token = "YOUR_API_TOKEN"
jira_api_email = "YOUR_EMAIL"
```

To use Jira as the issue provider for ticket compliance (and `/similar_issue`), enable it explicitly:

```toml
[config]
issue_provider = "jira"

[jira]
issue_projects = ["ABC"] # or issue_jql = "project = ABC order by created DESC"
```

### Jira Data Center/Server

[//]: # ()
72 changes: 69 additions & 3 deletions docs/docs/tools/similar_issues.md
@@ -1,7 +1,7 @@
## Overview

The similar issue tool retrieves the most similar issues to the current issue.
It can be invoked manually by commenting on any PR:
The similar issue tool retrieves the most similar issues to the current issue or MR context.
It can be invoked manually by commenting on any PR/MR:

```
/similar_issue
@@ -15,8 +15,69 @@ It can be invoked manually by commenting on any PR:

![similar_issue](https://codium.ai/images/pr_agent/similar_issue.png){width=768}

### GitLab example (MR comment)

Comment on an MR:

```
/similar_issue
```

Example output posted to the MR:

```
### Similar Issues
___

1. **[Add retry logic for HTTP client](https://gitlab.example.com/org/repo/-/issues/1)** (score=0.91)
2. **[Cache embeddings for faster review](https://gitlab.example.com/org/repo/-/issues/3)** (score=0.89)
```

Note that to perform retrieval, the `similar_issue` tool indexes all of the repo's previous issues (once).

## Indexing lifecycle and scope

### What is indexed
- Issues and (optionally) issue comments only. MRs are not indexed.
- Each vector includes `repo`, `username`, `created_at`, and `level` (issue or comment).

### When indexing happens
- On demand, the first time `/similar_issue` is called for a repo.
- A per-repo marker record is stored to avoid re-indexing the same repo.
- On later runs, only new issues are appended (based on issue IDs).

### Query scope
- One shared collection is used, but queries always filter to the current repo.
- GitLab: the query text comes from MR title + description. If the MR text includes `#<issue>`, that GitLab issue is used as the query source, but the output still posts on the MR.

```mermaid
flowchart TD
A[Comment /similar_issue on MR] --> B{Repo indexed?}
B -- No --> C[Fetch repo issues + comments]
C --> D[Embed + upsert vectors to vector DB]
B -- Yes --> E[Check for new issues]
E --> F{New issues?}
F -- Yes --> D
F -- No --> G[Build query]
D --> G[Build query]
G --> H["Query vector DB (filter by repo)"]
H --> I[Post Similar Issues on MR]
```
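
The lifecycle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryIndex`, `sync_repo`, `query`, and `toy_embed` are hypothetical names, the "embedding" is a toy bag-of-words vector, and a plain list stands in for the vector database. It shows the three behaviours described: a per-repo marker, appending only unseen issue IDs on later runs, and filtering queries to the current repo inside one shared collection.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase bag of words.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryIndex:
    # Hypothetical shared collection: one list holds records for all repos.
    records: list = field(default_factory=list)
    indexed_repos: set = field(default_factory=set)  # per-repo marker

    def sync_repo(self, repo: str, issues: dict) -> int:
        """Index `issues` (id -> text) for `repo`; return how many were added."""
        known = {r["issue_id"] for r in self.records if r["repo"] == repo}
        added = 0
        for issue_id, text in issues.items():
            if issue_id in known:
                continue  # already indexed on an earlier run
            self.records.append({"repo": repo, "issue_id": issue_id, "vector": toy_embed(text)})
            added += 1
        self.indexed_repos.add(repo)
        return added

    def query(self, repo: str, text: str, top_k: int = 3) -> list:
        """Return up to `top_k` (issue_id, score) pairs, restricted to `repo`."""
        q = toy_embed(text)
        scored = [(r["issue_id"], cosine(q, r["vector"])) for r in self.records if r["repo"] == repo]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Calling `sync_repo` twice with an overlapping issue set only adds the new IDs, and `query("org/repo", ...)` never returns records stored for another repo, even though both live in the same list.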

## Embedding configuration

The tool uses an OpenAI-compatible embeddings endpoint. Configure it in `configuration.toml` (or via env vars):

```toml
[pr_similar_issue]
embedding_base_url = "https://your-embeddings-host/v1/embeddings"
embedding_model = "intfloat/multilingual-e5-large"
embedding_dim = 1024
embedding_max_tokens = 10000
```

If the embedding endpoint requires auth, set `PR_SIMILAR_ISSUE__EMBEDDING_API_KEY` as an environment variable.
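
As a rough sketch of what "OpenAI-compatible" implies for the request, the snippet below builds the HTTP call with the standard library only. It assumes the endpoint accepts the OpenAI-style body `{"model": ..., "input": [...]}` and answers with `{"data": [{"embedding": [...]}]}`; `build_embedding_request` and `parse_embedding_response` are hypothetical helper names, not functions from PR-Agent.

```python
import json
import urllib.request
from typing import List, Optional


def build_embedding_request(base_url: str, model: str, texts: List[str],
                            api_key: Optional[str] = None) -> urllib.request.Request:
    # Assumption: the endpoint accepts the OpenAI-style body {"model": ..., "input": [...]}.
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(base_url, data=body, headers=headers, method="POST")


def parse_embedding_response(payload: dict, expected_dim: int) -> List[List[float]]:
    # Assumption: the response follows the OpenAI shape {"data": [{"embedding": [...]}, ...]}.
    vectors = [item["embedding"] for item in payload["data"]]
    for vector in vectors:
        if len(vector) != expected_dim:
            raise ValueError(f"expected dimension {expected_dim}, got {len(vector)}")
    return vectors
```

The request would then be sent with `urllib.request.urlopen(request)` and the JSON body passed to `parse_embedding_response`. Checking the vector length against `embedding_dim` surfaces a model/collection mismatch before anything is written to the vector database.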

### Selecting a Vector Database

Configure your preferred database by setting the `vectordb` parameter under `[pr_similar_issue]` in the `configuration.toml` file.
@@ -59,13 +120,18 @@ vectordb = "qdrant"
```

You can get a free managed Qdrant instance from [Qdrant Cloud](https://cloud.qdrant.io/).
Ensure the Qdrant collection dimension matches `embedding_dim`. If you change models, set
`pr_similar_issue.force_update_dataset=true` to rebuild the collection.

## How to use

- To invoke the 'similar issue' tool from **CLI**, run:
`python3 cli.py --issue_url=... similar_issue`

- To invoke the 'similar' issue tool via online usage, [comment](https://github.com/Codium-ai/pr-agent/issues/178#issuecomment-1716934893) on a PR:
- To invoke the 'similar issue' tool via online usage, [comment](https://github.com/Codium-ai/pr-agent/issues/178#issuecomment-1716934893) on a PR/MR:
`/similar_issue`

- GitLab: if run from an MR comment, the query uses the MR title + description. If the MR text includes an issue reference (e.g., `#123`), that issue is used as the query source, but the output is still posted on the MR. If run from CLI with `--issue_url`, the query uses that issue.
- Jira: set `issue_provider="jira"` and configure `[jira]` with either `issue_projects` (or `issue_project_map`) or `issue_jql`. When enabled, `/similar_issue` indexes Jira issues instead of GitLab/GitHub issues. If the MR text includes Jira keys (e.g., `ABC-123`), those tickets are used as the query source; otherwise it uses the MR title + description.

- You can also enable the 'similar issue' tool to run automatically when a new issue is opened, by adding it to the [pr_commands list in the github_app section](https://github.com/Codium-ai/pr-agent/blob/main/pr_agent/settings/configuration.toml#L66)
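
The query-source rules in the GitLab and Jira bullets above can be read as a small decision function. The sketch below is illustrative only: `pick_query_source` is a hypothetical name, and both regular expressions are simplified stand-ins rather than the patterns PR-Agent uses.

```python
import re
from typing import Tuple

# Illustrative patterns only; the project's own patterns may differ.
JIRA_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")
GITLAB_ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def pick_query_source(issue_provider: str, mr_title: str, mr_description: str) -> Tuple[str, object]:
    """Return (kind, value) describing what the similarity query is built from."""
    text = f"{mr_title}\n{mr_description}"
    if issue_provider == "jira":
        keys = list(dict.fromkeys(JIRA_KEY.findall(text)))  # de-duplicate, keep order
        if keys:
            return "jira_tickets", keys
    elif issue_provider == "gitlab":
        match = GITLAB_ISSUE_REF.search(text)
        if match:
            return "gitlab_issue", int(match.group(1))
    # Fallback described above: MR title + description.
    return "mr_text", text


print(pick_query_source("jira", "Fix ABC-123", "Related to ABC-123 and XY-9"))
print(pick_query_source("gitlab", "Retry logic", "Closes #42"))
```

In every branch the result is still posted on the MR; only the text used to build the query changes.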
16 changes: 16 additions & 0 deletions pr_agent/algo/ticket_utils.py
@@ -0,0 +1,16 @@
import re
from typing import List

JIRA_KEY_PATTERN = re.compile(r"(?:https?://[^\s/]+/browse/)?([A-Z][A-Z0-9]+-\d{1,7})", re.IGNORECASE)


def find_jira_keys(text: str) -> List[str]:
    """Return the unique Jira keys found in `text`, upper-cased, in order of first appearance."""
    if not text:
        return []
    matches = JIRA_KEY_PATTERN.findall(text)
    keys = []
    for match in matches:
        key = match.upper()
        if key not in keys:
            keys.append(key)
    return keys
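
A short usage sketch for `find_jira_keys`. The pattern and function are reproduced from the file above so the block runs standalone, and the input strings are made up for illustration.

```python
import re
from typing import List

JIRA_KEY_PATTERN = re.compile(r"(?:https?://[^\s/]+/browse/)?([A-Z][A-Z0-9]+-\d{1,7})", re.IGNORECASE)


def find_jira_keys(text: str) -> List[str]:
    if not text:
        return []
    keys = []
    for match in JIRA_KEY_PATTERN.findall(text):
        key = match.upper()
        if key not in keys:
            keys.append(key)
    return keys


# Bare keys and /browse/ URLs are both recognised; duplicates collapse after upper-casing.
print(find_jira_keys("Fixes abc-123, see https://jira.example.com/browse/ABC-123 and XYZ-9"))
# ['ABC-123', 'XYZ-9']

# Because of re.IGNORECASE, lowercase tokens shaped like a key also match.
print(find_jira_keys("encoded as utf-8"))
# ['UTF-8']
```

The second call shows a side effect worth knowing about: any token of the form letters-digits is reported as a key, so callers may want to filter the result against the project keys they actually configured.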
111 changes: 92 additions & 19 deletions pr_agent/git_providers/gitlab_provider.py
@@ -71,7 +71,8 @@ def __init__(self, merge_request_url: Optional[str] = None, incremental: Optiona
        self.temp_comments = []
        self._submodule_cache: dict[tuple[str, str, str], list[dict]] = {}
        self.pr_url = merge_request_url
        self._set_merge_request(merge_request_url)
        if merge_request_url and self._is_merge_request_url(merge_request_url):
            self._set_merge_request(merge_request_url)
        self.RE_HUNK_HEADER = re.compile(
            r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@[ ]?(.*)")
        self.incremental = incremental
@@ -785,9 +786,6 @@ def get_pr_owner_id(self) -> str | None:
    def get_pr_description_full(self):
        return self.mr.description

    def get_issue_comments(self):
        return self.mr.notes.list(get_all=True)[::-1]

    def get_repo_settings(self):
        try:
            main_branch = self.gl.projects.get(self.id_project).default_branch
@@ -847,6 +845,13 @@ def remove_reaction(self, issue_comment_id: int, reaction_id: str) -> bool:
            get_logger().warning(f"Failed to remove reaction, error: {e}")
            return False

    def _is_merge_request_url(self, url: str) -> bool:
        try:
            path_parts = urlparse(url).path.strip('/').split('/')
        except Exception:
            return False
        return "merge_requests" in path_parts

    def _parse_merge_request_url(self, merge_request_url: str) -> Tuple[str, int]:
        parsed_url = urlparse(merge_request_url)

@@ -872,10 +877,64 @@ def _parse_merge_request_url(self, merge_request_url: str) -> Tuple[str, int]:
        # Return the path before 'merge_requests' and the ID
        return project_path, mr_id

    def _parse_issue_url(self, issue_url: str) -> Tuple[str, int]:
        parsed_url = urlparse(issue_url)

        path_parts = parsed_url.path.strip('/').split('/')
        if 'issues' not in path_parts:
            raise ValueError("The provided URL does not appear to be a GitLab issue URL")

        issues_index = path_parts.index('issues')
        if len(path_parts) <= issues_index + 1:
            raise ValueError("The provided URL does not contain an issue IID")

        try:
            issue_iid = int(path_parts[issues_index + 1])
        except ValueError as e:
            raise ValueError("Unable to convert issue IID to integer") from e

        project_parts = path_parts[:issues_index]
        if project_parts and project_parts[-1] == '-':
            project_parts = project_parts[:-1]
        project_path = "/".join(project_parts)
        if project_path.endswith('/-'):
            project_path = project_path[:-2]
        return project_path, issue_iid

    def _get_merge_request(self):
        mr = self.gl.projects.get(self.id_project).mergerequests.get(self.id_mr)
        return mr

    def _get_project(self, project_path: str):
        try:
            encoded = urllib.parse.quote_plus(project_path)
            return self.gl.projects.get(encoded)
        except Exception:
            return self._project_by_path(project_path)

    def get_issue(self, issue_iid: int, project_path: Optional[str] = None):
        project = self._get_project(project_path or self.id_project)
        if project is None:
            raise GitlabGetError("Project not found")
        return project.issues.get(issue_iid)

    def list_issues(self, project_path: Optional[str] = None, state: str = "all"):
        project = self._get_project(project_path or self.id_project)
        if project is None:
            raise GitlabGetError("Project not found")
        return project.issues.list(state=state, iterator=True)

    def get_issue_comments(self, issue=None):
        if issue is None:
            try:
                return self.mr.notes.list(get_all=True)[::-1]
            except Exception:
                return []
        return list(issue.notes.list(iterator=True))

    def create_issue_comment(self, issue, body: str):
        return issue.notes.create({"body": body})

    def get_user_id(self):
        return None

@@ -954,22 +1013,36 @@ def generate_link_to_relevant_line_number(self, suggestion) -> str:
        return ""
    #Clone related
    def _prepare_clone_url_with_token(self, repo_url_to_clone: str) -> str | None:
        if "gitlab." not in repo_url_to_clone:
            get_logger().error(f"Repo URL: {repo_url_to_clone} is not a valid gitlab URL.")
            return None
        (scheme, base_url) = repo_url_to_clone.split("gitlab.")
        access_token = getattr(self.gl, 'oauth_token', None) or getattr(self.gl, 'private_token', None)
        if not all([scheme, access_token, base_url]):
            get_logger().error(f"Either no access token found, or repo URL: {repo_url_to_clone} "
                               f"is missing prefix: {scheme} and/or base URL: {base_url}.")
        if not access_token:
            get_logger().error("No access token found for GitLab clone.")
            return None

        #Note that the ""official"" method found here:
        # https://docs.gitlab.com/user/profile/personal_access_tokens/#clone-repository-using-personal-access-token
        # requires a username, which may not be applicable.
        # The following solution is taken from: https://stackoverflow.com/questions/25409700/using-gitlab-token-to-clone-without-authentication/35003812#35003812
        # For example: For repo url: https://gitlab.codium-inc.com/qodo/autoscraper.git
        # Then to clone one will issue: 'git clone https://oauth2:<access token>@gitlab.codium-inc.com/qodo/autoscraper.git'
        # Note: GitLab instances are not always hosted under a gitlab.* domain.
        # Build a clone URL that works with any host (e.g., git.labs.hosting.cerence.net).
        if repo_url_to_clone.startswith(("http://", "https://")):
            try:
                from urllib.parse import urlparse
                parsed = urlparse(repo_url_to_clone)
                if not parsed.scheme or not parsed.netloc:
                    raise ValueError("missing scheme or host")
                netloc = parsed.netloc.split("@")[-1]
                return f"{parsed.scheme}://oauth2:{access_token}@{netloc}{parsed.path}"
            except Exception as exc:
                get_logger().error(
                    f"Repo URL: {repo_url_to_clone} could not be parsed for clone.",
                    artifact={"error": str(exc)},
                )
                return None

        clone_url = f"{scheme}oauth2:{access_token}@gitlab.{base_url}"
        return clone_url
        # Fallback to legacy gitlab.* parsing when a raw URL is provided.
        if "gitlab." not in repo_url_to_clone:
            get_logger().error(f"Repo URL: {repo_url_to_clone} is not a valid gitlab URL.")
            return None
        scheme, base_url = repo_url_to_clone.split("gitlab.")
        if not all([scheme, base_url]):
            get_logger().error(
                f"Repo URL: {repo_url_to_clone} is missing prefix: {scheme} and/or base URL: {base_url}."
            )
            return None
        return f"{scheme}oauth2:{access_token}@gitlab.{base_url}"
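
The host-agnostic branch of `_prepare_clone_url_with_token` can be exercised in isolation. The following is a minimal standalone sketch of that idea, not the provider method itself: `build_clone_url` is a hypothetical free function, the host names are placeholders, and logging and the legacy `gitlab.*` fallback are left out.

```python
from typing import Optional
from urllib.parse import urlparse


def build_clone_url(repo_url: str, access_token: str) -> Optional[str]:
    """Embed an OAuth2 token into an HTTP(S) clone URL, whatever the host name is."""
    parsed = urlparse(repo_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    # Drop any credentials already present in the URL before adding the token.
    host = parsed.netloc.split("@")[-1]
    return f"{parsed.scheme}://oauth2:{access_token}@{host}{parsed.path}"


print(build_clone_url("https://git.example.org/group/repo.git", "TOKEN"))
# https://oauth2:TOKEN@git.example.org/group/repo.git
```

Parsing the URL instead of splitting on `"gitlab."` is what lets self-hosted instances on arbitrary domains work; SSH-style remotes such as `git@host:group/repo.git` have no HTTP scheme and yield `None` in this sketch.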
16 changes: 16 additions & 0 deletions pr_agent/issue_providers/__init__.py
@@ -0,0 +1,16 @@
from pr_agent.issue_providers.base import Issue, IssueComment, IssueProvider
from pr_agent.issue_providers.github_issue_provider import GithubIssueProvider
from pr_agent.issue_providers.gitlab_issue_provider import GitlabIssueProvider
from pr_agent.issue_providers.jira_issue_provider import JiraIssueProvider
from pr_agent.issue_providers.resolver import get_issue_provider, resolve_issue_provider_name

__all__ = [
    "Issue",
    "IssueComment",
    "IssueProvider",
    "GithubIssueProvider",
    "GitlabIssueProvider",
    "JiraIssueProvider",
    "get_issue_provider",
    "resolve_issue_provider_name",
]
46 changes: 46 additions & 0 deletions pr_agent/issue_providers/base.py
@@ -0,0 +1,46 @@
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class IssueComment:
    body: str
    url: str = ""
    id: Optional[str] = None
    author: Optional[str] = None


@dataclass
class Issue:
    key: str
    title: str
    description: str = ""
    url: str = ""
    created_at: Optional[str] = None
    author: Optional[dict] = None
    comments: List[IssueComment] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)

    @property
    def body(self) -> str:
        return self.description

    @property
    def web_url(self) -> str:
        return self.url


class IssueProvider(ABC):
    @abstractmethod
    def list_issues(self, project_path: Optional[str] = None, state: str = "all") -> Iterable:
        raise NotImplementedError

    @abstractmethod
    def get_issue(self, issue_id: str, project_path: Optional[str] = None):
        raise NotImplementedError

    def get_issue_comments(self, issue) -> List[IssueComment]:
        return []
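
To show how the abstract interface is meant to be filled in, here is a small sketch of a dict-backed provider. `InMemoryIssueProvider` is hypothetical and does not exist in the PR, and the `Issue`, `IssueComment`, and `IssueProvider` definitions are trimmed copies of the ones above so the block runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class IssueComment:
    body: str


@dataclass
class Issue:
    key: str
    title: str
    description: str = ""
    comments: List[IssueComment] = field(default_factory=list)


class IssueProvider(ABC):
    @abstractmethod
    def list_issues(self, project_path: Optional[str] = None, state: str = "all") -> Iterable:
        raise NotImplementedError

    @abstractmethod
    def get_issue(self, issue_id: str, project_path: Optional[str] = None):
        raise NotImplementedError

    def get_issue_comments(self, issue) -> List[IssueComment]:
        return []


class InMemoryIssueProvider(IssueProvider):
    """Hypothetical provider backed by a dict, e.g. for unit tests."""

    def __init__(self, issues: List[Issue]):
        self._issues: Dict[str, Issue] = {issue.key: issue for issue in issues}

    def list_issues(self, project_path: Optional[str] = None, state: str = "all") -> Iterable:
        # project_path and state are ignored in this toy implementation.
        return list(self._issues.values())

    def get_issue(self, issue_id: str, project_path: Optional[str] = None) -> Issue:
        return self._issues[str(issue_id)]

    def get_issue_comments(self, issue) -> List[IssueComment]:
        return issue.comments
```

Only `list_issues` and `get_issue` are abstract, so a subclass that omits either cannot be instantiated, while `get_issue_comments` falls back to an empty list when a backend has no comments to offer.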
21 changes: 21 additions & 0 deletions pr_agent/issue_providers/github_issue_provider.py
@@ -0,0 +1,21 @@
from __future__ import annotations

from typing import Optional

from pr_agent.issue_providers.base import IssueProvider


class GithubIssueProvider(IssueProvider):
    def __init__(self, git_provider, repo_obj):
        self.git_provider = git_provider
        self.repo_obj = repo_obj

    def list_issues(self, project_path: Optional[str] = None, state: str = "all"):
        return self.repo_obj.get_issues(state=state)

    def get_issue(self, issue_id, project_path: Optional[str] = None):
        issue_number = int(issue_id)
        return self.repo_obj.get_issue(issue_number)

    def get_issue_comments(self, issue):
        return list(issue.get_comments())
20 changes: 20 additions & 0 deletions pr_agent/issue_providers/gitlab_issue_provider.py
@@ -0,0 +1,20 @@
from __future__ import annotations

from typing import Optional

from pr_agent.issue_providers.base import IssueProvider


class GitlabIssueProvider(IssueProvider):
    def __init__(self, git_provider):
        self.git_provider = git_provider

    def list_issues(self, project_path: Optional[str] = None, state: str = "all"):
        return self.git_provider.list_issues(project_path, state=state)

    def get_issue(self, issue_id, project_path: Optional[str] = None):
        issue_iid = int(issue_id)
        return self.git_provider.get_issue(issue_iid, project_path)

    def get_issue_comments(self, issue):
        return self.git_provider.get_issue_comments(issue)