-
Notifications
You must be signed in to change notification settings - Fork 112
Transfer API Improvements #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe pull request introduces enhancements to the player transfer modules. A new attribute, Changes
Sequence Diagram(s)sequenceDiagram
participant T as Test
participant TP as TransfermarktPlayerTransfers
T->>TP: Call get_player_transfers()
TP->>TP: __parse_player_transfer_history()
TP->>TP: __process_fee_and_type(fee_value)
TP->>TP: __clean_html_value(html_value)
TP-->>TP: Return cleaned fee and transfer type
TP-->>T: Return processed transfer data
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Apologies created this a week ago and have just closed it. Needed to use the main branch in my fork for a few things. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/players/test_players_transfers.py (1)
50-50: Remove unnecessary whitespace from blank line.This is a minor style issue that should be fixed.
- +🧰 Tools
🪛 Ruff (0.8.2)
50-50: Blank line contains whitespace
Remove whitespace from blank line
(W293)
app/services/players/transfers.py (3)
32-75: Comprehensive implementation for cleaning HTML and determining transfer types.The
__clean_html_valuemethod handles various transfer scenarios effectively:
- Properly cleans HTML tags from fee values
- Identifies different transfer types based on text patterns
- Extracts currency values using regular expressions
- Provides appropriate fallbacks
However, there are several style issues and a potential edge case:
- Style: Inconsistent quote usage (single vs double)
- Style: Unnecessary whitespace in blank lines
- Edge case: The method assumes "permanent" as the default transfer type, which might not always be accurate for unrecognized formats
def __clean_html_value(self, value: str) -> Tuple[str, str]: """ Clean HTML tags from a string value and extract just the currency value or preserve special text values. Also determine the transfer type based on the fee text. Args: value (str): The string value that may contain HTML tags. Returns: Tuple[str, str]: A tuple containing (cleaned_fee_value, transfer_type) """ if not value or not isinstance(value, str): return value, "permanent" - soup = BeautifulSoup(value, 'html.parser') + soup = BeautifulSoup(value, "html.parser") text = soup.get_text().strip() if "end of loan" in text.lower(): return "End of loan", "end_of_loan" if "loan transfer" in text.lower() and not any(char.isdigit() for char in text): return "loan transfer", "loan" if "fee" in text.lower(): - currency_match = re.search(r'(€\d+(?:\.\d+)?[km]?)', text) + currency_match = re.search(r"(€\d+(?:\.\d+)?[km]?)", text) if currency_match: if "loan" in text.lower(): return currency_match.group(1), "loan" return currency_match.group(1), "permanent" - number_match = re.search(r'(\d+(?:\.\d+)?[km]?)', text) + number_match = re.search(r"(\d+(?:\.\d+)?[km]?)", text) if number_match: if "loan" in text.lower(): return number_match.group(1), "loan" return number_match.group(1), "permanent" if "loan" in text.lower() and not any(char.isdigit() for char in text): return "€0", "loan" if "free transfer" in text.lower() or text == "-": return "€0", "free_transfer" return text, "permanent"🧰 Tools
🪛 Ruff (0.8.2)
36-36: Blank line contains whitespace
Remove whitespace from blank line
(W293)
39-39: Blank line contains whitespace
Remove whitespace from blank line
(W293)
45-45: Blank line contains whitespace
Remove whitespace from blank line
(W293)
46-46: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
48-48: Blank line contains whitespace
Remove whitespace from blank line
(W293)
51-51: Blank line contains whitespace
Remove whitespace from blank line
(W293)
54-54: Blank line contains whitespace
Remove whitespace from blank line
(W293)
56-56: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
61-61: Blank line contains whitespace
Remove whitespace from blank line
(W293)
62-62: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
70-70: Blank line contains whitespace
Remove whitespace from blank line
(W293)
73-73: Blank line contains whitespace
Remove whitespace from blank line
(W293)
107-122: Well-structured helper method for processing fee values and transfer types.The method properly separates concerns by delegating the HTML cleaning and transfer type detection to
__clean_html_valueand then formatting the result appropriately.Fix the trailing comma and style issues:
def __process_fee_and_type(self, fee_value: str) -> dict: """ Process the fee value and determine the transfer type. Args: fee_value (str): The raw fee value from the transfer data. Returns: dict: A dictionary containing the cleaned fee value and transfer type. """ fee, transfer_type = self.__clean_html_value(fee_value) return { "fee": fee, - "transferType": transfer_type + "transferType": transfer_type, }🧰 Tools
🪛 Ruff (0.8.2)
107-107: Blank line contains whitespace
Remove whitespace from blank line
(W293)
111-111: Blank line contains whitespace
Remove whitespace from blank line
(W293)
114-114: Blank line contains whitespace
Remove whitespace from blank line
(W293)
121-121: Trailing comma missing
Add trailing comma
(COM812)
32-75: Consider adding specific handling for other potential transfer types.The current implementation assumes "permanent" as the default transfer type for unrecognized formats. It might be worth logging unrecognized formats to monitor for new patterns that should be explicitly handled.
Consider adding logging for unrecognized formats:
def __clean_html_value(self, value: str) -> Tuple[str, str]: # ...existing code... if "free transfer" in text.lower() or text == "-": return "€0", "free_transfer" + # If we reach here, we couldn't determine the transfer type definitively + if text and not any(keyword in text.lower() for keyword in ["loan", "free", "end of"]): + logging.debug(f"Unrecognized transfer fee format: {text}, defaulting to 'permanent'") + return text, "permanent"This would help identify patterns that might need explicit handling in the future.
🧰 Tools
🪛 Ruff (0.8.2)
36-36: Blank line contains whitespace
Remove whitespace from blank line
(W293)
39-39: Blank line contains whitespace
Remove whitespace from blank line
(W293)
45-45: Blank line contains whitespace
Remove whitespace from blank line
(W293)
46-46: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
48-48: Blank line contains whitespace
Remove whitespace from blank line
(W293)
51-51: Blank line contains whitespace
Remove whitespace from blank line
(W293)
54-54: Blank line contains whitespace
Remove whitespace from blank line
(W293)
56-56: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
61-61: Blank line contains whitespace
Remove whitespace from blank line
(W293)
62-62: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
70-70: Blank line contains whitespace
Remove whitespace from blank line
(W293)
73-73: Blank line contains whitespace
Remove whitespace from blank line
(W293)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
app/schemas/players/transfers.py(2 hunks)app/services/players/transfers.py(3 hunks)tests/players/test_players_transfers.py(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/players/test_players_transfers.py
50-50: Blank line contains whitespace
Remove whitespace from blank line
(W293)
app/services/players/transfers.py
36-36: Blank line contains whitespace
Remove whitespace from blank line
(W293)
39-39: Blank line contains whitespace
Remove whitespace from blank line
(W293)
45-45: Blank line contains whitespace
Remove whitespace from blank line
(W293)
46-46: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
48-48: Blank line contains whitespace
Remove whitespace from blank line
(W293)
51-51: Blank line contains whitespace
Remove whitespace from blank line
(W293)
54-54: Blank line contains whitespace
Remove whitespace from blank line
(W293)
56-56: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
61-61: Blank line contains whitespace
Remove whitespace from blank line
(W293)
62-62: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
70-70: Blank line contains whitespace
Remove whitespace from blank line
(W293)
73-73: Blank line contains whitespace
Remove whitespace from blank line
(W293)
107-107: Blank line contains whitespace
Remove whitespace from blank line
(W293)
111-111: Blank line contains whitespace
Remove whitespace from blank line
(W293)
114-114: Blank line contains whitespace
Remove whitespace from blank line
(W293)
121-121: Trailing comma missing
Add trailing comma
(COM812)
🔇 Additional comments (7)
app/schemas/players/transfers.py (2)
2-2: Added Literal import for type constraints.The addition of the Literal type from typing is appropriate here as it enables strict type checking for the new transfer_type field.
21-21: Good addition of the transfer_type field with clear type constraints.The new transfer_type field with Literal type constraint properly enforces that transfers can only be categorized as one of the four specified types: "permanent", "loan", "end_of_loan", or "free_transfer". This change aligns well with the PR objective of standardizing transfer type representation.
tests/players/test_players_transfers.py (3)
5-5: Added Or import for schema validation.The addition of the Or import from the schema module is appropriate for validating the new transferType field which can have multiple possible values.
39-39: Added validation for transferType in expected schema.The validation constraint ensures that transferType is always one of the specified values, providing stronger type safety for the API responses.
51-52: Good addition of transfer type validation.This assertion properly validates that all transfers have a transferType field, ensuring the API doesn't return incomplete transfer data. This is important for maintaining consistent data quality.
app/services/players/transfers.py (2)
2-4: Added necessary imports for HTML cleaning and type hinting.The new imports are appropriate for the functionality being added:
- BeautifulSoup for parsing HTML content in fee values
- re for regular expression matching of currency values
- Tuple for proper type hinting of the return value in __clean_html_value
103-103: Smart integration of fee processing and transfer type determination.Using dictionary unpacking to integrate the processed fee and transfer type is clean and efficient.
| def __clean_html_value(self, value: str) -> Tuple[str, str]: | ||
| """ | ||
| Clean HTML tags from a string value and extract just the currency value or preserve special text values. | ||
| Also determine the transfer type based on the fee text. | ||
| Args: | ||
| value (str): The string value that may contain HTML tags. | ||
| Returns: | ||
| Tuple[str, str]: A tuple containing (cleaned_fee_value, transfer_type) | ||
| """ | ||
| if not value or not isinstance(value, str): | ||
| return value, "permanent" | ||
|
|
||
| soup = BeautifulSoup(value, 'html.parser') | ||
| text = soup.get_text().strip() | ||
|
|
||
| if "end of loan" in text.lower(): | ||
| return "End of loan", "end_of_loan" | ||
|
|
||
| if "loan transfer" in text.lower() and not any(char.isdigit() for char in text): | ||
| return "loan transfer", "loan" | ||
|
|
||
| if "fee" in text.lower(): | ||
| currency_match = re.search(r'(€\d+(?:\.\d+)?[km]?)', text) | ||
| if currency_match: | ||
| if "loan" in text.lower(): | ||
| return currency_match.group(1), "loan" | ||
| return currency_match.group(1), "permanent" | ||
|
|
||
| number_match = re.search(r'(\d+(?:\.\d+)?[km]?)', text) | ||
| if number_match: | ||
| if "loan" in text.lower(): | ||
| return number_match.group(1), "loan" | ||
| return number_match.group(1), "permanent" | ||
|
|
||
| if "loan" in text.lower() and not any(char.isdigit() for char in text): | ||
| return "€0", "loan" | ||
|
|
||
| if "free transfer" in text.lower() or text == "-": | ||
| return "€0", "free_transfer" | ||
|
|
||
| return text, "permanent" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify regex pattern for currency extraction
The regex pattern used to extract currency values might not catch all formats. Consider expanding it to handle additional currency formats or edge cases.
🏁 Script executed:
#!/bin/bash
# Check for any transfer fee values in the codebase that might not match the current regex patterns
# Pattern 1: €\d+(?:\.\d+)?[km]?
# Pattern 2: \d+(?:\.\d+)?[km]?
# Check for any test data or mock responses containing fee values
rg -A1 -B1 -i "fee.*[€\$£]" --glob "*.json" --glob "*.py"Length of output: 58
Below is a new verification step to more directly test the regex patterns against a few sample currency strings. This should help clarify whether the current patterns work as intended or need expansion for additional edge cases:
Regex Currency Extraction Verification
It appears the current regex implementation extracts standard euro formats (e.g., "€100", "€100.5", "€100k") effectively. However, note that:
- Edge Cases: Formats with commas (e.g., "€100,00" or "€1,000") won’t match the current pattern.
- Currency Flexibility: Only euro (i.e. "€") values are directly handled; if other currencies such as "$" or "£" might appear, additional patterns will be needed.
- Fallback Behavior: The secondary numeric pattern (
\d+(?:\.\d+)?[km]?) will catch values missing the currency symbol but might not address formatting with separators.
If the application context might include such variants, consider expanding or adding alternative regex patterns to handle these additional formats.
Please verify these behaviors against your expected input cases to ensure robustness.
🏁 Script executed:
#!/bin/bash
python3 - << 'EOF'
import re
# Define the regex patterns being used in the codebase
pattern_euro = re.compile(r'(€\d+(?:\.\d+)?[km]?)')
pattern_numeric = re.compile(r'(\d+(?:\.\d+)?[km]?)')
# Sample test inputs covering common and edge cases
test_values = [
"€100", # Standard euro value
"€100.5", # Euro value with decimal
"€100k", # Euro value with suffix k
"€100.5m", # Euro value with suffix m
"$100", # Dollar value (shouldn't match pattern_euro)
"£100", # Pound value (shouldn't match pattern_euro)
"100", # Numeric without currency, for fallback pattern
"100.5", # Numeric with decimal
"100k", # Numeric with suffix
"€100,00", # Euro value with a comma as decimal separator (edge case)
"€1,000", # Euro value with thousand separator (edge case)
"loan transfer fee of €100" # Context containing extra text
]
print("Pattern: (€\\d+(?:\\.\\d+)?[km]?)")
for s in test_values:
m = pattern_euro.search(s)
print(f"Input: {s!r} -> Extracted: {m.group(1) if m else None}")
print("\nPattern: (\\d+(?:\\.\\d+)?[km]?)")
for s in test_values:
m = pattern_numeric.search(s)
print(f"Input: {s!r} -> Extracted: {m.group(1) if m else None}")
EOFLength of output: 1050
Currency Regex Enhancement Required
The current regex pattern (€\d+(?:\.\d+)?[km]?) reliably handles standard euro formats (e.g., "€100", "€100.5", "€100k"), but our tests show it fails for values with commas (e.g., "€100,00" and "€1,000"). This limits correct extraction of amounts using comma delimiters. Consider updating the regex pattern—for example, replacing (?:\.\d+)? with something like (?:[,.]\d+)?—to account for both period and comma as decimal separators. Also, if inputs may include other currencies (e.g., "$" or "£"), further adjustments may be necessary.
- Location:
app/services/players/transfers.py(Lines 32-75) - Action: Update the regex pattern to robustly capture currency values, including comma-separated decimals/thousands.
🧰 Tools
🪛 Ruff (0.8.2)
36-36: Blank line contains whitespace
Remove whitespace from blank line
(W293)
39-39: Blank line contains whitespace
Remove whitespace from blank line
(W293)
45-45: Blank line contains whitespace
Remove whitespace from blank line
(W293)
46-46: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
48-48: Blank line contains whitespace
Remove whitespace from blank line
(W293)
51-51: Blank line contains whitespace
Remove whitespace from blank line
(W293)
54-54: Blank line contains whitespace
Remove whitespace from blank line
(W293)
56-56: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
61-61: Blank line contains whitespace
Remove whitespace from blank line
(W293)
62-62: Single quotes found but double quotes preferred
Replace single quotes with double quotes
(Q000)
67-67: Blank line contains whitespace
Remove whitespace from blank line
(W293)
70-70: Blank line contains whitespace
Remove whitespace from blank line
(W293)
73-73: Blank line contains whitespace
Remove whitespace from blank line
(W293)
First of all, thank you for the hard work on this project, Transfermarkt is an incredible source of up to date data and I'm intending on self hosting this and using it for a wider project soon.
During my testing I noticed the transfers endpoint was a bit broken, returning 500s when it came to different types of loans and loan fees etc. I believe there was some HTML tags getting parsed into the responses when only an integer was expected.
There was also no way to differentiate between the different types of transfer, I feel like this is quite an important distinction and it would be really useful for me going forward. I have implemented some changes to the transfer response to make different transfer types obvious.
Please let me know if theres any issues or corrections you would like me to do. Thanks again.
Initial Issues
500 Error with HTML in Fee Values: The API was failing with a 500 error when processing transfers with special fee values like "Loan fee" or "End of loan" because it was receiving HTML-formatted strings that couldn't be converted to numbers.
Missing Transfer Information: Important transfer context like loan returns and loan transfers was being lost due to validation errors, as the schema expected numeric values for fees.
Inconsistent Data Representation: There was no standardised way to represent different types of transfers (permanent, loans, end of loans, free transfers).
Steps Taken to Fix the Issues
HTML Cleaning:
Added BeautifulSoup to parse and clean HTML tags from fee values
Extracted the meaningful text content from the HTML-formatted strings
Loan handling:
Added logic to identify and preserve special text values like "End of loan" and "loan transfer"
Created a system to properly handle these values without causing validation errors
Fee Value Parsing:
Implemented proper parsing of currency values with different formats (€, k, m suffixes)
Converted string fee values to integers for consistent representation
Made the fee field truly optional, only including it when a value is present
Additional Fields Added
Transfer Type Field:
Added a new transferType field to explicitly categorize transfers
Implemented four standard transfer types:
permanent: Regular transfers with fees
loan: Temporary transfers, with or without fees
end_of_loan: Returns to parent clubs after loan periods
free_transfer: Transfers without fees, including youth promotions
Schema Updates:
Updated the schema to include the new transferType field with appropriate validation
Ensured the schema properly handles optional fee values
Test Improvements:
Updated tests to verify that all transfers have appropriate transfer types
These improvements ensure the API now provides comprehensive and accurate transfer information for all players, regardless of their transfer history complexity, while maintaining a consistent and well-structured response format.
Summary by CodeRabbit