-
Notifications
You must be signed in to change notification settings - Fork 2
feat: add event dataclasses #32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughThe changes introduce a new module containing dataclasses for structured log event data extraction and update the README to document a typed approach for accessing log event fields. The documentation now references these dataclasses, provides usage examples, and removes informal field lists, shifting to a structured, type-safe event handling method. Changes
Sequence Diagram(s)sequenceDiagram
participant LogHandler
participant LogRecord
participant EventsDataclass
LogHandler->>LogRecord: Receives log record (with event)
LogHandler->>EventsDataclass: EventsDataclass.from_record(LogRecord)
EventsDataclass-->>LogHandler: Returns typed event instance
LogHandler->>LogHandler: Access event fields via dataclass
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15–20 minutes Possibly related PRs
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/snakemake_interface_logger_plugins/events.py (2)
28-47
: Review the post_init validation logic for edge cases.The WorkflowStarted class has good type conversion handling, but there are some considerations:
- The post_init method modifies
self.snakefile
in place, which could be unexpected behavior in an immutable dataclass- The None check should come before the string conversion attempt
Consider this improvement:
def __post_init__(self) -> None: - if not isinstance(self.snakefile, str): + if self.snakefile is not None and not isinstance(self.snakefile, str): try: # Try to convert to string - this should work for PosixPath and other path-like objects self.snakefile = str(self.snakefile) except (TypeError, ValueError) as e: raise ValueError(f"Could not convert snakefile to string: {e}")
170-186
: Review the mutually exclusive field logic.The ResourcesInfo class uses conditional logic to populate only one field at a time, but this design might not be intuitive. The class could potentially have multiple resource-related fields populated simultaneously.
Consider allowing multiple fields to be populated or document the mutually exclusive behavior:
@classmethod def from_record(cls, record: LogRecord) -> "ResourcesInfo": - if hasattr(record, "nodes"): - return cls(nodes=record.nodes) # type: ignore - elif hasattr(record, "cores"): - return cls(cores=record.cores) # type: ignore - elif hasattr(record, "provided_resources"): - return cls(provided_resources=record.provided_resources) # type: ignore - else: - return cls() + return cls( + nodes=getattr(record, "nodes", None), + cores=getattr(record, "cores", None), + provided_resources=getattr(record, "provided_resources", None), + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
README.md
(1 hunks)src/snakemake_interface_logger_plugins/events.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit Configuration File
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
Files:
src/snakemake_interface_logger_plugins/events.py
🔇 Additional comments (12)
src/snakemake_interface_logger_plugins/events.py (11)
1-5
: LGTM! Clean imports and type annotations.The imports are well-organized and include all necessary types for the dataclasses.
7-26
: LGTM! Error dataclass implementation is solid.The Error dataclass properly handles optional fields and uses safe attribute access with getattr() and default values.
49-90
: LGTM! JobInfo class handles complex resource extraction well.The resource filtering logic correctly excludes internal fields (
_cores
,_nodes
) and safely handles cases where the resources attribute might not exist or have the expected structure.
92-106
: LGTM! Robust job ID normalization logic.The JobStarted class properly handles different input types (None, int, list) and normalizes them to a consistent List[int] format.
132-141
: LGTM! Simple and consistent implementation.The JobError class follows the same pattern as other simple dataclasses with appropriate default values.
143-153
: LGTM! Proper handling of group information.The GroupInfo class correctly uses default_factory for mutable default values and provides safe defaults.
155-168
: LGTM! Consistent with GroupInfo implementation.The GroupError class follows good practices with default_factory for mutable fields and safe attribute access.
188-203
: LGTM! Consistent implementation pattern.The DebugDag class follows the established pattern with safe attribute access and appropriate optional types.
205-213
: LGTM! Simple and effective implementation.The Progress class correctly handles required integer fields with sensible defaults.
215-222
: LGTM! Appropriate handling of complex data.The RuleGraph class uses Dict[str, Any] for the flexible rulegraph structure and provides a safe default.
224-239
: LGTM! Smart statistics parsing logic.The RunInfo class correctly separates the "total" field from per-rule counts and handles missing data gracefully.
README.md (1)
234-277
: LGTM! Excellent example demonstrating proper usage.The example function
parse_event()
demonstrates best practices:
- Proper event type checking with
hasattr()
- Complete mapping of all LogEvent types to dataclasses
- Type-safe usage with
isinstance()
checks- Clear documentation of the pattern
This is exactly the kind of example that will help plugin developers use the new dataclasses correctly.
The `LogEvent` enum defines particularly important Snakemake events such as workflow starting, job submission, job failure, etc. | ||
For each event, a corresponding dataclass is defined in `snakemake_interface_logger_plugins.events`. | ||
These dataclasses provide a typed interface for accessing event-specific fields from a `LogRecord`. | ||
|
||
**To extract event data from a `LogRecord`, use the appropriate dataclass's `from_record()` method.** | ||
|
||
### Event Types, Dataclasses, and Typical Fields | ||
|
||
| LogEvent | Dataclass | Typical Fields (see class for details) | | ||
| --------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------- | | ||
| `LogEvent.ERROR` | `Error` | exception, location, rule, traceback, file, line | | ||
| `LogEvent.WORKFLOW_STARTED` | `WorkflowStarted` | workflow_id, snakefile | | ||
| `LogEvent.JOB_INFO` | `JobInfo` | jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason, shellcmd, priority, resources | | ||
| `LogEvent.JOB_STARTED` | `JobStarted` | job_ids | | ||
| `LogEvent.JOB_FINISHED` | `JobFinished` | job_id | | ||
| `LogEvent.SHELLCMD` | `ShellCmd` | jobid, shellcmd, rule_name | | ||
| `LogEvent.JOB_ERROR` | `JobError` | jobid | | ||
| `LogEvent.GROUP_INFO` | `GroupInfo` | group_id, jobs | | ||
| `LogEvent.GROUP_ERROR` | `GroupError` | groupid, aux_logs, job_error_info | | ||
| `LogEvent.RESOURCES_INFO` | `ResourcesInfo` | nodes, cores, provided_resources | | ||
| `LogEvent.DEBUG_DAG` | `DebugDag` | status, job, file, exception | | ||
| `LogEvent.PROGRESS` | `Progress` | done, total | | ||
| `LogEvent.RULEGRAPH` | `RuleGraph` | rulegraph | | ||
| `LogEvent.RUN_INFO` | `RunInfo` | per_rule_job_counts, total_job_count | | ||
|
||
**Note:** These field lists are guidelines only and may change between versions. | ||
Always use defensive programming practices like `getattr()` with defaults or `hasattr()` checks when accessing fields. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify event mapping table accuracy against actual dataclasses.
The documentation provides a comprehensive mapping table, but we should verify that all mappings are accurate and that field lists match the actual dataclass definitions.
🏁 Script executed:
#!/bin/bash
# Verify that all LogEvent enum values have corresponding dataclasses
echo "=== Checking LogEvent enum values ==="
rg -A 20 "class LogEvent" src/
echo "=== Checking dataclass names in events.py ==="
rg "^class [A-Z]" src/snakemake_interface_logger_plugins/events.py
echo "=== Verifying field names in dataclasses match documentation ==="
# Check specific examples mentioned in the table
rg -A 10 "class Error:" src/snakemake_interface_logger_plugins/events.py
rg -A 10 "class JobInfo:" src/snakemake_interface_logger_plugins/events.py
rg -A 10 "class ResourcesInfo:" src/snakemake_interface_logger_plugins/events.py
Length of output: 3084
🏁 Script executed:
#!/bin/bash
echo "=== WorkflowStarted fields ==="
rg -A10 "class WorkflowStarted" src/snakemake_interface_logger_plugins/events.py
echo "=== GroupInfo fields ==="
rg -A10 "class GroupInfo" src/snakemake_interface_logger_plugins/events.py
echo "=== GroupError fields ==="
rg -A10 "class GroupError" src/snakemake_interface_logger_plugins/events.py
echo "=== RunInfo fields ==="
rg -A10 "class RunInfo" src/snakemake_interface_logger_plugins/events.py
Length of output: 2107
#!/bin/bash
for cls in Error WorkflowStarted JobInfo JobStarted JobFinished ShellCmd JobError GroupInfo GroupError ResourcesInfo DebugDag Progress RuleGraph RunInfo; do
echo "=== $cls fields ==="
rg -A20 "class $cls" src/snakemake_interface_logger_plugins/events.py || echo "No class $cls found"
echo
done
Update README event table for JobInfo
fields
All LogEvent
enum values map correctly to dataclasses in events.py
, and the majority of field lists match. However, the JobInfo
entry in the table includes three attributes that the JobInfo
dataclass does not define:
- The class defines:
• jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason - The table lists (incorrectly):
• shellcmd, priority, resources
Please remove or replace shellcmd
, priority
, and resources
from the LogEvent.JOB_INFO
row and ensure the table reflects the actual JobInfo
fields (e.g., rule_msg
, wildcards
, reason
).
🤖 Prompt for AI Agents
In README.md between lines 204 and 231, the table listing typical fields for
LogEvent.JOB_INFO incorrectly includes shellcmd, priority, and resources, which
are not defined in the JobInfo dataclass. Remove these three fields from the
JOB_INFO row and update the list to accurately reflect the actual JobInfo
dataclass fields such as rule_msg, wildcards, and reason to ensure the
documentation matches the code.
@dataclass | ||
class JobFinished: | ||
job_id: int | ||
|
||
@classmethod | ||
def from_record(cls, record: LogRecord) -> "JobFinished": | ||
return cls(job_id=getattr(record, "job_id")) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing default value for required field.
The JobFinished class accesses job_id
without a default value, which could raise an AttributeError if the field is missing from the LogRecord.
Apply this fix to provide a safe default:
def from_record(cls, record: LogRecord) -> "JobFinished":
- return cls(job_id=getattr(record, "job_id"))
+ return cls(job_id=getattr(record, "job_id", 0))
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@dataclass | |
class JobFinished: | |
job_id: int | |
@classmethod | |
def from_record(cls, record: LogRecord) -> "JobFinished": | |
return cls(job_id=getattr(record, "job_id")) | |
@dataclass | |
class JobFinished: | |
job_id: int | |
@classmethod | |
def from_record(cls, record: LogRecord) -> "JobFinished": | |
return cls(job_id=getattr(record, "job_id", 0)) |
🤖 Prompt for AI Agents
In src/snakemake_interface_logger_plugins/events.py around lines 108 to 115, the
JobFinished dataclass requires a job_id field but does not provide a default
value, which can cause an AttributeError if job_id is missing in the LogRecord.
Modify the from_record method to safely access job_id using getattr with a
default value (e.g., None or a sentinel value) to avoid exceptions when the
attribute is absent.
@dataclass | ||
class ShellCmd: | ||
jobid: int | ||
shellcmd: Optional[str] = None | ||
rule_name: Optional[str] = None | ||
|
||
@classmethod | ||
def from_record(cls, record: LogRecord) -> "ShellCmd": | ||
return cls( | ||
jobid=getattr(record, "jobid", 0), | ||
shellcmd=getattr(record, "shellcmd", ""), | ||
rule_name=getattr(record, "name", None), | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Inconsistent attribute name usage.
The ShellCmd class uses record.name
for rule_name
, but other classes use record.rule_name
. This inconsistency could indicate a bug or different attribute naming in different log contexts.
Verify that record.name
is the correct attribute for rule name in shell command events:
🏁 Script executed:
#!/bin/bash
# Search for ShellCmd usage patterns and rule_name attribute access
rg -A 5 -B 5 "shellcmd.*rule_name|rule_name.*shellcmd"
# Also search for 'name' attribute usage in logging contexts
rg -A 3 "getattr.*name.*None"
Length of output: 2713
Update ShellCmd to use the correct rule_name
attribute
In ShellCmd.from_record
, you’re currently pulling the rule name from record.name
(which is the logger’s name) instead of from the Snakemake-injected record.rule_name
. According to the README and other event classes, the field should come from record.rule_name
.
Please apply this change in:
- File:
src/snakemake_interface_logger_plugins/events.py
- Around line 123 in the
from_record
method
Proposed diff:
@classmethod
def from_record(cls, record: LogRecord) -> "ShellCmd":
return cls(
jobid=getattr(record, "jobid", 0),
shellcmd=getattr(record, "shellcmd", ""),
- rule_name=getattr(record, "name", None),
+ rule_name=getattr(record, "rule_name", None),
)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@dataclass | |
class ShellCmd: | |
jobid: int | |
shellcmd: Optional[str] = None | |
rule_name: Optional[str] = None | |
@classmethod | |
def from_record(cls, record: LogRecord) -> "ShellCmd": | |
return cls( | |
jobid=getattr(record, "jobid", 0), | |
shellcmd=getattr(record, "shellcmd", ""), | |
rule_name=getattr(record, "name", None), | |
) | |
@dataclass | |
class ShellCmd: | |
jobid: int | |
shellcmd: Optional[str] = None | |
rule_name: Optional[str] = None | |
@classmethod | |
def from_record(cls, record: LogRecord) -> "ShellCmd": | |
return cls( | |
jobid=getattr(record, "jobid", 0), | |
shellcmd=getattr(record, "shellcmd", ""), | |
rule_name=getattr(record, "rule_name", None), | |
) |
🤖 Prompt for AI Agents
In src/snakemake_interface_logger_plugins/events.py around lines 117 to 130, the
from_record method of ShellCmd incorrectly assigns rule_name from record.name
instead of record.rule_name. Update the from_record method to retrieve rule_name
using getattr(record, "rule_name", None) to correctly reflect the
Snakemake-injected attribute.
This adds logevent dataclasses for conviently parsing log records into a dataclass.
Summary by CodeRabbit