feat: add event dataclasses #32

cademirch · 2025-07-24T18:47:18Z

This adds logevent dataclasses for conviently parsing log records into a dataclass.

Summary by CodeRabbit

New Features
- Introduced structured event dataclasses for various workflow and logging events, enabling typed access to event data.
Documentation
- Updated documentation to describe the new dataclass-based event handling approach, including usage examples and a mapping table for events to dataclasses.

coderabbitai · 2025-07-24T18:47:24Z

📝 Walkthrough

Walkthrough

The changes introduce a new module containing dataclasses for structured log event data extraction and update the README to document a typed approach for accessing log event fields. The documentation now references these dataclasses, provides usage examples, and removes informal field lists, shifting to a structured, type-safe event handling method.

Changes

File(s)	Change Summary
README.md	Rewrote log event documentation to reference new dataclasses, added usage examples, and removed informal field lists.
src/snakemake_interface_logger_plugins/events.py	Added multiple dataclasses for structured log event data extraction, each with a `from_record` method.

Sequence Diagram(s)

sequenceDiagram
    participant LogHandler
    participant LogRecord
    participant EventsDataclass

    LogHandler->>LogRecord: Receives log record (with event)
    LogHandler->>EventsDataclass: EventsDataclass.from_record(LogRecord)
    EventsDataclass-->>LogHandler: Returns typed event instance
    LogHandler->>LogHandler: Access event fields via dataclass

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Possibly related PRs

feat: add rulegraph property and event, add error event #16: Adds RULEGRAPH and ERROR log events and corresponding dataclasses, directly related to the new dataclasses and documentation update in this PR.
feat: add logevent to common #14: Introduces the LogEvent enum, which is now referenced in the updated documentation and used to map to dataclasses for event handling.

Suggested reviewers

johanneskoester

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat-event-dataclasses

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai generate unit tests to generate unit tests for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

src/snakemake_interface_logger_plugins/events.py (2)
28-47: Review the post_init validation logic for edge cases.

The WorkflowStarted class has good type conversion handling, but there are some considerations:

The post_init method modifies self.snakefile in place, which could be unexpected behavior in an immutable dataclass

The None check should come before the string conversion attempt

Consider this improvement:
def __post_init__(self) -> None:
-   if not isinstance(self.snakefile, str):
+   if self.snakefile is not None and not isinstance(self.snakefile, str):
        try:
            # Try to convert to string - this should work for PosixPath and other path-like objects
            self.snakefile = str(self.snakefile)
        except (TypeError, ValueError) as e:
            raise ValueError(f"Could not convert snakefile to string: {e}")
170-186: Review the mutually exclusive field logic.

The ResourcesInfo class uses conditional logic to populate only one field at a time, but this design might not be intuitive. The class could potentially have multiple resource-related fields populated simultaneously.

Consider allowing multiple fields to be populated or document the mutually exclusive behavior:
@classmethod
def from_record(cls, record: LogRecord) -> "ResourcesInfo":
-   if hasattr(record, "nodes"):
-       return cls(nodes=record.nodes)  # type: ignore
-   elif hasattr(record, "cores"):
-       return cls(cores=record.cores)  # type: ignore
-   elif hasattr(record, "provided_resources"):
-       return cls(provided_resources=record.provided_resources)  # type: ignore
-   else:
-       return cls()
+   return cls(
+       nodes=getattr(record, "nodes", None),
+       cores=getattr(record, "cores", None),
+       provided_resources=getattr(record, "provided_resources", None),
+   )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42c55ff and 1cce5ea.

📒 Files selected for processing (2)

README.md (1 hunks)
src/snakemake_interface_logger_plugins/events.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit Configuration File

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

src/snakemake_interface_logger_plugins/events.py

🔇 Additional comments (12)

src/snakemake_interface_logger_plugins/events.py (11)

1-5: LGTM! Clean imports and type annotations.

The imports are well-organized and include all necessary types for the dataclasses.

7-26: LGTM! Error dataclass implementation is solid.

The Error dataclass properly handles optional fields and uses safe attribute access with getattr() and default values.

49-90: LGTM! JobInfo class handles complex resource extraction well.

The resource filtering logic correctly excludes internal fields (_cores, _nodes) and safely handles cases where the resources attribute might not exist or have the expected structure.

92-106: LGTM! Robust job ID normalization logic.

The JobStarted class properly handles different input types (None, int, list) and normalizes them to a consistent List[int] format.

132-141: LGTM! Simple and consistent implementation.

The JobError class follows the same pattern as other simple dataclasses with appropriate default values.

143-153: LGTM! Proper handling of group information.

The GroupInfo class correctly uses default_factory for mutable default values and provides safe defaults.

155-168: LGTM! Consistent with GroupInfo implementation.

The GroupError class follows good practices with default_factory for mutable fields and safe attribute access.

188-203: LGTM! Consistent implementation pattern.

The DebugDag class follows the established pattern with safe attribute access and appropriate optional types.

205-213: LGTM! Simple and effective implementation.

The Progress class correctly handles required integer fields with sensible defaults.

215-222: LGTM! Appropriate handling of complex data.

The RuleGraph class uses Dict[str, Any] for the flexible rulegraph structure and provides a safe default.

224-239: LGTM! Smart statistics parsing logic.

The RunInfo class correctly separates the "total" field from per-rule counts and handles missing data gracefully.

README.md (1)

234-277: LGTM! Excellent example demonstrating proper usage.

The example function parse_event() demonstrates best practices:

Proper event type checking with hasattr()

Complete mapping of all LogEvent types to dataclasses

Type-safe usage with isinstance() checks

Clear documentation of the pattern

This is exactly the kind of example that will help plugin developers use the new dataclasses correctly.

coderabbitai · 2025-07-24T18:52:48Z

README.md

+The `LogEvent` enum defines particularly important Snakemake events such as workflow starting, job submission, job failure, etc.  
+For each event, a corresponding dataclass is defined in `snakemake_interface_logger_plugins.events`.  
+These dataclasses provide a typed interface for accessing event-specific fields from a `LogRecord`.
+
+**To extract event data from a `LogRecord`, use the appropriate dataclass's `from_record()` method.**
+
+### Event Types, Dataclasses, and Typical Fields
+
+| LogEvent                    | Dataclass         | Typical Fields (see class for details)                                                                               |
+| --------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------- |
+| `LogEvent.ERROR`            | `Error`           | exception, location, rule, traceback, file, line                                                                     |
+| `LogEvent.WORKFLOW_STARTED` | `WorkflowStarted` | workflow_id, snakefile                                                                                               |
+| `LogEvent.JOB_INFO`         | `JobInfo`         | jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason, shellcmd, priority, resources |
+| `LogEvent.JOB_STARTED`      | `JobStarted`      | job_ids                                                                                                              |
+| `LogEvent.JOB_FINISHED`     | `JobFinished`     | job_id                                                                                                               |
+| `LogEvent.SHELLCMD`         | `ShellCmd`        | jobid, shellcmd, rule_name                                                                                           |
+| `LogEvent.JOB_ERROR`        | `JobError`        | jobid                                                                                                                |
+| `LogEvent.GROUP_INFO`       | `GroupInfo`       | group_id, jobs                                                                                                       |
+| `LogEvent.GROUP_ERROR`      | `GroupError`      | groupid, aux_logs, job_error_info                                                                                    |
+| `LogEvent.RESOURCES_INFO`   | `ResourcesInfo`   | nodes, cores, provided_resources                                                                                     |
+| `LogEvent.DEBUG_DAG`        | `DebugDag`        | status, job, file, exception                                                                                         |
+| `LogEvent.PROGRESS`         | `Progress`        | done, total                                                                                                          |
+| `LogEvent.RULEGRAPH`        | `RuleGraph`       | rulegraph                                                                                                            |
+| `LogEvent.RUN_INFO`         | `RunInfo`         | per_rule_job_counts, total_job_count                                                                                 |
+
+**Note:** These field lists are guidelines only and may change between versions.  
+Always use defensive programming practices like `getattr()` with defaults or `hasattr()` checks when accessing fields.
+


💡 Verification agent

🧩 Analysis chain

Verify event mapping table accuracy against actual dataclasses.

The documentation provides a comprehensive mapping table, but we should verify that all mappings are accurate and that field lists match the actual dataclass definitions.

🏁 Script executed:

#!/bin/bash # Verify that all LogEvent enum values have corresponding dataclasses echo "=== Checking LogEvent enum values ===" rg -A 20 "class LogEvent" src/ echo "=== Checking dataclass names in events.py ===" rg "^class [A-Z]" src/snakemake_interface_logger_plugins/events.py echo "=== Verifying field names in dataclasses match documentation ===" # Check specific examples mentioned in the table rg -A 10 "class Error:" src/snakemake_interface_logger_plugins/events.py rg -A 10 "class JobInfo:" src/snakemake_interface_logger_plugins/events.py rg -A 10 "class ResourcesInfo:" src/snakemake_interface_logger_plugins/events.py

Length of output: 3084

🏁 Script executed:

#!/bin/bash echo "=== WorkflowStarted fields ===" rg -A10 "class WorkflowStarted" src/snakemake_interface_logger_plugins/events.py echo "=== GroupInfo fields ===" rg -A10 "class GroupInfo" src/snakemake_interface_logger_plugins/events.py echo "=== GroupError fields ===" rg -A10 "class GroupError" src/snakemake_interface_logger_plugins/events.py echo "=== RunInfo fields ===" rg -A10 "class RunInfo" src/snakemake_interface_logger_plugins/events.py

Length of output: 2107

#!/bin/bash for cls in Error WorkflowStarted JobInfo JobStarted JobFinished ShellCmd JobError GroupInfo GroupError ResourcesInfo DebugDag Progress RuleGraph RunInfo; do echo "=== $cls fields ===" rg -A20 "class $cls" src/snakemake_interface_logger_plugins/events.py || echo "No class $cls found" echo done

Update README event table for JobInfo fields

All LogEvent enum values map correctly to dataclasses in events.py, and the majority of field lists match. However, the JobInfo entry in the table includes three attributes that the JobInfo dataclass does not define:

The class defines:
• jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason

The table lists (incorrectly):
• shellcmd, priority, resources

Please remove or replace shellcmd, priority, and resources from the LogEvent.JOB_INFO row and ensure the table reflects the actual JobInfo fields (e.g., rule_msg, wildcards, reason).

🤖 Prompt for AI Agents

In README.md between lines 204 and 231, the table listing typical fields for LogEvent.JOB_INFO incorrectly includes shellcmd, priority, and resources, which are not defined in the JobInfo dataclass. Remove these three fields from the JOB_INFO row and update the list to accurately reflect the actual JobInfo dataclass fields such as rule_msg, wildcards, and reason to ensure the documentation matches the code.

coderabbitai · 2025-07-24T18:52:49Z

src/snakemake_interface_logger_plugins/events.py

+@dataclass
+class JobFinished:
+    job_id: int
+
+    @classmethod
+    def from_record(cls, record: LogRecord) -> "JobFinished":
+        return cls(job_id=getattr(record, "job_id"))
+


⚠️ Potential issue

Missing default value for required field.

The JobFinished class accesses job_id without a default value, which could raise an AttributeError if the field is missing from the LogRecord.

Apply this fix to provide a safe default:

def from_record(cls, record: LogRecord) -> "JobFinished": - return cls(job_id=getattr(record, "job_id")) + return cls(job_id=getattr(record, "job_id", 0))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@dataclass

class JobFinished:

job_id: int

@classmethod

def from_record(cls, record: LogRecord) -> "JobFinished":

return cls(job_id=getattr(record, "job_id"))

@dataclass

class JobFinished:

job_id: int

@classmethod

def from_record(cls, record: LogRecord) -> "JobFinished":

return cls(job_id=getattr(record, "job_id", 0))

🤖 Prompt for AI Agents

In src/snakemake_interface_logger_plugins/events.py around lines 108 to 115, the JobFinished dataclass requires a job_id field but does not provide a default value, which can cause an AttributeError if job_id is missing in the LogRecord. Modify the from_record method to safely access job_id using getattr with a default value (e.g., None or a sentinel value) to avoid exceptions when the attribute is absent.

coderabbitai · 2025-07-24T18:52:49Z

src/snakemake_interface_logger_plugins/events.py

+@dataclass
+class ShellCmd:
+    jobid: int
+    shellcmd: Optional[str] = None
+    rule_name: Optional[str] = None
+
+    @classmethod
+    def from_record(cls, record: LogRecord) -> "ShellCmd":
+        return cls(
+            jobid=getattr(record, "jobid", 0),
+            shellcmd=getattr(record, "shellcmd", ""),
+            rule_name=getattr(record, "name", None),
+        )
+


💡 Verification agent

🧩 Analysis chain

Inconsistent attribute name usage.

The ShellCmd class uses record.name for rule_name, but other classes use record.rule_name. This inconsistency could indicate a bug or different attribute naming in different log contexts.

Verify that record.name is the correct attribute for rule name in shell command events:

🏁 Script executed:

#!/bin/bash # Search for ShellCmd usage patterns and rule_name attribute access rg -A 5 -B 5 "shellcmd.*rule_name|rule_name.*shellcmd" # Also search for 'name' attribute usage in logging contexts rg -A 3 "getattr.*name.*None"

Length of output: 2713

Update ShellCmd to use the correct rule_name attribute

In ShellCmd.from_record, you’re currently pulling the rule name from record.name (which is the logger’s name) instead of from the Snakemake-injected record.rule_name. According to the README and other event classes, the field should come from record.rule_name.

Please apply this change in:

File: src/snakemake_interface_logger_plugins/events.py

Around line 123 in the from_record method

Proposed diff:

@classmethod def from_record(cls, record: LogRecord) -> "ShellCmd": return cls( jobid=getattr(record, "jobid", 0), shellcmd=getattr(record, "shellcmd", ""), - rule_name=getattr(record, "name", None), + rule_name=getattr(record, "rule_name", None), )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@dataclass

class ShellCmd:

jobid: int

shellcmd: Optional[str] = None

rule_name: Optional[str] = None

@classmethod

def from_record(cls, record: LogRecord) -> "ShellCmd":

return cls(

jobid=getattr(record, "jobid", 0),

shellcmd=getattr(record, "shellcmd", ""),

rule_name=getattr(record, "name", None),

)

@dataclass

class ShellCmd:

jobid: int

shellcmd: Optional[str] = None

rule_name: Optional[str] = None

@classmethod

def from_record(cls, record: LogRecord) -> "ShellCmd":

return cls(

jobid=getattr(record, "jobid", 0),

shellcmd=getattr(record, "shellcmd", ""),

rule_name=getattr(record, "rule_name", None),

)

🤖 Prompt for AI Agents

In src/snakemake_interface_logger_plugins/events.py around lines 117 to 130, the from_record method of ShellCmd incorrectly assigns rule_name from record.name instead of record.rule_name. Update the from_record method to retrieve rule_name using getattr(record, "rule_name", None) to correctly reflect the Snakemake-injected attribute.

cademirch added 2 commits July 24, 2025 11:41

feat: add event dataclasses

2be76d9

docs: update docs for event dataclasses

1cce5ea

cademirch requested a review from johanneskoester July 24, 2025 18:47

coderabbitai bot reviewed Jul 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add event dataclasses #32

feat: add event dataclasses #32

Uh oh!

cademirch commented Jul 24, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jul 24, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jul 24, 2025

Uh oh!

coderabbitai bot Jul 24, 2025

Uh oh!

coderabbitai bot Jul 24, 2025

Uh oh!

Uh oh!

feat: add event dataclasses #32

Are you sure you want to change the base?

feat: add event dataclasses #32

Uh oh!

Conversation

cademirch commented Jul 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cademirch commented Jul 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jul 24, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)