Skip to content

Conversation

cademirch
Copy link
Collaborator

@cademirch cademirch commented Jul 24, 2025

This adds logevent dataclasses for conviently parsing log records into a dataclass.

Summary by CodeRabbit

  • New Features
    • Introduced structured event dataclasses for various workflow and logging events, enabling typed access to event data.
  • Documentation
    • Updated documentation to describe the new dataclass-based event handling approach, including usage examples and a mapping table for events to dataclasses.

Copy link

coderabbitai bot commented Jul 24, 2025

📝 Walkthrough

Walkthrough

The changes introduce a new module containing dataclasses for structured log event data extraction and update the README to document a typed approach for accessing log event fields. The documentation now references these dataclasses, provides usage examples, and removes informal field lists, shifting to a structured, type-safe event handling method.

Changes

File(s) Change Summary
README.md Rewrote log event documentation to reference new dataclasses, added usage examples, and removed informal field lists.
src/snakemake_interface_logger_plugins/events.py Added multiple dataclasses for structured log event data extraction, each with a from_record method.

Sequence Diagram(s)

sequenceDiagram
    participant LogHandler
    participant LogRecord
    participant EventsDataclass

    LogHandler->>LogRecord: Receives log record (with event)
    LogHandler->>EventsDataclass: EventsDataclass.from_record(LogRecord)
    EventsDataclass-->>LogHandler: Returns typed event instance
    LogHandler->>LogHandler: Access event fields via dataclass
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15–20 minutes

Possibly related PRs

Suggested reviewers

  • johanneskoester

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat-event-dataclasses

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/snakemake_interface_logger_plugins/events.py (2)

28-47: Review the post_init validation logic for edge cases.

The WorkflowStarted class has good type conversion handling, but there are some considerations:

  1. The post_init method modifies self.snakefile in place, which could be unexpected behavior in an immutable dataclass
  2. The None check should come before the string conversion attempt

Consider this improvement:

def __post_init__(self) -> None:
-   if not isinstance(self.snakefile, str):
+   if self.snakefile is not None and not isinstance(self.snakefile, str):
        try:
            # Try to convert to string - this should work for PosixPath and other path-like objects
            self.snakefile = str(self.snakefile)
        except (TypeError, ValueError) as e:
            raise ValueError(f"Could not convert snakefile to string: {e}")

170-186: Review the mutually exclusive field logic.

The ResourcesInfo class uses conditional logic to populate only one field at a time, but this design might not be intuitive. The class could potentially have multiple resource-related fields populated simultaneously.

Consider allowing multiple fields to be populated or document the mutually exclusive behavior:

@classmethod
def from_record(cls, record: LogRecord) -> "ResourcesInfo":
-   if hasattr(record, "nodes"):
-       return cls(nodes=record.nodes)  # type: ignore
-   elif hasattr(record, "cores"):
-       return cls(cores=record.cores)  # type: ignore
-   elif hasattr(record, "provided_resources"):
-       return cls(provided_resources=record.provided_resources)  # type: ignore
-   else:
-       return cls()
+   return cls(
+       nodes=getattr(record, "nodes", None),
+       cores=getattr(record, "cores", None),
+       provided_resources=getattr(record, "provided_resources", None),
+   )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42c55ff and 1cce5ea.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • src/snakemake_interface_logger_plugins/events.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit Configuration File

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake_interface_logger_plugins/events.py
🔇 Additional comments (12)
src/snakemake_interface_logger_plugins/events.py (11)

1-5: LGTM! Clean imports and type annotations.

The imports are well-organized and include all necessary types for the dataclasses.


7-26: LGTM! Error dataclass implementation is solid.

The Error dataclass properly handles optional fields and uses safe attribute access with getattr() and default values.


49-90: LGTM! JobInfo class handles complex resource extraction well.

The resource filtering logic correctly excludes internal fields (_cores, _nodes) and safely handles cases where the resources attribute might not exist or have the expected structure.


92-106: LGTM! Robust job ID normalization logic.

The JobStarted class properly handles different input types (None, int, list) and normalizes them to a consistent List[int] format.


132-141: LGTM! Simple and consistent implementation.

The JobError class follows the same pattern as other simple dataclasses with appropriate default values.


143-153: LGTM! Proper handling of group information.

The GroupInfo class correctly uses default_factory for mutable default values and provides safe defaults.


155-168: LGTM! Consistent with GroupInfo implementation.

The GroupError class follows good practices with default_factory for mutable fields and safe attribute access.


188-203: LGTM! Consistent implementation pattern.

The DebugDag class follows the established pattern with safe attribute access and appropriate optional types.


205-213: LGTM! Simple and effective implementation.

The Progress class correctly handles required integer fields with sensible defaults.


215-222: LGTM! Appropriate handling of complex data.

The RuleGraph class uses Dict[str, Any] for the flexible rulegraph structure and provides a safe default.


224-239: LGTM! Smart statistics parsing logic.

The RunInfo class correctly separates the "total" field from per-rule counts and handles missing data gracefully.

README.md (1)

234-277: LGTM! Excellent example demonstrating proper usage.

The example function parse_event() demonstrates best practices:

  • Proper event type checking with hasattr()
  • Complete mapping of all LogEvent types to dataclasses
  • Type-safe usage with isinstance() checks
  • Clear documentation of the pattern

This is exactly the kind of example that will help plugin developers use the new dataclasses correctly.

Comment on lines +204 to +231
The `LogEvent` enum defines particularly important Snakemake events such as workflow starting, job submission, job failure, etc.
For each event, a corresponding dataclass is defined in `snakemake_interface_logger_plugins.events`.
These dataclasses provide a typed interface for accessing event-specific fields from a `LogRecord`.

**To extract event data from a `LogRecord`, use the appropriate dataclass's `from_record()` method.**

### Event Types, Dataclasses, and Typical Fields

| LogEvent | Dataclass | Typical Fields (see class for details) |
| --------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------- |
| `LogEvent.ERROR` | `Error` | exception, location, rule, traceback, file, line |
| `LogEvent.WORKFLOW_STARTED` | `WorkflowStarted` | workflow_id, snakefile |
| `LogEvent.JOB_INFO` | `JobInfo` | jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason, shellcmd, priority, resources |
| `LogEvent.JOB_STARTED` | `JobStarted` | job_ids |
| `LogEvent.JOB_FINISHED` | `JobFinished` | job_id |
| `LogEvent.SHELLCMD` | `ShellCmd` | jobid, shellcmd, rule_name |
| `LogEvent.JOB_ERROR` | `JobError` | jobid |
| `LogEvent.GROUP_INFO` | `GroupInfo` | group_id, jobs |
| `LogEvent.GROUP_ERROR` | `GroupError` | groupid, aux_logs, job_error_info |
| `LogEvent.RESOURCES_INFO` | `ResourcesInfo` | nodes, cores, provided_resources |
| `LogEvent.DEBUG_DAG` | `DebugDag` | status, job, file, exception |
| `LogEvent.PROGRESS` | `Progress` | done, total |
| `LogEvent.RULEGRAPH` | `RuleGraph` | rulegraph |
| `LogEvent.RUN_INFO` | `RunInfo` | per_rule_job_counts, total_job_count |

**Note:** These field lists are guidelines only and may change between versions.
Always use defensive programming practices like `getattr()` with defaults or `hasattr()` checks when accessing fields.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify event mapping table accuracy against actual dataclasses.

The documentation provides a comprehensive mapping table, but we should verify that all mappings are accurate and that field lists match the actual dataclass definitions.


🏁 Script executed:

#!/bin/bash
# Verify that all LogEvent enum values have corresponding dataclasses
echo "=== Checking LogEvent enum values ==="
rg -A 20 "class LogEvent" src/

echo "=== Checking dataclass names in events.py ==="
rg "^class [A-Z]" src/snakemake_interface_logger_plugins/events.py

echo "=== Verifying field names in dataclasses match documentation ==="
# Check specific examples mentioned in the table
rg -A 10 "class Error:" src/snakemake_interface_logger_plugins/events.py
rg -A 10 "class JobInfo:" src/snakemake_interface_logger_plugins/events.py
rg -A 10 "class ResourcesInfo:" src/snakemake_interface_logger_plugins/events.py

Length of output: 3084


🏁 Script executed:

#!/bin/bash
echo "=== WorkflowStarted fields ==="
rg -A10 "class WorkflowStarted" src/snakemake_interface_logger_plugins/events.py

echo "=== GroupInfo fields ==="
rg -A10 "class GroupInfo" src/snakemake_interface_logger_plugins/events.py

echo "=== GroupError fields ==="
rg -A10 "class GroupError" src/snakemake_interface_logger_plugins/events.py

echo "=== RunInfo fields ==="
rg -A10 "class RunInfo" src/snakemake_interface_logger_plugins/events.py

Length of output: 2107


#!/bin/bash
for cls in Error WorkflowStarted JobInfo JobStarted JobFinished ShellCmd JobError GroupInfo GroupError ResourcesInfo DebugDag Progress RuleGraph RunInfo; do
  echo "=== $cls fields ==="
  rg -A20 "class $cls" src/snakemake_interface_logger_plugins/events.py || echo "No class $cls found"
  echo
done

Update README event table for JobInfo fields

All LogEvent enum values map correctly to dataclasses in events.py, and the majority of field lists match. However, the JobInfo entry in the table includes three attributes that the JobInfo dataclass does not define:

  • The class defines:
    • jobid, rule_name, threads, input, output, log, benchmark, rule_msg, wildcards, reason
  • The table lists (incorrectly):
    • shellcmd, priority, resources

Please remove or replace shellcmd, priority, and resources from the LogEvent.JOB_INFO row and ensure the table reflects the actual JobInfo fields (e.g., rule_msg, wildcards, reason).

🤖 Prompt for AI Agents
In README.md between lines 204 and 231, the table listing typical fields for
LogEvent.JOB_INFO incorrectly includes shellcmd, priority, and resources, which
are not defined in the JobInfo dataclass. Remove these three fields from the
JOB_INFO row and update the list to accurately reflect the actual JobInfo
dataclass fields such as rule_msg, wildcards, and reason to ensure the
documentation matches the code.

Comment on lines +108 to +115
@dataclass
class JobFinished:
job_id: int

@classmethod
def from_record(cls, record: LogRecord) -> "JobFinished":
return cls(job_id=getattr(record, "job_id"))

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Missing default value for required field.

The JobFinished class accesses job_id without a default value, which could raise an AttributeError if the field is missing from the LogRecord.

Apply this fix to provide a safe default:

def from_record(cls, record: LogRecord) -> "JobFinished":
-   return cls(job_id=getattr(record, "job_id"))
+   return cls(job_id=getattr(record, "job_id", 0))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@dataclass
class JobFinished:
job_id: int
@classmethod
def from_record(cls, record: LogRecord) -> "JobFinished":
return cls(job_id=getattr(record, "job_id"))
@dataclass
class JobFinished:
job_id: int
@classmethod
def from_record(cls, record: LogRecord) -> "JobFinished":
return cls(job_id=getattr(record, "job_id", 0))
🤖 Prompt for AI Agents
In src/snakemake_interface_logger_plugins/events.py around lines 108 to 115, the
JobFinished dataclass requires a job_id field but does not provide a default
value, which can cause an AttributeError if job_id is missing in the LogRecord.
Modify the from_record method to safely access job_id using getattr with a
default value (e.g., None or a sentinel value) to avoid exceptions when the
attribute is absent.

Comment on lines +117 to +130
@dataclass
class ShellCmd:
jobid: int
shellcmd: Optional[str] = None
rule_name: Optional[str] = None

@classmethod
def from_record(cls, record: LogRecord) -> "ShellCmd":
return cls(
jobid=getattr(record, "jobid", 0),
shellcmd=getattr(record, "shellcmd", ""),
rule_name=getattr(record, "name", None),
)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Inconsistent attribute name usage.

The ShellCmd class uses record.name for rule_name, but other classes use record.rule_name. This inconsistency could indicate a bug or different attribute naming in different log contexts.

Verify that record.name is the correct attribute for rule name in shell command events:


🏁 Script executed:

#!/bin/bash
# Search for ShellCmd usage patterns and rule_name attribute access
rg -A 5 -B 5 "shellcmd.*rule_name|rule_name.*shellcmd"

# Also search for 'name' attribute usage in logging contexts
rg -A 3 "getattr.*name.*None"

Length of output: 2713


Update ShellCmd to use the correct rule_name attribute

In ShellCmd.from_record, you’re currently pulling the rule name from record.name (which is the logger’s name) instead of from the Snakemake-injected record.rule_name. According to the README and other event classes, the field should come from record.rule_name.

Please apply this change in:

  • File: src/snakemake_interface_logger_plugins/events.py
  • Around line 123 in the from_record method

Proposed diff:

     @classmethod
     def from_record(cls, record: LogRecord) -> "ShellCmd":
         return cls(
             jobid=getattr(record, "jobid", 0),
             shellcmd=getattr(record, "shellcmd", ""),
-            rule_name=getattr(record, "name", None),
+            rule_name=getattr(record, "rule_name", None),
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@dataclass
class ShellCmd:
jobid: int
shellcmd: Optional[str] = None
rule_name: Optional[str] = None
@classmethod
def from_record(cls, record: LogRecord) -> "ShellCmd":
return cls(
jobid=getattr(record, "jobid", 0),
shellcmd=getattr(record, "shellcmd", ""),
rule_name=getattr(record, "name", None),
)
@dataclass
class ShellCmd:
jobid: int
shellcmd: Optional[str] = None
rule_name: Optional[str] = None
@classmethod
def from_record(cls, record: LogRecord) -> "ShellCmd":
return cls(
jobid=getattr(record, "jobid", 0),
shellcmd=getattr(record, "shellcmd", ""),
rule_name=getattr(record, "rule_name", None),
)
🤖 Prompt for AI Agents
In src/snakemake_interface_logger_plugins/events.py around lines 117 to 130, the
from_record method of ShellCmd incorrectly assigns rule_name from record.name
instead of record.rule_name. Update the from_record method to retrieve rule_name
using getattr(record, "rule_name", None) to correctly reflect the
Snakemake-injected attribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant