Skip to content

Conversation

sean-mccorkle
Copy link
Contributor

…und data

Copy link

@ialarmedalien ialarmedalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you written any tests for the new code? E.g. have a few small files and ensure that the output when running the scripts is as expected, and that the appropriate errors are thrown if the input is dodgy? Imagine that you are a newcomer to the codebase without any sample files -- how will you know that the code does what it's supposed to do?

There are lots of commented-out lines in the script that could be turned into tests. It's much better to be able to automate these checks than having to rely on someone having the appropriate input files and/or the knowledge of what the output should be.

@sean-mccorkle
Copy link
Contributor Author

sean-mccorkle commented Sep 15, 2020

Have you written any tests for the new code? E.g. have a few small files and ensure that the output when running the scripts is as expected, and that the appropriate errors are thrown if the input is dodgy? Imagine that you are a newcomer to the codebase without any sample files -- how will you know that the code does what it's supposed to do?

There are lots of commented-out lines in the script that could be turned into tests. It's much better to be able to automate these checks than having to rely on someone having the appropriate input files and/or the knowledge of what the output should be.

I had not moved along these lines at all because this is only the 2nd time these collections have been uploaded. Between the first and 2nd episodes, there were a number of changes that would have caused the first versions to fail. I decided that the file input formats and even semantics are not stable and not worth the effort to strictly check for until they do firm up. I expect the scripts will continue to be at least partially rewritten every new load.

Its worth having a discussion about this. I've tried to raise the issue a few times. These are not files from GenBank or EMBL, they're from different people around the collaboration who generated them (one of whom has left).

I don't think this process is anywhere near ready for a novice user, not until the all the relevant inputs are "standardized" to some degree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants