Releases: scribe-org/Scribe-Data
Scribe-Data 5.1.4
Bug Fixes
- Allow the convert parser to accept multiple data types (#634).
Scribe-Data 5.1.3
Bug Fixes
- Fixed data conversion not handling multiple explicitly passed languages and data types (#632).
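Handling several explicitly passed languages and data types generally comes down to normalizing each argument to a list and then visiting every pairing. The sketch below illustrates that idea only: `as_list` and `conversion_jobs` are hypothetical helper names and are not taken from the Scribe-Data codebase.

```python
from itertools import product


def as_list(value):
    """Return value as a list, wrapping a single string (hypothetical helper)."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def conversion_jobs(languages, data_types):
    """Pair every requested language with every requested data type."""
    return list(product(as_list(languages), as_list(data_types)))


jobs = conversion_jobs(["English", "German"], ["nouns", "verbs"])
for language, data_type in jobs:
    print(f"convert {language} {data_type}")
```

Accepting either a single string or a list keeps the calling code the same whether one or many values are passed.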
Scribe-Data 5.1.2
Bug Fixes
- Fixed data conversion not handling multiple explicitly passed languages (#630).
Scribe-Data 5.1.1
Bug Fixes
- The path to the contracts was fixed in data filtration to ensure that it's a `pathlib.Path` value (#627).
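A common way to guarantee a `pathlib.Path` value is to coerce the input at the boundary of the function that receives it. This is a minimal sketch under that assumption; `as_path` and the `data_contracts` directory name are hypothetical and do not come from the Scribe-Data source.

```python
from pathlib import Path


def as_path(value):
    """Coerce a str or os.PathLike into a pathlib.Path (hypothetical helper)."""
    return value if isinstance(value, Path) else Path(value)


contracts_dir = as_path("data_contracts")  # hypothetical directory name
# Joining with "/" needs a Path operand; two plain strings would raise TypeError.
print(contracts_dir / "english.json")
```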
Tests
- The upgrade functionality of the CLI is now comprehensively tested (#624).
Code Refactoring
- The upgrade message instructs the user to use the built-in upgrade functionality.
Scribe-Data 5.1.0
Features
- The upgrade command now upgrades the package via pip rather than bringing down GitHub files and installing them directly.
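Upgrading through pip from inside a Python program is usually done by invoking pip as a module of the running interpreter. The sketch below only builds such a command; `build_upgrade_command` is a hypothetical name, and the `scribe-data` package name is assumed to match the project's PyPI distribution. It is not the CLI's actual implementation.

```python
import sys


def build_upgrade_command(package="scribe-data"):
    """Build a pip upgrade command for the running interpreter.

    The default package name is an assumption about the PyPI distribution name.
    """
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


command = build_upgrade_command()
print(" ".join(command))
# To actually run it: subprocess.run(command, check=True)
```

Using `sys.executable -m pip` rather than a bare `pip` helps ensure the upgrade targets the same environment the CLI is running in.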
Scribe-Data 5.0.1
Code Refactoring
- The requirement files have been updated to fix package install errors (#621).
Dependencies
- Update minimum Python version to 3.11.
Scribe-Data 5.0.0
Features
- Scribe-Data now has the ability to download the most recent or a specific Wikidata lexemes dump (#517).
- Wikidata SPARQL queries are now autogenerated and maintained via Wikidata dumps (#513).
- Forms are separated into files based on their identifiers while ignoring maintainer set queries (#575).
- Queries have been expanded for all languages and data forms based on the Wikidata dump process.
- The date of last modification for Wikidata lexemes has been added to query and dump parsing outputs (#562).
- Interactive mode now functions throughout the CLI functionality where the user is presented with options for data extraction.
- There is now a top-level interactive mode command for accessing all Scribe-Data functionality (#523).
- Repeat forms are combined with vertical bars (`"|"`) as a separator (#544, #573).
- A workflow has been created to update the emoji data on a regular basis (#542).
- Resulting data can be filtered based on data contracts (#581).
- Contracts can be checked against data to ensure that they're valid given the data's field names (#561).
- The Wikipedia based autosuggestion functionality is now CLI based instead of using a Jupyter notebook (#206).
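The combination of repeat forms with a vertical bar, mentioned in the list above, can be sketched as follows. `combine_forms` is a hypothetical helper, and dropping empty values and exact duplicates is an assumption made for the example rather than documented Scribe-Data behavior.

```python
def combine_forms(forms, separator="|"):
    """Join repeat forms, dropping empty values and exact duplicates in order."""
    unique = list(dict.fromkeys(form for form in forms if form))
    return separator.join(unique)


print(combine_forms(["colour", "color"]))
```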
Legal
- SPDX license identifiers have been added for all files (#553).
Bug Fixes
- The version command was fixed to account for cases where the version has a `v` before it (#534).
- The functionality to check for current data and prompt its deletion was centralized and messages to the user were made clearer (#336).
- If Wikidata queries can't be completed, Scribe-Data now includes dramatically better error messages and directs the user to leverage commands that use Wikidata dumps (#549).
- General bug fixes for a more fluid developer experience.
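Version strings that may or may not carry a leading `v`, as in the version command fix above, are typically normalized before they are compared. The following is a minimal sketch of that idea with hypothetical helper names; it assumes purely numeric dotted versions and is not the project's own code.

```python
def normalize_version(version):
    """Strip whitespace and a single leading "v" from a version string."""
    version = version.strip()
    return version[1:] if version[:1] in ("v", "V") else version


def version_tuple(version):
    """Convert a dotted version such as "v5.0.0" into a comparable tuple."""
    return tuple(int(part) for part in normalize_version(version).split("."))


print(version_tuple("v5.0.0") == version_tuple("5.0.0"))
```

Comparing tuples of integers avoids the string comparison pitfall where "5.10.0" would sort before "5.9.0".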
Tests
- Tests have been written for all new functionalities (#570).
- CI testing now includes a coverage check that breaks if coverage falls below a given percentage.
Documentation
- Documentation has been expanded for all functionalities of the CLI.
Code Refactoring
- All numpydoc docstrings have been fixed and unneeded code has been removed (#547).
Scribe-Data 4.1.0
Features
- Queries for noun genders and other properties that require the Wikidata label service now return their English label rather than the auto label, which was returning just the Wikidata QID.
- SPARQL queries for English and Portuguese prepositions were added to allow the CLI to query these types of data.
- The convert functionality once again works for lists of languages and all of their data types.
Bug Fixes
- SQLite conversion was fixed for all queries (#527).
- The data conversion process outputs were improved, including capitalizing language names, and repeat notices to the user were removed.
- The CLI's `get` command now returns all data types if none is passed.
- The Portuguese verbs query was fixed as it wasn't formatted correctly.
- The emoji keyword functionality was fixed given the new lexeme ID based form of the data.
- Arguments were fixed that were breaking the functionality.
- Languages for the user were capitalized.
- `case` has been renamed `grammaticalCase` in preposition queries to ensure that SQLite reserved keywords are not used.
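The reason for avoiding `case` as a column name can be shown with Python's built-in `sqlite3` module: `CASE` is a reserved SQLite keyword, so it cannot be used as an unquoted identifier. The table and column layout below is a hypothetical example, not the schema Scribe-Data generates.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

try:
    # CASE is a reserved SQLite keyword, so an unquoted column named "case" fails.
    connection.execute("CREATE TABLE prepositions (preposition TEXT, case TEXT)")
    unquoted_case_allowed = True
except sqlite3.OperationalError:
    unquoted_case_allowed = False

# A non-reserved name needs no quoting anywhere it is used.
connection.execute(
    "CREATE TABLE prepositions (preposition TEXT, grammaticalCase TEXT)"
)
connection.execute("INSERT INTO prepositions VALUES (?, ?)", ("mit", "dative"))
rows = connection.execute(
    "SELECT preposition, grammaticalCase FROM prepositions"
).fetchall()
print(unquoted_case_allowed, rows)
```

Quoting the identifier would also work, but every downstream query would then need the same quoting, which is why renaming the column is the simpler option.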
Scribe-Data 4.0.0
Features
- Queries for countless data types for countless languages were expanded and added ❤️
- Scribe-Data is now a fully functional CLI.
- Querying Wikidata lexicographical data can be done via the `get` command (#159).
- The output type of queries can be JSON, CSV, TSV or SQLite, with converting between output types also being possible (#145, #146).
- Output paths can be set for query results (#144).
- The version of the CLI can be printed to the command line and the CLI can further be used to upgrade itself (#186, #157).
- Total Wikidata lexemes for languages and data types can be derived with the `total` command (#147).
- Interactive and total commands can be used via an interactive mode with the `--interactive` argument (#158, #203).
- Outputs were standardized to ensure that the CLI experience is consistent.
- The machine translation process has been removed to make way for the Wiktionary based implementation (#292).
- Package metadata files were standardized for languages, data types and Wikidata lexeme forms.
- CLI commands have an argument check that can suggest correct languages and data types (#341).
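An argument check that suggests corrections, like the one in the last item above, can be approximated with the standard library's `difflib.get_close_matches`. This is a sketch of the general technique, not Scribe-Data's implementation; the value lists and the `suggest` helper are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical subset of supported values; the real lists live in package metadata.
LANGUAGES = ["english", "french", "german", "portuguese", "spanish"]
DATA_TYPES = ["adjectives", "nouns", "prepositions", "verbs"]


def suggest(value, valid_values):
    """Return the closest valid value, or None when nothing is similar enough."""
    matches = get_close_matches(value.lower(), valid_values, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("Germn", LANGUAGES))
print(suggest("verb", DATA_TYPES))
```

The `cutoff` value trades off helpfulness against false suggestions, so it is worth tuning against real typos.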
Bug Fixes
- Wikidata query process stages no longer trigger the tqdm progress bar when they're unsuccessful (#155).
Tests
- Tests have been written for the CLI to ensure that its functionality remains consistent.
- Workflows were created to check that the Wikidata queries and project structure are consistent, which helps ensure package functionality (#339, #357).
- Project queries and the project structure have been updated to match the rules developed for the checks.
Documentation
- The CLI's functionality has been fully documented (#152, #208).
- Documentation was created to show how to write Scribe-Data queries (#395).
Code Refactoring
- `word_type` has been switched to `data_type` throughout the codebase (#160).
- Case, gender and annotation utility functions were removed as the formatting process that used them has changed.
- The SPARQLWrapper access method has been extracted to the Wikidata utils and is imported into the files that need it (#164).
- Export data paths have been converted to centrally saved variables to reduce hard coded string repetition.
- Many files were renamed, including `update_data.py` being renamed `query_data.py`.
- Paths within the package have been updated to work for all operating systems via `pathlib` (#125).
- The language formatting scripts have been dramatically simplified given changes to export paths all being the same.
- The `update_files` directory was removed in preparation for other means of showing data totals.
- The `language_data_extraction` directory was moved under the Wikidata directory as it's only used for those processes now (#446).
- The emoji keyword process was centralized to simplify project maintenance (#359).
- PyICU was removed as a dependency and a process was made to install it and its needed dependencies given the operating system of the user (#196).
- The data formatting step was centralized such that we only have one for all languages (#142).
- Sub-query processes are no longer hard coded such that we'd need to maintain the total possible sub-queries within the `query_data.py` process.
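Two of the refactoring items above, centrally saved export paths and operating-system-independent paths via `pathlib`, combine naturally. The sketch below shows one way this might look; `DEFAULT_EXPORT_DIR` and `export_path` are hypothetical names, and the language and data type sub-layout is an assumption.

```python
from pathlib import Path

# Hypothetical central definition; the directory layout is a placeholder.
DEFAULT_EXPORT_DIR = Path("scribe_data_json_export")


def export_path(language, data_type, suffix=".json", base=DEFAULT_EXPORT_DIR):
    """Build an export file path without hard coding an OS-specific separator."""
    return base / language / f"{data_type}{suffix}"


path = export_path("English", "nouns")
print(path.as_posix())
```

Defining the base directory once means a change of export location touches a single line instead of every string that repeats it.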
Scribe-Data 3.3.0
Features
- The translation process has been updated to allow for translations from non-English languages (#72, #73, #74, #75, #76, #77, #78, #79).
Documentation
- The documentation has been given a new layout with the logo in the top left (#90).
- The documentation now has links to the code at the top of each page (#91).
Bug Fixes
- Annotation bugs such as repeated or empty values were removed.
- Perfect tenses of Portuguese verbs were fixed via finding the appropriate PID (#68).
- Note that the most common past perfect property is not the standard one, so this will need to be fixed.
Code Refactoring
- pre-commit has been added to the repo to improve the development experience (#137).
- Code formatting was shifted from black to Ruff.
- A Ruff based GitHub workflow was added to check the code formatting and lint the codebase on each pull request (#109).
- The `_update_files` directory was renamed `update_files` as these files are used in non-internal manners now (#57).
- A common function has been created to map Wikidata IDs to noun genders (#69).
- The project is now installed locally for development and command line usage, so usages of `sys.path` have been removed from files (#122).
- The directory structure has been dramatically streamlined and includes folders for future projects where language data could come from other sources like Wiktionary (#139).
- Translation files are moved to their own directory.
- The `extract_transform` directory has been removed and all files within it have been moved one level up.
- The `languages` directory has been renamed `language_data_extraction`.
- All files within `wikidata/_resources` have been moved to the `resources` directory.
- The gender and case annotations for data formatting have now been commonly defined.
- All language directory `formatted_data` files have now been moved to the `scribe_data_json_export` directory to prepare for outputs being required to be directed to a directory outside of the package.
- Path computing has been refactored throughout the codebase, and unneeded functions for data transfers have been removed.
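A common function for mapping Wikidata IDs to noun genders, as mentioned in the list above, might look like the following sketch. The name `map_gender`, the choice to return an empty string for unknown IDs, and the short QID table are assumptions for illustration; the QIDs are believed to be the Wikidata items for these genders but should be verified against Wikidata before use.

```python
# QIDs below are believed to be the Wikidata items for these grammatical genders;
# verify them against Wikidata before relying on this mapping.
GENDER_QIDS = {
    "Q499327": "masculine",
    "Q1775415": "feminine",
    "Q1775461": "neuter",
}


def map_gender(wikidata_id):
    """Map a Wikidata QID or entity URI to a gender label ("" when unknown)."""
    qid = wikidata_id.rsplit("/", 1)[-1]
    return GENDER_QIDS.get(qid, "")


print(map_gender("http://www.wikidata.org/entity/Q1775415"))
```

Keeping the table in one place lets every language's formatting step share the same labels.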