Releases: keras-team/keras-hub
v0.4.0.dev0
The KerasNLP 0.4 release adds support for pretrained models via the `keras_nlp.models` API. If you encounter any problems or have questions, please open an issue or start a thread on the discussions tab!
Breaking Changes
- Renamed `keras_nlp.layers.MLMHead` -> `keras_nlp.layers.MaskedLMHead`.
- Renamed `keras_nlp.layers.MLMMaskGenerator` -> `keras_nlp.layers.MaskedLMMaskGenerator`.
- Renamed `keras_nlp.layers.UnicodeCharacterTokenizer` -> `keras_nlp.layers.UnicodeCodepointTokenizer`.
- Switched the default of `lowercase` in `keras_nlp.tokenizers.WordPieceTokenizer` from `True` to `False`.
- Renamed the token id output of `MaskedLMMaskGenerator` from `"tokens"` to `"tokens_ids"`.
Summary
- Added the `keras_nlp.models` API.
  - Adds support for BERT, DistilBERT, RoBERTa, and XLM-RoBERTa models and pretrained checkpoints.
- Added new metrics: `keras_nlp.metrics.Bleu` and `keras_nlp.metrics.EditDistance`.
- Added new vocabulary training utilities: `keras_nlp.tokenizers.compute_word_piece_vocabulary` and `keras_nlp.tokenizers.compute_sentence_piece_proto`.
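The new `keras_nlp.metrics.EditDistance` metric is based on the edit (Levenshtein) distance between token sequences. As a rough illustration of the underlying computation, here is a plain-Python dynamic-programming sketch; this is not the library's implementation, just the classic algorithm it is named after:

```python
def edit_distance(a, b):
    # Levenshtein distance between two token sequences, computed
    # row by row: prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[-1]
```

The same recurrence works whether the elements are characters or word tokens, which is why edit distance is a natural metric for comparing tokenized model output against references.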
What's Changed
- Add Edit Distance Metric by @abheesht17 in #231
- Minor fix to simplify and test handling of max_length prompts by @jbischof in #258
- Remove split regex args for WordPieceTokenizer by @mattdangerw in #255
- Add instructions on installing the latest changes by @mattdangerw in #261
- Add warning when k > vocab_size in top_k_search by @jbischof in #260
- Fix keras library imports and usage by @jbischof in #262
- Add BLEU Score by @abheesht17 in #222
- Configure GKE-based accelerator testing by @chenmoneygithub in #265
- Added WordPieceTokenizer training function by @jessechancy in #256
- Add requirements.txt for cloud build by @chenmoneygithub in #267
- Global Seed Bug Fix by @jessechancy in #269
- Update accelerator testing to use the new GCP project by @chenmoneygithub in #272
- Fixed typo: "recieved" by @ehrencrona in #273
- Reuse dense pooled output for fine tuning by @mattdangerw in #251
- Simplify BERT modeling, use keras embeddings by @mattdangerw in #253
- Rename UnicodeCharacterTokenizer -> UnicodeCodepointTokenizer by @mattdangerw in #254
- Add README for accelerator testing config folder by @chenmoneygithub in #276
- Random Deletion Layer by @aflah02 in #214
- Made trainer more efficient. Loading full files instead of using TextLineDataset. by @jessechancy in #280
- Use KerasNLP for BERT preprocessing for GLUE by @mattdangerw in #252
- Minor fixes to the Random Deletion Layer by @aflah02 in #286
- Fixes for WordPieceTrainer by @aflah02 in #293
- Update default to strip_accents=False by @jessechancy in #289
- Move Bert to models folder by @jbischof in #288
- Make Decoding Functions Graph-compatible (with XLA Support!) by @abheesht17 in #271
- SentencePieceTrainer by @aflah02 in #281
- Rename `models.Bert()` to `models.BertCustom()` by @jbischof in #310
- Add a test for variable sequence length inputs by @mattdangerw in #313
- Support checkpoint loading for `BertBase` by @jbischof in #299
- RoBERTa pretrained model forward pass by @jessechancy in #304
- Register objects as serializable by @mattdangerw in #292
- Style merging for Bert and Roberta by @jbischof in #315
- Streamline and speed up tests by @jbischof in #324
- Add Support for CJK Char Splitting for WordPiece Tokenizer by @abheesht17 in #318
- Clean up model input names for consistency by @mattdangerw in #327
- Return a single tensor from roberta by @mattdangerw in #328
- BERT, RoBERTa: Add `model.compile` UTs by @abheesht17 in #330
- Continue rename of bert model inputs by @mattdangerw in #329
- Text Generation Utilities: Add Support for Ragged Inputs by @abheesht17 in #300
- `bert_base_zh`, `bert_base_multi_cased`: Add BERT Base Variants by @abheesht17 in #319
- WordPiece vocabularies trainer on Wikipedia dataset by @jessechancy in #316
- Use the exported ragged ops for RandomDeletion by @mattdangerw in #332
- Random Swap Layer by @aflah02 in #224
- Fixes for Random Deletion Layer by @aflah02 in #339
- Move cloudbuild to a hidden directory by @mattdangerw in #345
- Fix the build by @mattdangerw in #349
- Migrating from Datasets to TFDS for GLUE Example by @aflah02 in #340
- Move network_tests into keras_nlp/ by @mattdangerw in #344
- Stop hardcoding 2.9 by @mattdangerw in #351
- Add BERT Large by @abheesht17 in #331
- Add normalize_first arg to Transformer Layers by @abheesht17 in #350
- Add Small BERT Variants by @abheesht17 in #338
- Beam Search: Add Ragged and XLA Support by @abheesht17 in #341
- Fix download paths for bert weights by @mattdangerw in #356
- Add a BertPreprocessor class by @mattdangerw in #343
- Text Generation Functions: Add Benchmark Script by @abheesht17 in #342
- Improve readability for encoder/decoder blocks by @mattdangerw in #353
- Add GPT-2 Model and its Variants by @abheesht17 in #354
- Clean up BERT, RoBERTa doc-strings by @abheesht17 in #359
- Create unique string id for each BERT backbone by @jbischof in #361
- Use model.fit() for BERT Example by @abheesht17 in #360
- Minor Fixes in BertPreprocessor Layer by @abheesht17 in #373
- Clone user passed initializers called multiple times by @mattdangerw in #371
- Update BERT model file structure by @mattdangerw in #376
- Move gpt model code into a directory by @mattdangerw in #379
- Move roberta model code into a directory by @mattdangerw in #380
- Reorg test directories by @mattdangerw in #384
- Add XLM-RoBERTa by @abheesht17 in #372
- Add DistilBERT by @abheesht17 in #382
- Stop running CI on Windows by @mattdangerw in #386
- Fix Bert serialization by @mattdangerw in #385
- Improve MacOS support and pin tensorflow version during testing by @mattdangerw in #383
- Unify BERT model API in one class by @jbischof in #387
- Add `from_preset` constructor to `BertPreprocessor` by @jbischof in #390
- More robustly test BERT preprocessing by @mattdangerw in #394
- Move `name` and `trainable` to `kwargs` by @jbischof in #399
- Add `backbone` as property for task models by @jbischof in #398
- Set default name of `Bert` instance to `"backbone"` by @jbischof in #397
- Fix gpt2 serialization by @mattdangerw in #391
- Fix distilbert serialization by @mattdangerw in #392
- Fix roberta and xlm-roberta serialization by @mattdangerw in https:...
v0.3.1
Summary
- Add `keras_nlp.tokenizers.BytePairTokenizer` with `tf.data` friendly support for the tokenization used by GPT-2, RoBERTa, and other models.
- Remove the hard dependency on `tensorflow` and `tensorflow-text` when pip installing on MacOS, to accommodate M1 chips. See this section of our contributor guide for more information on MacOS development.
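Byte-pair tokenizers like the new `BytePairTokenizer` rely on a vocabulary built by repeatedly merging the most frequent adjacent symbol pair in a corpus. As a conceptual sketch only (plain Python, not the library's `tf.data`-compatible implementation), one merge step looks like this:

```python
from collections import Counter

def bpe_merge_step(words):
    # words: dict mapping a tuple of symbols to its corpus frequency.
    # Count all adjacent symbol pairs, then merge the most frequent one.
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged, best
```

Running this step repeatedly yields the merge table that a BPE tokenizer applies at inference time; GPT-2 and RoBERTa operate on bytes rather than characters, but the merging logic is the same.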
What's Changed
- Cherry picks 0.3 by @mattdangerw in #454
- Bump version for 0.3.1 pre release by @mattdangerw in #456
- Remove dev prefix for 0.3.1 release by @mattdangerw in #457
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Summary
- Added `keras_nlp.tokenizers.SentencePieceTokenizer`.
- Added two token packing layers: `keras_nlp.layers.StartEndPacker` and `keras_nlp.layers.MultiSegmentPacker`.
- Added two metrics, `keras_nlp.metrics.RougeL` and `keras_nlp.metrics.RougeN`, based on the `rouge-score` package.
- Added five utilities for generating sequences: `keras_nlp.utils.greedy_search`, `keras_nlp.utils.random_search`, `keras_nlp.utils.top_k_search`, `keras_nlp.utils.top_p_search`, and `keras_nlp.utils.beam_search`.
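The generation utilities all share the same shape of loop: feed the sequence so far to the model, choose the next token by some rule, and stop at an end token or length limit. A minimal plain-Python sketch of the greedy variant (the `toy_model` and its probability table below are made-up stand-ins for a real model call, not part of the library):

```python
def greedy_search(next_token_probs, prompt, end_token, max_length):
    # Repeatedly pick the highest-probability next token until the
    # end token appears or the length limit is reached.
    # next_token_probs maps a token list to a {token: probability} dict.
    tokens = list(prompt)
    while len(tokens) < max_length:
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)
        tokens.append(best)
        if best == end_token:
            break
    return tokens

# A toy "model": after "a" predict "b", after "b" predict "[end]".
table = {"a": {"b": 0.9, "c": 0.1}, "b": {"[end]": 0.8, "c": 0.2}}
def toy_model(tokens):
    return table.get(tokens[-1], {"[end]": 1.0})
```

The other utilities vary only the selection rule: `random_search` samples from the full distribution, `top_k_search` and `top_p_search` sample from a truncated distribution, and `beam_search` tracks several candidate sequences at once.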
What's Changed
- Greedy text generation util by @chenmoneygithub in #154
- Remove incorrect embedding size limit by @mattdangerw in #195
- Fix inits for bert heads by @mattdangerw in #192
- Add keras.io links to README by @mattdangerw in #196
- Minor Corrections In ROADMAP.md by @saiteja13427 in #200
- Fix Loose Dependency Imports by @abheesht17 in #199
- Reorganize examples by @mattdangerw in #179
- Remove bert config arguments from README by @mattdangerw in #205
- Add checkpoints to BERT training by @chenmoneygithub in #184
- Run keras tuner from a temp directory by @mattdangerw in #202
- Token and position embedding minor fixes by @mattdangerw in #203
- Correct typo in WordPieceTokenizer by @abheesht17 in #208
- Add TPU support to BERT example by @chenmoneygithub in #207
- Remove type annotations for complex types by @mattdangerw in #194
- Issue 182: Modified TransformerDecoder with optional parameter by @jessechancy in #217
- Add StartEndPacker layer by @abheesht17 in #221
- Add a layer for packing inputs for BERT-likes by @mattdangerw in #88
- Ignore UserWarning to fix nightly testing breakage by @chenmoneygithub in #227
- Add ROUGE Metric by @abheesht17 in #122
- Allow long lines for links in docstrings by @mattdangerw in #229
- Random Sampling Util for Text Generation by @jessechancy in #228
- added top k search util by @jessechancy in #232
- top p search and testing by @jessechancy in #233
- Add a SentencePiece tokenizer by @mattdangerw in #218
- Add cloud training support for BERT example by @chenmoneygithub in #226
- Bump version to 0.3.0 for upcoming release by @mattdangerw in #239
- Add support for StartEndPacker packing 2D tensor by @jessechancy in #240
- Fixed Bug with Unicode Tokenizer Vocab Size by @aflah02 in #243
- Fixed Import for top_p_search util by @aflah02 in #245
- MultiSegmentPacker support for 2D dense tensor by @jessechancy in #244
- Minor fixes for multi-segment packer by @mattdangerw in #246
- Add beam search decoding util by @jessechancy in #237
New Contributors
- @saiteja13427 made their first contribution in #200
- @jessechancy made their first contribution in #217
Full Changelog: v0.2.0...v0.3.0
v0.2.0
Summary
- Documentation live on keras.io.
- Added two tokenizers: `ByteTokenizer` and `UnicodeCharacterTokenizer`.
- Added a `Perplexity` metric.
- Added three layers: `TokenAndPositionEmbedding`, `MLMMaskGenerator`, and `MLMHead`.
- Contributing guides and roadmap.
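The `Perplexity` metric is defined as the exponential of the mean negative log-likelihood the model assigns to the target tokens. A self-contained sketch of that formula (plain Python over per-token probabilities, not the library's batched Keras metric):

```python
import math

def perplexity(token_probs):
    # perplexity = exp(-1/N * sum(log p_i)) over the probabilities
    # the model assigned to the N target tokens.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Intuitively, a model that assigns uniform probability 1/k to every target token has perplexity exactly k, so lower perplexity means the model is less "surprised" by the data.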
What's Changed
- Add Byte Tokenizer by @abheesht17 in #80
- Fixing rank 1 outputs for WordPieceTokenizer by @aflah02 in #92
- Add tokenizer accessors to the base class by @mattdangerw in #89
- Fix word piece attributes by @mattdangerw in #97
- Small fix: change assertEquals to assertEqual by @chenmoneygithub in #103
- Added a Learning Rate Schedule for the BERT Example by @Stealth-py in #96
- Add Perplexity Metric by @abheesht17 in #68
- Use the black profile for isort by @mattdangerw in #117
- Update README with release information by @mattdangerw in #118
- Add a class to generate LM masks by @chenmoneygithub in #61
- Add docstring testing by @mattdangerw in #116
- Fix broken docstring in MLMMaskGenerator by @chenmoneygithub in #121
- Adding a UnicodeCharacterTokenizer by @aflah02 in #100
- Added TokenAndPositionEmbedding Class by @adhadse in #91
- Fix bert example so it is runnable by @mattdangerw in #123
- Fix the issue that MLMMaskGenerator does not work in graph mode by @chenmoneygithub in #131
- Actually use layer norm epsilon in encoder/decoder by @mattdangerw in #133
- Whitelisted formatting and lint check targets by @adhadse in #126
- Updated CONTRIBUTING.md for setup of venv and standard pip install by @adhadse in #127
- Fix mask propagation of transformer layers by @chenmoneygithub in #139
- Fix masking for TokenAndPositionEmbedding by @mattdangerw in #140
- Fixed no oov token error in vocab for WordPieceTokenizer by @adhadse in #136
- Add a MLMHead layer by @mattdangerw in #132
- Bump version for 0.2.0 dev release by @mattdangerw in #142
- Added WSL setup text to CONTRIBUTING.md by @adhadse in #144
- Add attribution for the BERT modeling code by @mattdangerw in #151
- Remove preprocessing subdir by @mattdangerw in #150
- Word piece arg change by @mattdangerw in #148
- Rename max_length to sequence_length by @mattdangerw in #149
- Don't accept a string dtype for unicode tokenizer by @mattdangerw in #147
- Adding Utility to Detokenize as list of Strings to Tokenizer Base Class by @aflah02 in #124
- Fixed Import Error by @aflah02 in #161
- Added KerasTuner Hyper-Parameter Search for the BERT fine-tuning script. by @Stealth-py in #143
- Docstring updates for upcoming doc publish by @mattdangerw in #146
- version bump for 0.2.0.dev2 pre-release by @mattdangerw in #165
- Added a vocabulary_size argument to UnicodeCharacterTokenizer by @aflah02 in #163
- Simplified utility to preview a tfrecord by @mattdangerw in #168
- Update BERT example's README with data downloading instructions by @chenmoneygithub in #169
- Add a call to repeat during pretraining by @mattdangerw in #172
- Add an integration test matching our quick start by @mattdangerw in #162
- Modify README of bert example by @chenmoneygithub in #174
- Fix the finetuning script's loss and metric config by @chenmoneygithub in #176
- Minor improvements to the position embedding docs by @mattdangerw in #180
- Update docs for upcoming 0.2.0 release by @mattdangerw in #158
- Restore accidentally deleted line from README by @mattdangerw in #185
- Bump version for 0.2.0 release by @mattdangerw in #186
- Pre release fix by @mattdangerw in #187
New Contributors
- @Stealth-py made their first contribution in #96
- @adhadse made their first contribution in #91
Full Changelog: v0.1.1...v0.2.0
v0.2.0.dev2
What's Changed
- Added WSL setup text to CONTRIBUTING.md by @adhadse in #144
- Add attribution for the BERT modeling code by @mattdangerw in #151
- Remove preprocessing subdir by @mattdangerw in #150
- Word piece arg change by @mattdangerw in #148
- Rename max_length to sequence_length by @mattdangerw in #149
- Don't accept a string dtype for unicode tokenizer by @mattdangerw in #147
- Adding Utility to Detokenize as list of Strings to Tokenizer Base Class by @aflah02 in #124
- Fixed Import Error by @aflah02 in #161
- Added KerasTuner Hyper-Parameter Search for the BERT fine-tuning script. by @Stealth-py in #143
- Docstring updates for upcoming doc publish by @mattdangerw in #146
- version bump for 0.2.0.dev2 pre-release by @mattdangerw in #165
Full Changelog: v0.2.0-dev.1...v0.2.0.dev2
v0.2.0-dev.1
What's Changed
- Add Byte Tokenizer by @abheesht17 in #80
- Fixing rank 1 outputs for WordPieceTokenizer by @aflah02 in #92
- Add tokenizer accessors to the base class by @mattdangerw in #89
- Fix word piece attributes by @mattdangerw in #97
- Small fix: change assertEquals to assertEqual by @chenmoneygithub in #103
- Added a Learning Rate Schedule for the BERT Example by @Stealth-py in #96
- Add Perplexity Metric by @abheesht17 in #68
- Use the black profile for isort by @mattdangerw in #117
- Update README with release information by @mattdangerw in #118
- Add a class to generate LM masks by @chenmoneygithub in #61
- Add docstring testing by @mattdangerw in #116
- Fix broken docstring in MLMMaskGenerator by @chenmoneygithub in #121
- Adding a UnicodeCharacterTokenizer by @aflah02 in #100
- Added TokenAndPositionEmbedding Class by @adhadse in #91
- Fix bert example so it is runnable by @mattdangerw in #123
- Fix the issue that MLMMaskGenerator does not work in graph mode by @chenmoneygithub in #131
- Actually use layer norm epsilon in encoder/decoder by @mattdangerw in #133
- Whitelisted formatting and lint check targets by @adhadse in #126
- Updated CONTRIBUTING.md for setup of venv and standard pip install by @adhadse in #127
- Fix mask propagation of transformer layers by @chenmoneygithub in #139
- Fix masking for TokenAndPositionEmbedding by @mattdangerw in #140
- Fixed no oov token error in vocab for WordPieceTokenizer by @adhadse in #136
- Add a MLMHead layer by @mattdangerw in #132
- Bump version for 0.2.0 dev release by @mattdangerw in #142
New Contributors
- @Stealth-py made their first contribution in #96
- @adhadse made their first contribution in #91
Full Changelog: v0.1.1...v0.2.0-dev.1
v0.1.1
What's Changed
- Add tokenizer helper to convert tokens to ids by @mattdangerw in #75
- Add a sinusoidal embedding layer by @amantayal44 in #59
- Add a learned positional embedding layer by @hertschuh in #47
- Fix typo in position embedding docstring by @mattdangerw in #86
- Bump version number to 0.1.1 by @mattdangerw in #90
New Contributors
- @amantayal44 made their first contribution in #59
- @hertschuh made their first contribution in #47
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Initial release of keras-nlp with word piece tokenizer and transformer encoder/decoder blocks.
This is a v0 release, with no API compatibility guarantees.