Releases: ssl-hep/ServiceX_frontend
Version 2.4.1
Providing the same max range for pyarrow as coffea
PyHEP Upgrade
Lots of things were fixed after trying to run against the 70 TB of CMS Run 1 data.
- Supports Python 3.9
- Will now report supported return types (parquet, root)
- Long filenames are hashed in the local cache to avoid OS limitations
- A request title can be passed in to "name" the transform
- Support a list of URLs or a single URL as a file source, as well as the more traditional dataset identifiers
- Support deleting a single datafile or a query status file
- `api_endpoints` now have names, not just types
- A local file can be written that maps query hashes to request IDs. It can safely be checked into a repo so that others can quickly re-use your queries.
- Better status updates while running and downloading, and support for Python notebook widgets in VS Code.
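The checked-in query file described above can be pictured as a small JSON map from query hash to backend request ID. The sketch below is hypothetical: the file name `analysis_cache.json`, the SHA-256 hash, and the helper functions are illustrative assumptions, not the ServiceX frontend's actual format or API.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional

# Hypothetical file name; the real analysis cache file may be named differently.
CACHE_FILE = Path("analysis_cache.json")


def query_hash(query: str) -> str:
    """Stable hash of the query text, used as the lookup key."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def load_cache(path: Path = CACHE_FILE) -> Dict[str, str]:
    """Read the hash -> request-id map, or return an empty map if the file is absent."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def record_request(query: str, request_id: str, path: Path = CACHE_FILE) -> None:
    """Remember which backend request served this query and save the file."""
    cache = load_cache(path)
    cache[query_hash(query)] = request_id
    # sort_keys keeps diffs small when the file is checked into a repo
    path.write_text(json.dumps(cache, indent=2, sort_keys=True))


def lookup_request(query: str, path: Path = CACHE_FILE) -> Optional[str]:
    """Return a known request-id, so results can be downloaded rather than re-transformed."""
    return load_cache(path).get(query_hash(query))
```

Keying on a hash rather than the raw query keeps the file compact and lets anyone who runs the same query find the same entry.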
Bug fix: ignore title when calculating hash
- Make sure `title` is ignored when calculating the hash; it makes no difference to the way the data is calculated.
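One way to picture this fix: drop the fields that do not affect the output before hashing the request. This is a minimal sketch, assuming a request is a plain dict; the field names and the `request_hash` helper are hypothetical, not the frontend's real hashing code.

```python
import hashlib
import json
from typing import Any, Dict

# Hypothetical set of fields that do not change the transformed data.
IGNORED_FIELDS = {"title"}


def request_hash(request: Dict[str, Any]) -> str:
    """Hash a transform request, skipping fields that don't affect the output."""
    relevant = {k: v for k, v in request.items() if k not in IGNORED_FIELDS}
    # sort_keys makes the JSON text (and so the hash) independent of key order
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests that differ only in their title then share a cache entry, while any change to the query itself produces a new hash.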
Dataset Name
- Add a human-readable ("English") property to the ServiceX dataset that contains its name. The name can be quite long, depending on the dataset.
Bug Fix: Default Resolution
Streamline and fix the logic we introduced to parse users' config files and merge in default values.
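Merging a user's config with defaults is usually a recursive dict merge in which user values win. The sketch below assumes configs are already-parsed nested dicts; the keys in `DEFAULTS` and the `merge_defaults` helper are hypothetical and are not ServiceX's real configuration schema or loader.

```python
from typing import Any, Dict

# Hypothetical defaults; the real ServiceX keys and values may differ.
DEFAULTS: Dict[str, Any] = {
    "cache_path": "/tmp/servicex",
    "backend_defaults": {
        "uproot": {"return_data": "parquet"},
        "xaod": {"return_data": "root"},
    },
}


def merge_defaults(user: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: user values win, missing keys are filled from defaults."""
    merged = dict(defaults)  # shallow copy, so DEFAULTS itself is never modified
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so a partial nested section keeps its other default keys
            merged[key] = merge_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged
```

Recursing into nested sections is the important design choice: a user who overrides one backend setting should not silently lose the defaults for every other backend.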
Post PyHEP Release
This is the first beta of 2.4. While we believe it is feature complete, there is still some wider testing that needs to happen. The goal of this release is to support the full re-analysis of the CMS Run 1 Higgs.
New Features:
- You can specify a single `http://` or `root://` file as input for a single-file dataset.
- You can specify a list of `http://` and/or `root://` files. ServiceX will process them as long as it has permission to access the data.
- A title can be given to each transform.
- Add the ability to query a dataset for the data types it will return. This enables automatic data type discovery (required to keep the interface sensible in `coffea` and other upstream libraries).
- Python 3.9 is now supported.
- Add support for the CMS Run 1 AOD backend `type`.
- Caching
  - Analysis Cache: one can create and check in a `json` file that maps queries to backend `request-id`s. This means others can re-run and just download the data, rather than having to re-transform it for the same queries.
  - A user can delete a data file from the local cache and it will automatically be re-downloaded.
  - If a query status cache file is removed, it will automatically be re-fetched.
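Accepting a dataset identifier, a single file URL, or a list of file URLs is easiest to handle by normalizing the input early. This is a hypothetical sketch: `normalize_source` is not a ServiceX function, and it accepts only the `http://` and `root://` schemes named in these notes (an `https://` URL would be rejected as written).

```python
from typing import List, Union

# Schemes the release notes say can be passed directly as files.
FILE_SCHEMES = ("http://", "root://")


def normalize_source(source: Union[str, List[str]]) -> Union[str, List[str]]:
    """Return a dataset identifier unchanged, or a list of file URLs.

    A single http:// or root:// URL is wrapped in a one-element list, so the
    rest of the code only has to handle "identifier or list of files".
    """
    if isinstance(source, str):
        # str.startswith accepts a tuple of prefixes
        return [source] if source.startswith(FILE_SCHEMES) else source
    files = list(source)
    bad = [f for f in files if not f.startswith(FILE_SCHEMES)]
    if bad:
        raise ValueError(f"Unsupported file URL(s): {bad}")
    return files
```

Treating a single URL as a one-element list means the single-file and multi-file features share one code path.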
Configuration:
- End points can now have names rather than just types, supporting more than one backend of a single type (e.g. two `uproot` backends)
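With names, two endpoints of the same type can coexist and be picked explicitly. The sketch below is hypothetical: the entry layout (`name`, `type`, `endpoint`), the example URLs, and `find_endpoint` are illustrative assumptions, not the frontend's actual configuration format or lookup logic.

```python
from typing import Dict, List, Optional

# Hypothetical endpoint entries: two backends of the same type, told apart by name.
ENDPOINTS: List[Dict[str, str]] = [
    {"name": "uproot-a", "type": "uproot", "endpoint": "https://sx-a.example.org"},
    {"name": "uproot-b", "type": "uproot", "endpoint": "https://sx-b.example.org"},
    {"name": "xaod", "type": "xaod", "endpoint": "https://sx-c.example.org"},
]


def find_endpoint(
    endpoints: List[Dict[str, str]],
    name: Optional[str] = None,
    backend_type: Optional[str] = None,
) -> Dict[str, str]:
    """Prefer an exact name match; otherwise fall back to the first entry of the type."""
    if name is not None:
        for ep in endpoints:
            if ep.get("name") == name:
                return ep
        raise KeyError(f"No endpoint named {name!r}")
    for ep in endpoints:
        if ep.get("type") == backend_type:
            return ep
    raise KeyError(f"No endpoint of type {backend_type!r}")
```

Falling back to the first entry of a type keeps older, type-only configurations working.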
Bug Fixes:
- If the backend has lost the data, automatically resubmit the query. This was broken when streaming URLs or files.
- Transforms that are marked `Fatal` are now correctly cleared from the local cache, so they can be re-run.
- When a transform with lots of files fails, the error report is truncated to the results from 20 different files, rather than... all 3000.
- When a notebook is run under Visual Studio Code, the progress bars are correctly shown (for processing and download).
- `StreamInfoUrl` is now exported.
- Protect against filenames that are so long that the OS can't handle them. In particular, the current implementation now uses a more robust hashing mechanism for the modified filename.
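A common way to shorten an over-long filename safely is to keep a readable prefix, append a hash of the full original name, and preserve the extension. This is a sketch under assumptions: the 200-character limit, the 16-hex-digit SHA-256 digest, and `safe_cache_name` are hypothetical choices, not the frontend's actual scheme.

```python
import hashlib

# Hypothetical limit; many filesystems cap a single name component at 255 bytes.
MAX_NAME_LENGTH = 200


def safe_cache_name(name: str, max_length: int = MAX_NAME_LENGTH) -> str:
    """Shorten an over-long file name while keeping it unique.

    Names that fit are returned unchanged. Longer names keep a readable prefix,
    a hash of the *full* original name (so two names that share a long prefix
    still differ), and the original extension.
    """
    if len(name) <= max_length:
        return name
    stem, dot, ext = name.rpartition(".")
    if not dot or len(ext) > 10:
        # No (sensible) extension: treat the whole name as the stem
        stem, ext = name, ""
    suffix = f".{ext}" if ext else ""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:16]
    keep = max_length - len(digest) - len(suffix) - 1  # 1 for the "_" separator
    return f"{stem[:keep]}_{digest}{suffix}"
```

Hashing the full name, rather than only the truncated tail, is what makes the shortened names robust: names that differ only after the cut still map to different files.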
In Progress:
- Added logging information to help debug downloads to the local machine. We aren't saturating good connections and it isn't clear why yet.
Fixing up a new include
Trying to track down import errors, cleaning up how we include other items
Export DatasetType properly
So that others downstream can fetch it correctly.
Title and File List
Two new features:
- A `title` can be added to each request using the `title` argument of the `get_xxx` methods.
- Instead of a DID, one can specify a list of `http://` or `root://` files to access directly.
Add CMS Run 1 AOD default formats
Add ROOT as the default format that is returned