- Open https://github.com/CODAIT/watson-studio-gallery-dax-project-template
- Click Use this template and create a new data set project repository in https://github.com/CODAIT/, naming it `watson-studio-gallery-XXX-project` and replacing `XXX` with the data set short name.
- Create an issue in https://github.com/CODAIT/DAX-Datasets/issues/new/choose to track development/publication progress.
- Add a link to the newly created project in this status document.
- Review the content of this sample project https://dataplatform.cloud.ibm.com/exchange/public/entry/view/a7432f0c29c5bda2fb42749f363bd9ff to familiarize yourself with the typical content of a DAX project:
  - Description
  - README
  - Data assets
  - Notebooks

  The source for this project is located here: https://github.ibm.com/CODAIT/watson-studio-gallery-dax-weather-project
This is a one-time setup of your "development" environment. (Most notebooks will use a proprietary Watson Studio package to load and store data files. Therefore notebook development should not be performed in a local Jupyter environment.)
- Download the compressed data set archive (`.tar.gz`) from Cloud Object Storage into a temporary directory.
- Extract the archive.
- If the data files are not of type `.csv`, DO NOT PROCEED.
- Log in to Watson Studio and create an empty project.
  - Choose a meaningful name.
  - Add a short project description.
  - Uncheck `Restrict who can be a collaborator`.
- Add a project token.
  - Click Settings.
  - Add an Access token (any name; the role must be Editor).
- Add the extracted (raw) data set files.
  - Click Assets.
  - Click Add to project > Data and add each raw data set file (e.g. `.csv`) to the project.
- Try to export the data assets. If an error is raised because the archive is larger than 500MB, use the `Part 0 - Import Data.ipynb` notebook. Customize the notebook as follows:
  - Change the `dataset_download_url`.
  - Change the `data_path_name`.
  - In the last code cell, customize the `if file.suffix != '.tgz':` filter as needed if the extracted archive contains files that should not be added as data assets.
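The download, extraction, CSV check, and asset filter described in the steps above can be sketched in plain Python. This is an illustrative sketch, not the code of `Part 0 - Import Data.ipynb`: the function names are hypothetical, and `dataset_download_url` / `data_path_name` are reused only as parameter names. Note that `Path.suffix` returns only the last extension, so an archive saved as `dataset.tar.gz` has suffix `.gz` and would not be caught by a `.tgz` filter.

```python
import tarfile
import urllib.request
from pathlib import Path


def download_archive(dataset_download_url, target_path):
    """Download the compressed data set archive to target_path."""
    urllib.request.urlretrieve(dataset_download_url, target_path)
    return Path(target_path)


def extract_archive(archive_path, data_path_name):
    """Extract a gzip-compressed tar archive and return the extracted file paths."""
    dest = Path(data_path_name)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe paths and links.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return [dest / m.name for m in members]


def non_csv_files(paths):
    """Return files whose suffix is not .csv (case-insensitive); if any, do not proceed."""
    return [p for p in paths if p.suffix.lower() != ".csv"]


def select_data_asset_files(data_path_name, excluded_suffixes=(".tgz",)):
    """Mirror the `if file.suffix != '.tgz':` filter: skip the archive itself."""
    return [
        file
        for file in sorted(Path(data_path_name).rglob("*"))
        if file.is_file() and file.suffix not in excluded_suffixes
    ]
```

In a notebook these would typically be chained: download into `data_path_name`, extract, stop if `non_csv_files(...)` is non-empty, then upload each file returned by `select_data_asset_files(...)` as a data asset.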
- Review the notebook development instructions in /notebooks.
- Add new notebooks ("from file") to the project using the `template-notebook.ipynb` in `/notebooks`.
- Before saving the notebook in GHE, make sure to complete the following steps in Watson Studio:
  - Remove the first cell, which should look as follows:

    ```python
    # @hidden_cell
    # The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
    from project_lib import Project
    project = Project(project_id='4...', project_access_token='p...')
    pc = project.project_context
    ```

    This cell is automatically inserted when a user imports the project from the Watson Studio gallery. The project id and token differ for every user and every project instance, so a cell that works for you will not work for another user.
  - Clear all output cells.
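Watson Studio handles these cleanup steps in the notebook editor. If you also want to check a notebook file before it lands in the repository, the two steps can be sketched with the standard library, assuming the standard Jupyter `.ipynb` JSON layout (a top-level `cells` list whose code cells carry `source`, `outputs`, and `execution_count`). The function names are hypothetical, and detecting the token cell by its `# @hidden_cell` first line is an assumption based on the cell shown above.

```python
import json


def clean_notebook(nb):
    """Return a cleaned copy of a notebook dict.

    Drops a leading code cell that starts with '# @hidden_cell' (the project
    token cell) and clears outputs and execution counts of all code cells.
    """
    cells = [dict(cell) for cell in nb.get("cells", [])]
    if cells:
        first = cells[0]
        source = first.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if first.get("cell_type") == "code" and text.lstrip().startswith("# @hidden_cell"):
            cells = cells[1:]
    for cell in cells:
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    cleaned = dict(nb)
    cleaned["cells"] = cells
    return cleaned


def clean_notebook_file(path):
    """Clean the .ipynb file at path in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(clean_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```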
We use ReviewNB to better visualize updates to notebooks in Github. Due to several restrictions of this tool, this is the process for getting notebooks ready for review:
- Create a new repository in github.com/codait if you are migrating the project from Github Enterprise.
  - Make the repository private.
  - Copy in all of the files created thus far in the project's Github Enterprise repository (we are only able to use the public Github version of `ReviewNB` at this time, hence the need for this step).
- Add a branch called `production` to the new repository.
  - In the repository's `Settings/Branches` page, make the `production` branch the default (base) branch (Watson Studio currently can only push commits directly to the `master` branch of a repository, hence the need for this step).
- Make sure you are assigned the `Admin` role in the Watson Studio project from which you will push code to Github:
  - If you are not assigned this role yet, a current `Admin` can grant you this privilege.
- If your Watson Studio account does not yet have Github integration set up with your public Github account:
  - Add a Github personal access token in Watson Studio's `Settings/Integration` page.
  - You can create this token by visiting your public Github's `Settings/Developer settings/Personal access tokens` page and clicking `Generate new token`. Make sure to give the token `repo` scope.
- Inside the Watson Studio project's `Settings` page:
  - Scroll to the `Integrations/Github repository` section and add the link to the new Github repository to which you will push your code.
- Now you are able to push commits to the `master` branch of the new repository you created.
  - To push a commit, open a notebook in edit mode, click the `Github integration` button in the top menu bar, and click `Publish on Github`.
    - In the dialogue box, ensure the target path points to `./notebooks/your_target_notebook.ipynb`, add a commit message, select `All content except hidden code cells`, and click `Publish`.
    - Follow this set of steps every time you need to make a commit.
- Once you are ready for your notebook to be reviewed:
  - Open a PR from the `master` branch against the `production` branch.
    - Within this PR, `ReviewNB` will automatically add a button to `Check out this pull request on ReviewNB`.
    - Make sure `ReviewNB` is an Authorized Github App in your Github account's `Settings/Applications/Authorized Github Apps` page to be able to use the tool to add comments and code suggestions to individual cells of a notebook.
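Several of the Github steps above (creating the `production` branch, making it the default branch, and opening the review PR) can also be performed through the Github REST API. The sketch below only builds the requests with the standard library so they can be inspected before sending; the helper names are hypothetical and `owner`, `repo`, and `token` are placeholders. Creating a branch needs the commit SHA of `master`, which `GET /repos/{owner}/{repo}/git/ref/heads/master` returns. `has_repo_scope` assumes a classic personal access token, which reports its scopes in the `X-OAuth-Scopes` response header (for example on `GET https://api.github.com/user`).

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def github_request(method, path, token, payload=None):
    """Build (but do not send) an authenticated Github REST API request."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(GITHUB_API + path, data=data, method=method)
    req.add_header("Accept", "application/vnd.github+json")
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def create_branch(owner, repo, branch, from_sha, token):
    # POST /repos/{owner}/{repo}/git/refs creates refs/heads/<branch> at from_sha.
    return github_request("POST", f"/repos/{owner}/{repo}/git/refs", token,
                          {"ref": f"refs/heads/{branch}", "sha": from_sha})


def set_default_branch(owner, repo, branch, token):
    # PATCH /repos/{owner}/{repo} updates repository settings such as default_branch.
    return github_request("PATCH", f"/repos/{owner}/{repo}", token,
                          {"default_branch": branch})


def open_review_pr(owner, repo, title, token, head="master", base="production"):
    # POST /repos/{owner}/{repo}/pulls opens a PR that merges head into base.
    return github_request("POST", f"/repos/{owner}/{repo}/pulls", token,
                          {"title": title, "head": head, "base": base})


def has_repo_scope(x_oauth_scopes):
    """Check a classic token's X-OAuth-Scopes header value for the 'repo' scope."""
    scopes = {s.strip() for s in (x_oauth_scopes or "").split(",")}
    return "repo" in scopes


# Sending a request requires network access and a real token:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```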
Use this Github repository to store all the artifacts that will be used to create the Watson Studio project for this data set.
- Copy the raw data set files into `/data_assets` following the instructions in /data_assets.
- Copy the downloaded notebook files into `/notebooks` following the instructions in /notebooks.
- Customize the metadata files in `/metadata` following the instructions in /metadata.
- Complete the legal documents in `/legal` following the instructions in /legal.
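Before packaging, a quick self-check of the repository layout can catch a missing artifact directory. This is a minimal sketch: the four directory names come from the list above, while the glob pattern expected in each directory (and the function name) is an assumption to adjust to the per-directory instructions.

```python
from pathlib import Path

# Directory names from the list above; the glob patterns are assumptions.
EXPECTED_ARTIFACTS = {
    "data_assets": "*.csv",
    "notebooks": "*.ipynb",
    "metadata": "*",
    "legal": "*",
}


def check_artifacts(repo_root, expected=EXPECTED_ARTIFACTS):
    """Return {directory: problem} for artifact directories that are missing or incomplete."""
    root = Path(repo_root)
    problems = {}
    for name, pattern in expected.items():
        directory = root / name
        if not directory.is_dir():
            problems[name] = "missing directory"
        elif not any(p.is_file() for p in directory.glob(pattern)):
            problems[name] = f"no files matching {pattern!r}"
    return problems
```

An empty result means every directory exists and contains at least one file of the expected kind; it says nothing about whether the contents follow the instructions.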
- Follow the packaging instructions in dist.
- Make sure you have completed the packaging instructions.
- Complete the notebook publication checklist for each notebook.
  - In the checklist document, add a (company-wide readable) link to the completed data set publication approval request form.
  - In the checklist document, add a (company-wide readable) link to the data set publication approval.
  - Save a copy of each completed document in the `legal` directory.
- Send an email to @gdq:
  - subject: Publication approval request for DAX/[insert-dataset-name] notebooks
  - recipient: [email protected]
  - cc: [email protected]
  - body: Requesting your approval to publish and license the following notebooks, including their source code, under the terms of the MIT license.
    - for each notebook
      - include a link to the corresponding file in the source GH repository
      - attach the completed publication checklist
    - include a link to the data set's legal approval request form
    - include a link to the data set's legal approval document
- Once the request is approved, the content team will work with the legal team to request clearance for the notebooks.
- ...
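The approval request email described in the publication steps can be drafted with Python's standard `email` package. This is a sketch only: the function name is illustrative, the addresses passed in are hypothetical placeholders (the real recipient and cc addresses are not reproduced here), and the completed checklists are passed as in-memory attachments. The subject and opening sentence follow the template above.

```python
import mimetypes
from email.message import EmailMessage


def build_approval_request(dataset_name, notebooks, request_form_url, approval_url,
                           sender, recipient, cc, checklists=()):
    """Draft the publication approval request email.

    notebooks: (notebook name, link to the file in the source repository) pairs.
    checklists: (file name, bytes) pairs for the completed publication checklists.
    """
    msg = EmailMessage()
    msg["Subject"] = f"Publication approval request for DAX/{dataset_name} notebooks"
    msg["From"] = sender
    msg["To"] = recipient
    msg["Cc"] = cc
    lines = [
        "Requesting your approval to publish and license the following notebooks, "
        "including their source code, under the terms of the MIT license.",
        "",
    ]
    lines += [f"- {name}: {url}" for name, url in notebooks]
    lines += [
        "",
        f"Data set legal approval request form: {request_form_url}",
        f"Data set legal approval: {approval_url}",
    ]
    msg.set_content("\n".join(lines))
    for filename, data in checklists:
        # Guess the attachment type from the file name; fall back to generic binary.
        mime_type, _ = mimetypes.guess_type(filename)
        maintype, subtype = (mime_type or "application/octet-stream").split("/", 1)
        msg.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)
    return msg
```

Sending the drafted message (for example with `smtplib.SMTP.send_message`) depends on your mail setup and is left out of the sketch.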