Update

  • Feb 11, 2023:   Oak Base is released!
  • Jan 03, 2023:   Hand Mesh Recovery benchmarks on OakInk-Image are released!
  • Oct 18, 2022:   OakInk public v2.1 is released!
    This update fixes several artifacts, including wrong poses, time delays, and contact surface mismatches. NOTE: if you downloaded the OakInk dataset before 11:00 AM, October 18, 2022 (UTC), you only need to replace the previous anno.zip with the newly released anno_v2.1.zip (access via Google Forms), unzip it while keeping the same file structure as before, and install the latest OakInk Toolkit.
  • Jul 26, 2022:   Tink has been made public.
  • Jun 28, 2022:   OakInk public v2 & OakInk Toolkit (a Python dataloader) are released!
  • Mar 03, 2022:   OakInk was accepted to CVPR 2022.

About

This website hosts the OakInk-Image and OakInk-Shape datasets.
OakInk-Image contains 230K image frames capturing 12 subjects performing up to 5 intent-oriented interactions with 100 objects from 32 categories. Object poses were captured with a MoCap system, while MANO hand poses were fitted from 2D keypoint annotations. Based on the hand-object poses from real-world human demonstrations, we transfer the hand poses on real-world objects to virtual objects with similar affordances through a novel method: Tink. All the real-world and transferred interactions constitute the geometry-based dataset, OakInk-Shape, which contains 62K distinct hand-object poses and models.

Download

For researchers in China, the dataset is also available from a mirror: 百度云盘 (Baidu Cloud, extraction code: hrt9).

Oak Base -- Object Affordance Knowledge

  • Object part segmentation and attributes: OakBase.zip (4.07 GB)

OakInk-Image -- Image-based dataset

To make sure you have the latest version, use sha256sum to compute the checksum of anno_v2.1.zip. You should get:

$ sha256sum anno_v2.1.zip --tag
SHA256 (anno_v2.1.zip) = dc64402d65cff3c1e2dd40fb560fcc81e3757e1936f44d353c381874489d71ea
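
If you prefer checking from Python instead of the sha256sum CLI, the snippet below computes the same digest (a minimal sketch; reading in chunks just avoids loading the whole archive into memory):

import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Compute the SHA-256 hex digest of a file, reading it in chunks.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "dc64402d65cff3c1e2dd40fb560fcc81e3757e1936f44d353c381874489d71ea"
assert sha256_of("anno_v2.1.zip") == expected, "anno_v2.1.zip is outdated or corrupted"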

OakInk-Shape -- Geometry-based dataset


After downloading all the above .zip files, you need to arrange them in the following structure:

 $OAKINK_DIR
    ├── OakBase.zip
    ├── image
    │   ├── anno_v2.1.zip
    │   ├── obj.zip
    │   └── stream_zipped
    │       ├── oakink_image_v2.z01
    │       ├── ...
    │       ├── oakink_image_v2.z10
    │       └── oakink_image_v2.zip
    └── shape
        ├── metaV2.zip
        ├── OakInkObjectsV2.zip
        ├── oakink_shape_v2.zip
        └── OakInkVirtualObjectsV2.zip
Next, follow the instructions to unzip and prepare the datasets.
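
Before unzipping, it can help to verify that every archive landed in the right place. Below is a minimal sketch of such a check; it assumes $OAKINK_DIR is exported as an environment variable and that the ellipsis under stream_zipped covers parts .z02 through .z09:

import os

OAKINK_DIR = os.environ["OAKINK_DIR"]

# Expected archives, mirroring the layout shown above.
expected = [
    "OakBase.zip",
    "image/anno_v2.1.zip",
    "image/obj.zip",
    "image/stream_zipped/oakink_image_v2.zip",
    "shape/metaV2.zip",
    "shape/OakInkObjectsV2.zip",
    "shape/oakink_shape_v2.zip",
    "shape/OakInkVirtualObjectsV2.zip",
] + [f"image/stream_zipped/oakink_image_v2.z{i:02d}" for i in range(1, 11)]

missing = [p for p in expected if not os.path.isfile(os.path.join(OAKINK_DIR, p))]
if missing:
    raise FileNotFoundError(f"missing archives under {OAKINK_DIR}: {missing}")
print("All archives present.")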


Datasheet & Explanation

OakInk-Image -- Image-based dataset

Dataset Structure

  • Image Sequences: resolution 848x480.
  • Annotation: 2D/3D positions of the 21 hand keypoints; MANO pose & shape parameters; MANO vertex 3D locations; object vertex 3D locations; object .obj models; camera calibration (intrinsics & extrinsics); subject ID; intent ID; data split files (train/val/test).
  • Visualization Code: viz_oakink_image.py

OakInk-Image provides data splits for two categories of tasks: Hand Mesh Recovery and Hand-Object Pose Estimation. The dataset contains 314,404 frames in total if no filtering is applied, of which 157,600 frames are from two-hand sequences. For single-view tasks, we filter out frames in which fewer than 50% of the hand joints fall within the image bounds; note that these frames may still be useful in multi-view tasks. Refer to the oikit repo for the usage of these split files, and see the filtering sketch below.
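
For illustration, this single-view filtering criterion might look like the sketch below: a frame is kept only if at least half of its 21 hand keypoints project inside the 848x480 image. The array shape and function name are assumptions for illustration; the actual oikit implementation may differ.

import numpy as np

IMG_W, IMG_H = 848, 480  # image resolution from the datasheet

def enough_joints_in_bounds(kp2d: np.ndarray, min_ratio: float = 0.5) -> bool:
    # kp2d: (21, 2) array of 2D hand keypoints in pixel coordinates.
    # Returns True if at least `min_ratio` of the joints lie inside the image.
    in_x = (kp2d[:, 0] >= 0) & (kp2d[:, 0] < IMG_W)
    in_y = (kp2d[:, 1] >= 0) & (kp2d[:, 1] < IMG_H)
    return (in_x & in_y).mean() >= min_ratio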


Splits for Hand Mesh Recovery

For the Hand Mesh Recovery task, we offer three different split modes. The details of each split mode are described below.

SP0. Default split (split by views)

We randomly select one view per sequence and mark all images from this view as the test sequences, while the remaining three views form the train/val sequences.
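
As a sketch, the per-sequence view holdout could be drawn as below; the four-camera count follows from "the remaining three views", while the sequence and view identifiers are hypothetical:

import random

def holdout_view(seq_ids, n_views=4, seed=0):
    # Pick one held-out test view per sequence; the other views go to train/val.
    rng = random.Random(seed)
    return {seq: rng.randrange(n_views) for seq in seq_ids}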

Train+Val set
* Train+Val: 232,049 frames; in which 114,697 frames are from two-hand sequences.

Test set
* Test: 77,330 frames; in which 38,228 frames are from two-hand sequences.

We also provide an example train/val split on the train+val set. The val set is randomly sampled from the train+val set.

Train set
* Train: 216,579 frames; in which 107,043 frames are from two-hand sequences.

Val set
* Val: 15,470 frames; in which 7,654 frames are from two-hand sequences.

SP1. Subject split (split by subjects)

We select five subjects and mark all images containing these subjects as the test sequences, while the images not containing them form the train/val sequences. Note that sequences involving two-hand interactions between a test-set subject and a train/val-set subject are dropped.

Train+Val set
* Train+Val: 192,532 frames; in which 82,539 frames are from two-hand sequences.

Test set
* Test: 83,503 frames; in which 37,042 frames are from two-hand sequences.

We also provide an example train/val split on the train+val set. We select one subject to form the val set, and the remaining subjects form the train set. As with the test split, sequences with overlapping subjects are dropped.

Train set
* Train: 177,490 frames; in which 73,658 frames are from two-hand sequences.

Val set
* Val: 6,151 frames. No frames are from two-hand sequences, as only one subject is included.

SP2. Object split (split by objects)

We randomly select 25 objects (out of 100 in total) and mark all sequences containing these objects as the test sequences, while the sequences containing the remaining 75 objects form the train/val sequences.

Train+Val set
* Train+Val: 230,832 frames; in which 116,501 frames are from two-hand sequences.

Test set
* Test: 78,547 frames; in which 36,424 frames are from two-hand sequences.

We also provide an example train/val split on the train+val set. We randomly select 5 objects (out of the 75) to form the val set, and the remaining objects form the train set.

Train set
* Train: 214,630 frames; in which 107,767 frames are from two-hand sequences.

Val set
* Val: 16,202 frames; in which 8,734 frames are from two-hand sequences.

Splits for Hand-Object Pose Estimation

For the Hand-Object Pose Estimation task, we offer one split mode based on views. The details of this split mode are described below.

SP0. Default split (split by views)

We randomly select one view per sequence and mark all images from this view as the test sequences, while the remaining three views form the train/val sequences. We filter out frames in which the minimum distance between hand and object surface vertices exceeds 5 mm (a sketch of this contact filter follows).
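
A sketch of this contact filter is below; the vertex array shapes follow MANO (778 hand vertices), the millimeter units are an assumption, and a KD-tree keeps the nearest-neighbor query tractable:

import numpy as np
from scipy.spatial import cKDTree

def in_contact(hand_verts: np.ndarray, obj_verts: np.ndarray,
               thresh_mm: float = 5.0) -> bool:
    # hand_verts: (778, 3) MANO vertices; obj_verts: (N, 3) object surface vertices,
    # both assumed to be in millimeters and in the same coordinate frame.
    # Returns True if the closest hand-object vertex pair is within thresh_mm.
    dists, _ = cKDTree(obj_verts).query(hand_verts, k=1)
    return dists.min() <= thresh_mm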

Train+Val set
* Train+Val: 145,589 frames; in which 61,256 frames are from two-hand sequences.

Test set
* Test: 48,538 frames; in which 20,413 frames are from two-hand sequences.

We also provide an example train/val split on the train+val set. The val set is randomly sampled from the train+val set.

Train set
* Train: 135,883 frames; in which 57,161 frames are from two-hand sequences.

Val set
* Val: 9,706 frames; in which 4,095 frames are from two-hand sequences.


OakInk-Shape -- Geometry-based dataset

Dataset Structure

  • Annotation: object .obj models in their canonical systems; MANO pose & shape parameters and vertex 3D locations in the object's canonical system; subject ID; intent ID; origin sequence ID; alternative hand pose, shape, and vertices (if any, for a handover pair).
  • Visualization Code: viz_oakink_shape.py

OakInk-Shape provides data splits for the tasks of Grasp Generation, Intent-based Interaction Generation, and Handover Generation. These three tasks share one data split; details are given below.


Split for Grasp Generation

We use the remainder of int(object ID's hash code) mod 10 as the split separator (see the sketch after the list):

  • obj_id_hash % 10 < 8 in train split
  • obj_id_hash % 10 == 8 in val split
  • obj_id_hash % 10 == 9 in test split
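
A minimal sketch of this selector follows. The concrete hash function is an assumption: Python's built-in hash() is not stable across runs, so a deterministic stand-in (MD5) is used here, and obj_id_hash mirrors the name in the list above:

import hashlib

def split_of(obj_id: str) -> str:
    # Assign an object to train/val/test by the remainder of its hash mod 10.
    # Assumption: MD5 as a deterministic stand-in for int(object ID's hash code).
    obj_id_hash = int(hashlib.md5(obj_id.encode()).hexdigest(), 16)
    r = obj_id_hash % 10
    if r < 8:
        return "train"
    if r == 8:
        return "val"
    return "test"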

* Train set
1,308 objects with 49,302 grasping hand poses. 
Including 5 intents: 11,804 use, 9,165 hold, 9,425 lift-up, 9,454 hand-out, and 9,454 receive.

* Val set
166 objects with 6,522 grasping hand poses. 
Including 1,561 use, 1,239 hold, 1,278 lift-up, 1,222 hand-out, and 1,222 receive.

* Test set
183 objects with 6,222 grasping hand poses. 
Including 1,473 use, 1,115 hold, 1,122 lift-up, 1,256 hand-out, and 1,256 receive.

* Total set
We release 1,801 object CAD models, of which 1,657 (= 1,308 + 166 + 183) have corresponding grasping hand poses. The total number of grasping poses is 62,046 (= 49,302 + 6,522 + 6,222).

Considerations for Using the Data

  • Licensing Information: The code is released under the MIT license; the dataset is released under the CC BY-NC-ND 4.0 license.
  • IRB approval: The third-party crowd-sourcing company warrants that appropriate IRB approval (or the local equivalent, per government requirements) has been obtained.
  • Portrait Usage: All subjects involved in data collection were required to sign a contract with the third-party crowd-sourcing company, covering permission for portrait usage, acknowledgment of data usage, and the payment policy. We desensitized all samples in the dataset by blurring the subjects' faces (if any), tattoos, rings, and any other accessories that might be offensive or reveal a subject's identity.

Maintenance

Acknowledgements

If you find our work useful in your research, please cite:
@InProceedings{YangCVPR2022OakInk,
    author = {Yang, Lixin and Li, Kailin and Zhan, Xinyu and Wu, Fei and Xu, Anran and Liu, Liu and Lu, Cewu},
    title = {{OakInk}: A Large-Scale Knowledge Repository for Understanding Hand-Object Interaction},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2022},
}
              


The website template was borrowed from Michaël Gharbi.