Project Overview


NovelCraft is a dataset of images and symbolic world-state (as JSON text) gathered from every step of a virtual agent navigating a Minecraft-like 3D video game world. The agent repeats its task multiple times, starting from scratch in each episode. In some episodes, we have made open-world modifications to the environment, altering the objects in the world in ways that sometimes impact gameplay.

You can use this dataset for anomaly/novelty detection (visual or symbolic) as well as generalized category discovery.

Please give it a try! Many more future uses are possible!

Jump to: Announcements   Motivation   Dataset Summary   Detection Tasks

Announcements

Mar. 2023: Paper accepted at Transactions on Machine Learning Research (TMLR)

Read the paper on arXiv or OpenReview

Aug. 2022: Expanded Dataset Released: NovelCraft+

10x larger dataset of standard images now available: Data Access

Jun. 2022: Dataset Released

We’ve released our dataset and code: Data Access   Code

Motivation

To enable autonomous agents to perform useful tasks, we need to develop capabilities to recognize and adapt to novelty in open worlds.

We hope this dataset release catalyzes research in several directions:

  • Focus on open worlds with complex scenes.

  • Focus on integration of perception and action.

Dataset Summary

The virtual world we build upon (with permission from the creators) is Polycraft, a modification of the video game Minecraft developed by a research team at UT Dallas (Smaldone et al. (2017); Goss et al. (2023)). Polycraft is multi-purpose software with several applications. We use a particular environment, Polycraft POGO, created for DARPA-funded research on open-world AI.

Within the Polycraft POGO environment, an agent is tasked with exploring a 3-dimensional voxel world via a sequence of actions such as move, break, or craft. The agent’s goal is to create a pogo stick from resources, such as wood and rubber, that must be gathered from the environment. Gathering these resources may be simple, such as breaking trees for wood, or require multiple steps, such as crafting, placing, and using a tree tap for sap. Completing the task requires the agent to execute a plan roughly 60 actions long. Moving the agent any distance requires only a single action, so many actions involve the agent interacting with an object or another environment-controlled agent.

At each step, the agent observes both an image and the environment’s symbolic state in JSON text format. Example images and symbolic information are shown in the figure below. Each image is a 256x256 pixel RGB depiction of the agent’s current perspective. The JSON text includes positions and names for every object in the environment and every environment-controlled agent, as well as the agent’s position and state information (what materials it has collected).
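For concreteness, a per-step symbolic observation might be parsed as below. The field names and layout in this snippet are illustrative assumptions, not the dataset's actual schema; consult the released data for the real format.

```python
import json

# Hypothetical sketch of one step's symbolic world-state; the actual
# NovelCraft JSON schema may differ -- all field names here are assumptions.
step = json.loads("""
{
  "player": {"pos": [10, 4, 22], "facing": "NORTH",
             "inventory": {"log": 2, "rubber": 0}},
  "objects": [
    {"name": "oak_tree", "pos": [12, 4, 25]},
    {"name": "crafting_table", "pos": [8, 4, 20]}
  ],
  "agents": [
    {"name": "trader", "pos": [15, 4, 30]}
  ]
}
""")

# Collect the names of every object and environment-controlled agent
# visible in this step's world state.
names = {o["name"] for o in step["objects"]} | {a["name"] for a in step["agents"]}
print(sorted(names))  # -> ['crafting_table', 'oak_tree', 'trader']
```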

NovelCraft visual and symbolic contents
Examples from our multi-modal NovelCraft dataset. Left: Images from two normal episodes (rows 1-2) and two novel episodes (rows 3-4). Detection is challenging as only a few images in novel episodes actually contain novel objects, here outlined in orange when visible. Right: Example symbolic information available at each frame. At each frame, we record a complete JSON representation of all objects and artificial agents in the world (x,z,y position in 3D, orientation, etc.) as well as player state.

Tasks

Our dataset supports several meaningful novelty detection tasks using either visual or symbolic data.

Task 1: Visual novelty detection

  • Given a 256x256 pixel image of a complex scene obtained from one step of our agent, does the image show novel content?
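One simple baseline for this task (not the method evaluated in our paper) is to fit a low-rank PCA model on frames from normal episodes and score test frames by reconstruction error. The sketch below uses synthetic 64-dimensional vectors in place of real flattened images:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in features: in practice each row would be a flattened 256x256
# RGB image from a normal episode; here we use synthetic 64-d vectors.
normal = rng.normal(0.0, 1.0, size=(200, 64))

# Fit a low-rank PCA basis on normal frames only.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
basis = vt[:8]  # top 8 principal components

def novelty_score(x):
    """Reconstruction error: large when x lies outside the normal subspace."""
    centered = x - mean
    recon = basis.T @ (basis @ centered)
    return float(np.linalg.norm(centered - recon))

normal_frame = rng.normal(0.0, 1.0, size=64)
novel_frame = normal_frame + 3.0  # simulated out-of-distribution shift

# A novel frame reconstructs poorly, so its score is higher; flagging
# would compare the score to a threshold calibrated on held-out normals.
print(novelty_score(novel_frame) > novelty_score(normal_frame))
```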

Task 2: Symbolic novelty detection

  • Given a short window of JSON text representations of the symbolic world-state (obtained from 10 adjacent steps of the agent), does the window contain novel content?
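A minimal symbolic baseline, shown below, simply checks whether any step in the window mentions an object or agent name never seen during normal episodes. The JSON field names and the vocabulary are illustrative assumptions, not the dataset's actual schema:

```python
import json

# Names observed in normal training episodes (hypothetical vocabulary).
known_names = {"oak_tree", "crafting_table", "trader", "player"}

def window_is_novel(window_json_lines, known=known_names):
    """Flag a window of per-step JSON world-states if any step mentions
    an object or agent name absent from the normal-episode vocabulary."""
    for line in window_json_lines:
        state = json.loads(line)
        for entity in state.get("objects", []) + state.get("agents", []):
            if entity["name"] not in known:
                return True
    return False

# A 10-step window in which step 7 introduces an unseen "pumpkin" object.
window = ['{"objects": [{"name": "oak_tree"}], "agents": []}'] * 6
window += ['{"objects": [{"name": "pumpkin"}], "agents": []}']
window += ['{"objects": [], "agents": [{"name": "trader"}]}'] * 3

print(window_is_novel(window))  # True
```

Of course, real symbolic novelties can be subtler than an unseen name (e.g., familiar objects in unfamiliar configurations), which is what makes this task interesting.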

Task 3: Discovery and characterization

  • Given a collection of images, many unlabeled but some labeled as examples of known (“normal”) classes, can you discover a clustering that aligns with the true (but unavailable) class labels? Many images show content from other (“novel”) classes not in the labeled set.

An effective solution here would not only be able to detect a novel “pumpkin” item in the NovelCraft environment, but also be able to say it was related to another “pumpkin” seen in a previous episode. Task 3 is known as Generalized Category Discovery, as introduced by Vaze et al. (2022).
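To illustrate the problem setup (not Vaze et al.'s method), the toy sketch below builds centroids from labeled known-class features and treats unlabeled points far from every centroid as candidates for a novel cluster. The 2-D features and the distance threshold are synthetic, illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 2-D features: two labeled "known" classes plus an unlabeled
# pool mixing known classes with one unseen ("novel") class.
known_a = rng.normal([0, 0], 0.3, size=(20, 2))
known_b = rng.normal([5, 0], 0.3, size=(20, 2))
unlabeled = np.vstack([
    rng.normal([0, 0], 0.3, size=(10, 2)),   # more of known class A
    rng.normal([5, 0], 0.3, size=(10, 2)),   # more of known class B
    rng.normal([0, 5], 0.3, size=(10, 2)),   # unseen novel class
])

# Centroids of the labeled known classes.
centroids = np.stack([known_a.mean(axis=0), known_b.mean(axis=0)])
dists = np.linalg.norm(unlabeled[:, None] - centroids[None], axis=2)

# Points far from every known centroid form a candidate novel cluster;
# the 2.0 threshold is illustrative, not a tuned value.
novel_mask = dists.min(axis=1) > 2.0
print(int(novel_mask.sum()))
```

A full generalized category discovery method would go further, clustering the flagged points into coherent new categories so that two "pumpkin" sightings across episodes land in the same cluster.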

This is a step toward our ultimate goal: not just to detect, but to adapt!

References

Goss, S. A., Steininger, R. J., Narayanan, D., Olivença, D. V., Sun, Y., Qiu, P., Amato, J., Voit, E. O., Voit, W. E., & Kildebeck, E. J. (2023). Polycraft world AI lab (PAL): An extensible platform for evaluating artificial intelligence agents. arXiv Preprint arXiv:2301.11891. https://arxiv.org/abs/2301.11891
Smaldone, R. A., Thompson, C. M., Evans, M., & Voit, W. (2017). Teaching science through video games. Nature Chemistry, 9(2). https://doi.org/10.1038/nchem.2694
Vaze, S., Han, K., Vedaldi, A., & Zisserman, A. (2022). Generalized category discovery. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://arxiv.org/abs/2201.02609