NovelCraft is a dataset of images and symbolic world states (as JSON text) gathered at every step of a virtual agent navigating a Minecraft-like 3D video game world. The agent repeats its task across multiple episodes, starting from scratch each time. In some episodes, we have made open-world modifications to the environment, altering objects in the world and sometimes impacting gameplay.
You can use this dataset for anomaly/novelty detection (visual or symbolic) as well as generalized category discovery.
Please give it a try! Many more future uses are possible!
Aug. 2022: Expanded Dataset Released: NovelCraft+
10x larger dataset of standard images now available: Data Access
To enable autonomous agents to perform useful tasks, we need to develop capabilities to recognize and adapt to novelty in open worlds.
We hope this dataset release catalyzes research in several directions:
Focus on open worlds with complex scenes.
Focus on integration of perception and action.
The virtual world we build upon (with permission from the creators) is Polycraft, a modification of the video game Minecraft developed by a research team at UT Dallas (Smaldone et al. (2017); Goss et al. (2023)). Polycraft is multi-purpose software with several applications. We use a particular environment, Polycraft POGO, created for DARPA-funded research on open-world AI.
Within the Polycraft POGO environment, an agent is tasked with exploring a 3-dimensional voxel world via a sequence of actions such as move, break, or craft. The agent’s goal is to create a pogo stick from resources, such as wood and rubber, that must be gathered from the environment. Gathering these resources may be simple, such as breaking trees for wood, or may require multiple steps, such as crafting, placing, and using a tree tap for sap. Completing the task requires the agent to execute a plan roughly 60 actions long. Because moving the agent any distance requires only a single action, most actions involve interacting with an object or another environment-controlled agent.
At each step, the agent observes both an image and the environment’s symbolic state in JSON text format. Example images and symbolic information are shown in the figure below. Each image is a 256x256 pixel RGB depiction of the agent’s current perspective. The JSON text includes positions and names for every object in the environment and every environment-controlled agent, as well as the agent’s position and state information (what materials it has collected).
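To make the symbolic observation concrete, the sketch below parses one step's JSON world state and extracts object names and inventory counts. The field names (`agent`, `objects`, `entities`, `inventory`) are illustrative assumptions, not the actual NovelCraft schema.

```python
import json

# Hypothetical example of one step's symbolic observation; the real
# NovelCraft JSON may use different field names and structure.
raw = """
{
  "agent": {"pos": [12, 4, -7], "inventory": {"wood": 3, "rubber": 1}},
  "objects": [
    {"name": "tree", "pos": [14, 4, -7]},
    {"name": "crafting_table", "pos": [10, 4, -5]}
  ],
  "entities": [{"name": "trader", "pos": [20, 4, 0]}]
}
"""

state = json.loads(raw)

# Collect the names of everything visible in this step's symbolic state.
names = {obj["name"] for obj in state["objects"] + state["entities"]}
print(sorted(names))                         # ['crafting_table', 'trader', 'tree']
print(state["agent"]["inventory"]["wood"])   # 3
```

A symbolic novelty detector could watch such per-step states for names or configurations never seen during normal episodes.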
Our dataset supports several meaningful novelty detection tasks using either visual or symbolic data.
- Given a 256x256 pixel image of a complex scene obtained from one step of our agent, does the image show novel content?
- Given a short window of JSON text representations of the symbolic world-state (obtained from 10 adjacent steps of the agent), does the window contain novel content?
- Given a collection of images, many unlabeled but some labeled as examples of known (“normal”) classes, can you discover a clustering that aligns with the true (but unavailable) class labels? Many images show content from other (“novel”) classes not in the labeled set.
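One classic baseline for the first task is reconstruction-error novelty scoring: fit a low-dimensional subspace to "normal" data and flag inputs that reconstruct poorly. The sketch below uses PCA via SVD on synthetic stand-in vectors (not the real NovelCraft images); the dimensions and thresholding are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for flattened image features from "normal" episodes
# (illustrative random data, not real NovelCraft images).
d = 50
normal = rng.normal(size=(500, d))
normal[:, :5] *= 10.0          # normal data varies mostly in 5 directions

mean = normal.mean(axis=0)
# Principal subspace of the normal data via SVD.
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:5]            # keep the top 5 principal components

def novelty_score(x):
    """Reconstruction error after projecting onto the normal subspace."""
    centered = x - mean
    recon = centered @ components.T @ components
    return float(np.linalg.norm(centered - recon))

# A point distributed like the normal data scores low; one that varies
# in directions the normal data never used scores high.
typical = rng.normal(size=d)
typical[:5] *= 10.0
novel = rng.normal(size=d) * 10.0   # large variance in all directions
print(novelty_score(typical) < novelty_score(novel))  # True
```

In practice a deep autoencoder or density model would replace the linear subspace, but the detect-by-reconstruction idea is the same.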
An effective solution would not only detect a novel “pumpkin” item in the NovelCraft environment, but also recognize that it relates to another “pumpkin” seen in a previous episode. This third task is known as Generalized Category Discovery, as introduced by Vaze et al. (2022).
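A minimal sketch of the discovery idea, on synthetic 2-D features rather than real image embeddings: seed k-means centroids from the labeled known classes, add an extra centroid (initialized at the unlabeled point farthest from the seeds) to leave room for a novel category, and cluster the unlabeled pool. All data and the farthest-point heuristic are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy features: two known classes near (0,0) and (5,0), one novel near (0,5).
known_a = rng.normal([0, 0], 0.3, size=(30, 2))
known_b = rng.normal([5, 0], 0.3, size=(30, 2))
novel_c = rng.normal([0, 5], 0.3, size=(30, 2))
unlabeled = np.vstack([known_a[15:], known_b[15:], novel_c])

# Seed centroids from the labeled halves of the known classes.
seeds = np.array([known_a[:15].mean(0), known_b[:15].mean(0)])

# Extra centroid for a potential novel class: the unlabeled point
# farthest from every seed (simple farthest-point heuristic).
far = unlabeled[np.argmax(
    np.min(np.linalg.norm(unlabeled[:, None] - seeds[None], axis=2), axis=1))]
centroids = np.vstack([seeds, far])

for _ in range(10):  # plain k-means updates over the unlabeled pool
    assign = np.argmin(
        np.linalg.norm(unlabeled[:, None] - centroids[None], axis=2), axis=1)
    for k in range(len(centroids)):
        if np.any(assign == k):
            centroids[k] = unlabeled[assign == k].mean(0)

# The extra centroid should settle near the novel class at (0, 5).
print(centroids[2])
```

Real Generalized Category Discovery methods cluster learned image representations and estimate the number of novel classes, but the core structure — labeled seeds plus open-ended clustering of unlabeled data — is the same.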
This is a step toward our ultimate goal: not just to detect, but to adapt!