PyTaj
PyTaj is built on PyTorch and was designed for Visual Question Answering (VQA) research, for example answering questions about visual data and automatically generating image captions.
PyTaj features:
- Model Zoo: Reference implementations of state-of-the-art vision and language models, including LoRRA (SoTA on VQA and TextVQA).
- Multi-Tasking: Supports multi-task training on several datasets together.
- Datasets: Built-in support for various datasets, including VQA, VizWiz, TextVQA and VisualDialog.
- Modules: Implementations of many commonly used layers in the vision and language domain.
- Distributed: Supports distributed training based on DataParallel as well as DistributedDataParallel.
- Unopinionated: Unopinionated about the dataset and model implementations built on top of it.
- Customisation: Custom losses, metrics, scheduling, optimisers and TensorBoard logging to suit users' custom needs.
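
To make the multi-tasking feature more concrete, here is a minimal sketch of how batches from several datasets might be interleaved in one training loop. It does not use PyTaj's API: the function name `multitask_batches`, the toy datasets, and the size-proportional sampling rule are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def multitask_batches(
    datasets: Dict[str, Sequence],
    batch_size: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[Tuple[str, List]]:
    """Yield (task_name, batch) pairs, choosing one dataset per step.

    Datasets are picked in proportion to their size. This is one common
    choice and an assumption here, not necessarily PyTaj's strategy.
    """
    rng = random.Random(seed)
    names = list(datasets)
    weights = [len(datasets[name]) for name in names]
    for _ in range(num_batches):
        # Pick which task this step trains on.
        name = rng.choices(names, weights=weights, k=1)[0]
        data = datasets[name]
        # Draw a batch (with replacement) from the chosen dataset only.
        batch = [data[rng.randrange(len(data))] for _ in range(batch_size)]
        yield name, batch


# Hypothetical toy datasets standing in for real VQA-style data.
toy = {
    "vqa": [("vqa", i) for i in range(80)],
    "textvqa": [("textvqa", i) for i in range(20)],
}
batches = list(multitask_batches(toy, batch_size=4, num_batches=10))
```

Each yielded batch comes from a single dataset, so a training loop could route it to the loss and metrics registered for that task.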
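
For the distributed feature, the sketch below shows how a plain PyTorch module is typically wrapped in `torch.nn.DataParallel`. It illustrates the underlying PyTorch mechanism only, not PyTaj's own training entry point; the `nn.Linear` model and tensor sizes are placeholders chosen for the example.

```python
import torch
from torch import nn

# Placeholder model standing in for a VQA network (assumption).
model = nn.Linear(16, 4)

# Wrap in DataParallel only when more than one GPU is visible;
# on a CPU-only or single-GPU machine the plain module is used.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# A batch of 8 feature vectors; DataParallel splits it along dim 0.
features = torch.randn(8, 16, device=device)
scores = model(features)
```

`DistributedDataParallel` follows the same wrapping idea but runs one process per device and requires a process group to be initialised first, so it is usually started through a launcher such as `torchrun`.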