ML observability delivers significant value, especially in mission-critical use cases. But even with the most advanced techniques, issue resolution, risk assessments, and alert recommendations for ML models are only useful if the models are trained on relevant, diverse datasets. Gaps in the data can affect the performance, generalization capabilities, and reliability of ML models. Model risk assessment is likewise limited by the availability of test datasets.
Unless, that is, you use synthetic data in your machine learning projects.
While synthetic and generative AI models have long been considered techniques for improving model performance or labelling, they can also serve as an effective approach to ML observability.
We are thrilled to announce that we are adding the 'Synthetic AI' component to AryaXAI, an ML Observability platform for mission-critical functions. The 'AryaXAI Synthetics' component in AryaXAI unlocks a more effective approach to ML Observability.
The need for Synthetic AI in ML Observability:
Unreliable predictions or poor performance can create security and operational risks. To provide seamless service and deliver timely value, models must learn as much as possible about their environments so that they can operate safely and competently.
Lack of access to live data in production and pre-production further complicates the situation, creating a circular dependency between the two: on one hand, we need a trained model deployed in production to gather new data; on the other, we need the data collected in production to train and improve the model.
While ML observability highlights the critical issues, Synthetic AI can be used as a powerful technique for model prognosis to resolve the gaps and improve model metrics.
Introducing AryaXAI Synthetics: Synthetic AI to complement ML Observability
With AryaXAI Synthetics, users can create high-quality synthetic data to resolve critical data gaps, test models at scale, and preserve data privacy. It helps close the data gaps that hold back machine learning models, letting them learn from diverse and extensive datasets so that they are better equipped to handle real-world scenarios.
Details about the models we are offering:
AryaXAI Synthetics offers an off-the-shelf implementation of advanced synthetic AI models. We currently offer GPT-2 & GAN models for generating synthetic data.
- Generative Pre-trained Transformer 2 (GPT-2): We implemented the autoregressive GPT-2 model to learn from the real data and create high-quality synthetic data. Our implementation is based on this paper: https://arxiv.org/abs/2302.02041. While it uses more compute and takes longer to train, we observed that GPT-2 produces very high-quality data.
- Generative Adversarial Networks (GANs): GANs have been the de facto method for generating synthetic data. We implemented a modified version of GANs for tabular data, called Conditional Tabular GANs, based on this paper: https://arxiv.org/abs/2204.00401.
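To make "autoregressive" concrete for tabular data, here is a minimal sketch in Python. It is not GPT-2 and not the AryaXAI implementation: it replaces the transformer with simple frequency counts, and the column names and rows are invented for illustration. What it shares with the GPT-2 approach is the generation order: each column value is sampled conditioned on the values already generated for that row.

```python
import random
from collections import Counter, defaultdict

def fit_autoregressive(rows, columns):
    """Count how often each value follows each prefix of earlier column values."""
    tables = defaultdict(Counter)
    for row in rows:
        prefix = ()
        for col in columns:
            tables[prefix][row[col]] += 1
            prefix += (row[col],)
    return tables

def sample_row(tables, columns, rng):
    """Draw one synthetic row, one column at a time, conditioned on the prefix."""
    prefix = ()
    for _ in columns:
        counts = tables[prefix]
        values = list(counts)
        weights = [counts[v] for v in values]
        prefix += (rng.choices(values, weights=weights, k=1)[0],)
    return dict(zip(columns, prefix))

# Hypothetical "real" data, invented for this example.
columns = ["channel", "amount_band", "fraud_target"]
real_rows = [
    {"channel": "web", "amount_band": "high", "fraud_target": "1"},
    {"channel": "web", "amount_band": "low", "fraud_target": "0"},
    {"channel": "branch", "amount_band": "low", "fraud_target": "0"},
    {"channel": "web", "amount_band": "high", "fraud_target": "0"},
]

tables = fit_autoregressive(real_rows, columns)
rng = random.Random(0)  # fixed seed so runs are repeatable
synthetic = [sample_row(tables, columns, rng) for _ in range(5)]
```

A count-based model like this can only reproduce combinations it has already seen. A learned model such as GPT-2 can generalize to plausible unseen combinations, which is a large part of why it costs more compute.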
Prompting & Conditional Sampling
One of the main focus areas for AryaXAI Synthetics was to generate synthetic data that is not only high quality but also matches what the user asks for. We are glad to introduce prompting to Synthetic AI. Users can define prompts or conditions and generate conditional samples. For example, a user can ask for 1,000,000 samples where fraud_target == ‘1’.
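The idea of conditional sampling can be sketched with plain rejection sampling: draw rows from a generator and keep only those that satisfy the condition. This is an illustrative stand-in, not how AryaXAI Synthetics implements prompting. `toy_generator`, `sample_where`, the `amount` column, and the 5% fraud rate are all hypothetical. Only the `fraud_target == '1'` condition comes from the example above.

```python
import random

def toy_generator(rng):
    """Stand-in for a trained synthetic model: returns one random row."""
    is_fraud = rng.random() < 0.05  # assumed 5% fraud rate, for illustration
    amount = rng.uniform(500, 5000) if is_fraud else rng.uniform(5, 800)
    return {"amount": round(amount, 2), "fraud_target": "1" if is_fraud else "0"}

def sample_where(generate, condition, n, rng, max_tries=1_000_000):
    """Rejection sampling: keep drawing rows until n of them satisfy condition."""
    if n <= 0:
        return []
    kept = []
    for _ in range(max_tries):
        row = generate(rng)
        if condition(row):
            kept.append(row)
            if len(kept) == n:
                return kept
    raise RuntimeError(f"only {len(kept)} of {n} rows matched after {max_tries} tries")

rng = random.Random(42)  # fixed seed so runs are repeatable
fraud_rows = sample_where(
    toy_generator,
    lambda row: row["fraud_target"] == "1",
    100,
    rng,
)
```

Rejection sampling wastes most draws when the condition is rare (about 95% of them here). That is one reason conditional generators feed the condition into the model itself instead of filtering afterwards.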
Using AryaXAI Synthetics
Our no-code implementation enables any user to initiate high-quality synthetic model training in just a few minutes.
Here is a sped-up version of the walkthrough:
Deployment
AryaXAI Community Version:
In the AryaXAI community version, GPU usage is capped at 6 hours per job.
AryaXAI Enterprise Version:
Users have the option to choose more advanced compute, such as H100 or A100 GPUs. This version can also be deployed in the customer’s own on-premises or cloud environment.
Note: AryaXAI is currently an invite-only platform. You can request an invitation here