Sitemap
A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2 
projects
Music Generation Using Machine Learning
Published:
Project Goal: The goal of this project was to create a music composition application that utilizes machine learning to generate music extensions. The application allows users to play a melody on a MIDI keyboard, and the system generates a continuation of that melody for the same length of time.
Project Overview:
1) UI:
i) Users are prompted to play a melody on a MIDI keyboard.
ii) The application transforms the input melody using the same transformations applied to the training dataset.
iii) The transformed melody is fed through a trained model to generate a continuation of the played notes.
iv) The user’s recording and the model-generated continuation are saved and displayed to the user.
2) Machine Learning Pipeline:
i) Data Transformation: Developed techniques to convert music into machine-readable embeddings.
ii) Model Development: Designed and trained a Recurrent Neural Network (RNN) for music generation.
iii) Integration: Integrated the final model with a user interface and a physical MIDI keyboard.
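The data-transformation step above can be sketched as follows. This is an illustrative toy, not the project's actual code: it assumes notes are represented as (MIDI pitch, duration) pairs and maps them to integer tokens a sequence model can consume, and back.

```python
# Minimal sketch of the data-transformation step (names and vocabulary
# are illustrative, not the project's actual code): MIDI-style note
# events become integer tokens for the RNN, and decode() inverts encode().

PITCH_RANGE = 128                   # MIDI pitches 0-127
DURATIONS = [0.25, 0.5, 1.0, 2.0]   # note lengths in the toy vocabulary

def encode(notes):
    """Map (pitch, duration) pairs to integer tokens."""
    return [DURATIONS.index(dur) * PITCH_RANGE + pitch
            for pitch, dur in notes]

def decode(tokens):
    """Invert encode(): recover (pitch, duration) pairs from tokens."""
    return [(t % PITCH_RANGE, DURATIONS[t // PITCH_RANGE]) for t in tokens]

melody = [(60, 0.5), (62, 0.5), (64, 1.0)]   # C4, D4, E4
assert decode(encode(melody)) == melody      # round-trips losslessly
```

The same encoding is applied to the user's recorded melody at inference time, so the model sees inputs in the same token space it was trained on.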
Outcome: Our team successfully created a music composition application using neural networks. The framework we developed is adaptable, requiring only minimal code changes to swap datasets or alter model parameters, and supports future enhancements of neural network models for music generation.
Autonomous agents for multiplayer SuperTuxKart
Published:
Project Goal: The goal of this project was to train an AI agent to play SuperTuxKart, competing with other trained agents in a 2v2 hockey game.
Project Description:
1) Approach:
i) Reinforcement Learning: Initially, we used Q-Learning for training. However, it struggled to learn optimal strategies.
ii) Imitation Learning: To overcome this, we trained the model using imitation learning on winning strategies derived from over 2000 games.
iii) Hybrid Model: We then improved the reinforcement learning model by using the trained imitation model for initialization. This helped further refine the agent’s strategy.
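The hybrid training scheme above can be summarized in a conceptual sketch. All names and the tabular policy are illustrative stand-ins, not the project's actual model: the point is that imitation learning fits the policy to expert demonstrations first, and those learned preferences then seed the reinforcement-learning phase.

```python
# Conceptual sketch of the hybrid approach (illustrative only): a policy
# is first fit to expert demonstrations (imitation learning), then the
# same policy is refined with reward signals (reinforcement learning).

def imitation_pretrain(policy, demos, lr=0.1):
    # Nudge action preferences toward the expert's choices.
    for state, expert_action in demos:
        prefs = policy.setdefault(state, {})
        prefs[expert_action] = prefs.get(expert_action, 0.0) + lr
    return policy

def rl_finetune(policy, episodes, lr=0.05):
    # Reinforce actions in proportion to the reward they earned.
    for state, action, reward in episodes:
        prefs = policy.setdefault(state, {})
        prefs[action] = prefs.get(action, 0.0) + lr * reward
    return policy

# Imitation phase initializes the policy; RL phase refines it.
policy = imitation_pretrain({}, demos=[("puck_left", "steer_left")])
policy = rl_finetune(policy, episodes=[("puck_left", "steer_left", 1.0)])
best = max(policy["puck_left"], key=policy["puck_left"].get)
assert best == "steer_left"
```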
2) Future Work:
i) Data Augmentation: Incorporate data augmentation during training to randomize initialization and improve robustness.
ii) Internal State Controller: Explore the design of a hand-crafted internal state controller to compare its effectiveness against AI models.
Outcome: Our project successfully demonstrated a hybrid approach, combining imitation learning with reinforcement learning, to train an AI agent for SuperTuxKart. The refined agent showed improved performance in minimizing player-puck and puck-goal distances, paving the way for future enhancements and comparative studies with hand-crafted controllers.
J.A.R.V.I.S
Published:
A chatbot that answers factual questions based on dense telecom documentation (Ericsson and 3GPP). We created a data pipeline that ingested PDF, JSON, and HTML files to produce embeddings. The chatbot has two modes: i) document search, which uses the multi-qa-mpnet-base-dot-v1 model, and ii) generative, which uses the llama-2-7B-chat-hf model. It allows radio engineers to quickly look up information without digging through the Ericsson and 3GPP documentation.
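The document-search mode can be sketched as a dot-product retrieval step. This is a simplified illustration, not the production code: the real system encodes queries and documents with multi-qa-mpnet-base-dot-v1, whereas here the embeddings are toy hand-written vectors so the ranking logic is runnable on its own.

```python
# Simplified sketch of dot-product retrieval (illustrative only; the
# real system obtains embeddings from multi-qa-mpnet-base-dot-v1, a
# model trained specifically for dot-product scoring).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query_emb, doc_embs, top_k=1):
    """Return indices of the top_k documents ranked by dot-product score."""
    ranked = sorted(range(len(doc_embs)),
                    key=lambda i: dot(query_emb, doc_embs[i]),
                    reverse=True)
    return ranked[:top_k]

docs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # toy document embeddings
assert search([1.0, 0.0], docs) == [0]        # closest doc wins
```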
Rhythms of the Machine: Deep Learning in Music Creation
Published:
Project Goal: The goal of this project is to create an application that allows musicians to create extensions to their input. I created and trained an LSTM-based recurrent neural network designed to model monophonic music with expressive timing and dynamics.
Outcome: The model successfully generated fluid extensions, maintaining the melody and rhythmic structure. The framework I developed is adaptable, requiring only minimal code changes to use different datasets and alter model parameters, and supports future enhancements of neural network models for music generation. The model can take music input of any length to generate an extension, or no input at all to create a fresh composition. Here are a few examples generated by the model:
Input:
Generated Extensions:
The model is also able to generate compositions from scratch without any input. For the outputs below, the model had no prior musical context and had to rely solely on its learned weights and predictions to construct the compositions:
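The generation loop described above is autoregressive: each predicted token is appended to the context before the next prediction, and an empty seed yields a composition from scratch. A runnable sketch, with a trivial stub standing in for the trained LSTM:

```python
# Sketch of autoregressive generation with variable-length (or empty)
# input. `predict_next` is a deterministic stub standing in for the
# trained LSTM's next-token prediction (illustrative only).

def predict_next(context):
    # Stub: real model returns a sampled token from its output distribution.
    return (context[-1] + 2) % 128 if context else 60   # start on C4

def generate(seed, n_tokens):
    """Extend `seed` (possibly empty) by n_tokens, one step at a time."""
    sequence = list(seed)
    for _ in range(n_tokens):
        sequence.append(predict_next(sequence))   # feed output back in
    return sequence[len(seed):]

assert generate([60, 62], 3) == [64, 66, 68]   # extends an input melody
assert len(generate([], 4)) == 4               # composes from scratch
```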
publications
Paper Title Number 1
Published in Journal 1, 2009
This paper is about the number 1. The number 2 is left for future work.
Recommended citation: Your Name, You. (2009). "Paper Title Number 1." Journal 1. 1(1). http://academicpages.github.io/files/paper1.pdf
Paper Title Number 2
Published in Journal 1, 2010
This paper is about the number 2. The number 3 is left for future work.
Recommended citation: Your Name, You. (2010). "Paper Title Number 2." Journal 1. 1(2). http://academicpages.github.io/files/paper2.pdf
Paper Title Number 3
Published in Journal 1, 2015
This paper is about the number 3. The number 4 is left for future work.
Recommended citation: Your Name, You. (2015). "Paper Title Number 3." Journal 1. 1(3). http://academicpages.github.io/files/paper3.pdf
Paper Title Number 4
Published in GitHub Journal of Bugs, 2024
This paper is about fixing template issue #693.
Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3). http://academicpages.github.io/files/paper3.pdf
talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk; note the different value in the type field. You can put anything in this field.
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.
SearchDistribute: webscraping search results on an academic budget
Academic, 2017
Retrieves up to 250,000 search-engine results per day. With a $5/month VPN subscription, it can extract 10,000+ search results per query per hour (120x cheaper than the Google Search API). Built using Python and Selenium, it coordinates multiple PhantomJS browser instances, each connected to a SOCKS5 proxy.
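The coordination idea can be sketched as follows. This is an illustrative toy, not the SearchDistribute code: queries are distributed round-robin across workers, each bound to its own SOCKS5 proxy endpoint (addresses here are made up).

```python
# Illustrative sketch of the coordination idea (not the actual
# SearchDistribute code): queries are assigned round-robin across
# browser workers, each bound to its own SOCKS5 proxy.

from itertools import cycle

def assign_queries(queries, proxies):
    """Pair each query with a proxy-backed worker, round-robin."""
    return list(zip(queries, cycle(proxies)))

plan = assign_queries(
    ["site:example.com a", "site:example.com b", "site:example.com c"],
    ["socks5://127.0.0.1:9050", "socks5://127.0.0.1:9051"],  # hypothetical
)
assert plan[0][1] == "socks5://127.0.0.1:9050"
assert plan[2][1] == "socks5://127.0.0.1:9050"  # cycles back to first proxy
```

Spreading requests over several proxy identities is what keeps the per-IP request rate low enough to sustain the throughput quoted above.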
Autonomous agents for realtime multiplayer ice-hockey
Academic, 2020
We design an automated agent to play 2-on-2 games in SuperTuxKart IceHockey. Our two-stage system comprises a "vision" stage, which takes as input the image of the player's field of view and predicts world-state attributes. For vision, we train a multi-task CenterNet model (with a U-Net backbone) to predict whether the hockey puck is on-screen (classification), the puck's x-y coordinates (aimpoint regression), and its distance from the player (regression). These predictions are consumed by a "controller" stage, which returns actions that update the world state by "dribbling" the puck towards the goal or defending against the opposing AI team.
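The controller stage's logic can be sketched as a simple decision rule over the vision stage's three predictions. The thresholds and action format below are hypothetical, chosen only to make the structure concrete:

```python
# Toy sketch of the "controller" stage (thresholds and action format are
# hypothetical): the vision stage's predictions are mapped to an action
# that dribbles toward the puck, or falls back to defense when the puck
# is off-screen.

def controller(puck_onscreen, aimpoint_x, distance):
    if not puck_onscreen:
        # Puck not visible: retreat toward our goal at low speed.
        return {"steer": 0.0, "accel": 0.2, "mode": "defend"}
    steer = max(-1.0, min(1.0, aimpoint_x))   # aimpoint_x: 0 means centered
    accel = 1.0 if distance > 5.0 else 0.5    # slow down near the puck
    return {"steer": steer, "accel": accel, "mode": "dribble"}

assert controller(False, 0.0, 0.0)["mode"] == "defend"
assert controller(True, 0.4, 10.0) == {"steer": 0.4, "accel": 1.0, "mode": "dribble"}
```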
Asking the Right Questions: Question Paraphrasing Using Cross-Domain Abstractive Summarization and Backtranslation
Academic, 2021
A common issue when asking questions is that they are prone to misinterpretation: most of us have experienced a colleague or teacher misinterpreting a question and providing an answer that is tangential to the information we desire, or incomplete. This problem is exacerbated over text, where visual and emotional cues are not transmittable. We hypothesize that question answering models face the same issues as a human responder in such situations: when asked an ambiguous question, they might be unsure what to retrieve from the given passage. We propose paraphrasing the question with pre-trained language models to improve answer retrieval and robustness to ambiguous questions. We introduce a new scoring metric, GROK, to evaluate and select good paraphrases. We show that this metric improves upon paraphrase selection via beam search for downstream tasks, and that, combined with data augmentation via backtranslation, it increases question answering performance on the NewsQA and BioASQ datasets, improving EM by 2.5% and F1 by 1.9% over the baseline on the latter.
Extending Whisper, OpenAI’s Speech-to-Text Model
Academic, 2022
We study the performance of OpenAI's Whisper model, a state-of-the-art speech-to-text model, in noisy urban environments. To do so, we create a dataset consisting of 134 minutes of us reading out loud in both quiet and noisy urban environments (subway, street, and cafe), manually annotating the recordings at 30-second intervals. Using a powerful multi-GPU AWS cluster and the distributed computing framework Ray, we find that Whisper performs significantly worse on speech recorded in noisy environments than on speech recorded in quiet environments, in contrast to assertions made by the Whisper authors. This performance gap is particularly severe for the small Whisper models. This finding is concerning since the small models, due to their low inference time, are the most likely to be deployed on handheld devices (like smartphones), and thus the most likely to be exposed to outside noise that degrades speech-to-text performance. To improve performance, we fine-tune the HuggingFace Whisper implementation on a split of our collected data. We find that fine-tuning on single-speaker noisy speech improves average Word Error Rate (WER) by 2.81 (from 28.76 to 25.95) and fine-tuning on multi-speaker noisy speech improves average WER by 2.61 (from 28.76 to 26.15). We are thus able to adapt OpenAI Whisper to function reliably in noisy urban environments.
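The WER figures above follow the standard definition: word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal implementation for checking such numbers (a sketch, not the project's evaluation code):

```python
# Word Error Rate: word-level Levenshtein distance divided by reference
# length. Computed with standard dynamic programming.

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                    # delete all remaining ref words
    for j in range(len(h) + 1):
        d[0][j] = j                    # insert all remaining hyp words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(r)][len(h)] / len(r)

assert wer("the cat sat", "the cat sat") == 0.0
assert wer("the cat sat", "the bat sat") == 1 / 3   # one substitution
```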
