Sitemap
A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2 
projects
Music Generation Using Machine Learning
Published:
Project Goal: The goal of this project was to create a music composition application that utilizes machine learning to generate music extensions. The application allows users to play a melody on a MIDI keyboard, and the system generates a continuation of that melody for the same length of time.
Project Overview:
1) UI:
i) Users are prompted to play a melody on a MIDI keyboard.
ii) The application transforms the input melody using the same transformations applied to the training dataset.
iii) The transformed melody is fed through a trained model to generate a continuation of the played notes.
iv) The user’s recording and the model-generated continuation are saved and displayed to the user.
2) Machine Learning Pipeline:
i) Data Transformation: Developed techniques to convert music into machine-readable embeddings.
ii) Model Development: Designed and trained a Recurrent Neural Network (RNN) for music generation.
iii) Integration: Integrated the final model with a user interface and a physical MIDI keyboard.
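The data-transformation step above can be sketched as follows. This is an illustrative toy, not the project's actual code: it assumes notes are represented as (MIDI pitch, duration) pairs and maps them to integer tokens a sequence model can consume, and back.

```python
# Minimal sketch of the data-transformation step (names and vocabulary
# are illustrative, not the project's actual code): MIDI-style note
# events become integer tokens for the RNN, and decode() inverts encode().

PITCH_RANGE = 128                   # MIDI pitches 0-127
DURATIONS = [0.25, 0.5, 1.0, 2.0]   # note lengths in the toy vocabulary

def encode(notes):
    """Map (pitch, duration) pairs to integer tokens."""
    return [DURATIONS.index(dur) * PITCH_RANGE + pitch
            for pitch, dur in notes]

def decode(tokens):
    """Invert encode(): recover (pitch, duration) pairs from tokens."""
    return [(t % PITCH_RANGE, DURATIONS[t // PITCH_RANGE]) for t in tokens]

melody = [(60, 0.5), (62, 0.5), (64, 1.0)]   # C4, D4, E4
assert decode(encode(melody)) == melody      # round-trips losslessly
```

The same encoding is applied to the user's recorded melody at inference time, so the model sees inputs in the same token space it was trained on.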
Outcome: Our team successfully created a music composition application using neural networks. The framework we developed is adaptable, requiring only minimal code changes to swap datasets or alter model parameters, and supports future enhancements of neural network models for music generation.
Autonomous agents for multiplayer SuperTuxKart
Published:
Project Goal: The goal of this project was to train an AI agent to play SuperTuxKart, competing with other trained agents in a 2v2 hockey game.
Project Description:
1) Approach:
i) Reinforcement Learning: Initially, we used Q-Learning for training. However, it struggled to learn optimal strategies.
ii) Imitation Learning: To overcome this, we trained the model using imitation learning on winning strategies derived from over 2000 games.
iii) Hybrid Model: We then improved the reinforcement learning model by using the trained imitation model for initialization. This helped further refine the agent’s strategy.
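The hybrid training scheme above can be summarized in a conceptual sketch. All names and the tabular policy are illustrative stand-ins, not the project's actual model: the point is that imitation learning fits the policy to expert demonstrations first, and those learned preferences then seed the reinforcement-learning phase.

```python
# Conceptual sketch of the hybrid approach (illustrative only): a policy
# is first fit to expert demonstrations (imitation learning), then the
# same policy is refined with reward signals (reinforcement learning).

def imitation_pretrain(policy, demos, lr=0.1):
    # Nudge action preferences toward the expert's choices.
    for state, expert_action in demos:
        prefs = policy.setdefault(state, {})
        prefs[expert_action] = prefs.get(expert_action, 0.0) + lr
    return policy

def rl_finetune(policy, episodes, lr=0.05):
    # Reinforce actions in proportion to the reward they earned.
    for state, action, reward in episodes:
        prefs = policy.setdefault(state, {})
        prefs[action] = prefs.get(action, 0.0) + lr * reward
    return policy

# Imitation phase initializes the policy; RL phase refines it.
policy = imitation_pretrain({}, demos=[("puck_left", "steer_left")])
policy = rl_finetune(policy, episodes=[("puck_left", "steer_left", 1.0)])
best = max(policy["puck_left"], key=policy["puck_left"].get)
assert best == "steer_left"
```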
2) Future Work:
i) Data Augmentation: Incorporate data augmentation during training to randomize initialization and improve robustness.
ii) Internal State Controller: Explore the design of a hand-crafted internal state controller to compare its effectiveness against AI models.
Outcome: Our project successfully demonstrated a hybrid approach, combining imitation learning with reinforcement learning, to train an AI agent for SuperTuxKart. The refined agent showed improved performance in minimizing player-puck and puck-goal distances, paving the way for future enhancements and comparative studies with hand-crafted controllers.
J.A.R.V.I.S
Published:
A chatbot that answers factual questions based on dense telecom documentation (Ericsson and 3GPP). We created a data pipeline that ingested PDF, JSON, and HTML files to produce embeddings. The chatbot has two modes: i) document search, which uses the multi-qa-mpnet-base-dot-v1 model, and ii) generative, which uses the llama-2-7B-chat-hf model. It allows radio engineers to quickly look up information without digging through the Ericsson and 3GPP documentation.
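The document-search mode can be sketched as a dot-product retrieval step. This is a simplified illustration, not the production code: the real system encodes queries and documents with multi-qa-mpnet-base-dot-v1, whereas here the embeddings are toy hand-written vectors so the ranking logic is runnable on its own.

```python
# Simplified sketch of dot-product retrieval (illustrative only; the
# real system obtains embeddings from multi-qa-mpnet-base-dot-v1, a
# model trained specifically for dot-product scoring).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query_emb, doc_embs, top_k=1):
    """Return indices of the top_k documents ranked by dot-product score."""
    ranked = sorted(range(len(doc_embs)),
                    key=lambda i: dot(query_emb, doc_embs[i]),
                    reverse=True)
    return ranked[:top_k]

docs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # toy document embeddings
assert search([1.0, 0.0], docs) == [0]        # closest doc wins
```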
Rhythms of the Machine: Deep Learning in Music Creation
Published:
Project Goal: The goal of this project is to create an application that allows musicians to create extensions to their input. I created and trained an LSTM-based recurrent neural network designed to model monophonic music with expressive timing and dynamics.
Outcome: The model successfully generated fluid extensions, maintaining the melody and rhythmic structure. The framework I developed is adaptable, requiring only minimal code changes to use different datasets and alter model parameters, and supports future enhancements of neural network models for music generation. The model can take music input of any length to generate an extension, or no input at all to create a fresh composition. Here are a few examples generated by the model:
Input:
Generated Extensions:
The model is also able to generate compositions from scratch without any input. For the outputs below, the model had no prior musical context and had to rely solely on its learned weights and predictions to construct the compositions:
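The generation loop described above is autoregressive: each predicted token is appended to the context before the next prediction, and an empty seed yields a composition from scratch. A runnable sketch, with a trivial stub standing in for the trained LSTM:

```python
# Sketch of autoregressive generation with variable-length (or empty)
# input. `predict_next` is a deterministic stub standing in for the
# trained LSTM's next-token prediction (illustrative only).

def predict_next(context):
    # Stub: real model returns a sampled token from its output distribution.
    return (context[-1] + 2) % 128 if context else 60   # start on C4

def generate(seed, n_tokens):
    """Extend `seed` (possibly empty) by n_tokens, one step at a time."""
    sequence = list(seed)
    for _ in range(n_tokens):
        sequence.append(predict_next(sequence))   # feed output back in
    return sequence[len(seed):]

assert generate([60, 62], 3) == [64, 66, 68]   # extends an input melody
assert len(generate([], 4)) == 4               # composes from scratch
```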
publications
Paper Title Number 1
Published in Journal 1, 2009
This paper is about the number 1. The number 2 is left for future work.
Recommended citation: Your Name, You. (2009). "Paper Title Number 1." Journal 1. 1(1). http://academicpages.github.io/files/paper1.pdf
Paper Title Number 2
Published in Journal 1, 2010
This paper is about the number 2. The number 3 is left for future work.
Recommended citation: Your Name, You. (2010). "Paper Title Number 2." Journal 1. 1(2). http://academicpages.github.io/files/paper2.pdf
Paper Title Number 3
Published in Journal 1, 2015
This paper is about the number 3. The number 4 is left for future work.
Recommended citation: Your Name, You. (2015). "Paper Title Number 3." Journal 1. 1(3). http://academicpages.github.io/files/paper3.pdf
Paper Title Number 4
Published in GitHub Journal of Bugs, 2024
This paper is about fixing template issue #693.
Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3). http://academicpages.github.io/files/paper3.pdf
talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk; note the different value in the type field. You can put anything in this field.
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.
SearchDistribute: webscraping search results on an academic budget
Academic, 2017
Retrieves up to 250,000 search-engine results per day. With a $5/month VPN subscription, it can extract 10,000+ search results per query per hour (120x cheaper than the Google Search API). Built using Python and Selenium, it coordinates multiple PhantomJS browser instances, each connected to a SOCKS5 proxy.
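The coordination idea can be sketched as follows. This is an illustrative toy, not the SearchDistribute code: queries are distributed round-robin across workers, each bound to its own SOCKS5 proxy endpoint (addresses here are made up).

```python
# Illustrative sketch of the coordination idea (not the actual
# SearchDistribute code): queries are assigned round-robin across
# browser workers, each bound to its own SOCKS5 proxy.

from itertools import cycle

def assign_queries(queries, proxies):
    """Pair each query with a proxy-backed worker, round-robin."""
    return list(zip(queries, cycle(proxies)))

plan = assign_queries(
    ["site:example.com a", "site:example.com b", "site:example.com c"],
    ["socks5://127.0.0.1:9050", "socks5://127.0.0.1:9051"],  # hypothetical
)
assert plan[0][1] == "socks5://127.0.0.1:9050"
assert plan[2][1] == "socks5://127.0.0.1:9050"  # cycles back to first proxy
```

Spreading requests over several proxy identities is what keeps the per-IP request rate low enough to sustain the throughput quoted above.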
Autonomous agents for realtime multiplayer ice-hockey
Academic, 2020
We design an automated agent to play 2-on-2 games in SuperTuxKart IceHockey. Our two-stage system comprises a "vision" stage, which takes as input the image of the player's field of view and predicts world-state attributes. For vision, we train a multi-task CenterNet model (with a U-Net backbone) to predict whether the hockey puck is on-screen (classification), the puck's x-y coordinates (aimpoint regression), and its distance from the player (regression). These predictions are consumed by a "controller" stage, which returns actions that update the world state by "dribbling" the puck towards the goal or defending against the opposing AI team.
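The controller stage's logic can be sketched as a simple decision rule over the vision stage's three predictions. The thresholds and action format below are hypothetical, chosen only to make the structure concrete:

```python
# Toy sketch of the "controller" stage (thresholds and action format are
# hypothetical): the vision stage's predictions are mapped to an action
# that dribbles toward the puck, or falls back to defense when the puck
# is off-screen.

def controller(puck_onscreen, aimpoint_x, distance):
    if not puck_onscreen:
        # Puck not visible: retreat toward our goal at low speed.
        return {"steer": 0.0, "accel": 0.2, "mode": "defend"}
    steer = max(-1.0, min(1.0, aimpoint_x))   # aimpoint_x: 0 means centered
    accel = 1.0 if distance > 5.0 else 0.5    # slow down near the puck
    return {"steer": steer, "accel": accel, "mode": "dribble"}

assert controller(False, 0.0, 0.0)["mode"] == "defend"
assert controller(True, 0.4, 10.0) == {"steer": 0.4, "accel": 1.0, "mode": "dribble"}
```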
Asking the Right Questions: Question Paraphrasing Using Cross-Domain Abstractive Summarization and Backtranslation
Academic, 2021
A common issue when asking questions is that they are prone to misinterpretation: most of us have experienced a colleague or teacher misinterpreting a question and providing an answer that is tangential to the information we desire, or incomplete. This problem is exacerbated over text, where visual and emotional cues are not transmittable. We hypothesize that question answering models face the same issues as a human responder in such situations: when asked an ambiguous question, they might be unsure what to retrieve from the given passage. We propose paraphrasing the question with pre-trained language models to improve answer retrieval and robustness to ambiguous questions. We introduce a new scoring metric, GROK, to evaluate and select good paraphrases. We show that this metric improves upon paraphrase selection via beam search for downstream tasks, and that, combined with data augmentation via backtranslation, it increases question answering performance on the NewsQA and BioASQ datasets, improving EM by 2.5% and F1 by 1.9% over the baseline on the latter.
Extending Whisper, OpenAI’s Speech-to-Text Model
Academic, 2022
We study the performance of OpenAI's Whisper model, a state-of-the-art speech-to-text model, in noisy urban environments. To do so, we create a dataset consisting of 134 minutes of us reading out loud in both quiet and noisy urban environments (subway, street, and cafe), manually annotating the recordings at 30-second intervals. Using a powerful multi-GPU AWS cluster and the distributed computing framework Ray, we find that Whisper performs significantly worse on speech recorded in noisy environments than on speech recorded in quiet environments, in contrast to assertions made by the Whisper authors. This performance gap is particularly severe for the small Whisper models. This finding is concerning since the small models, due to their low inference time, are the most likely to be deployed on handheld devices (like smartphones), and thus the most likely to be exposed to outside noise that degrades speech-to-text performance. To improve performance, we fine-tune the HuggingFace Whisper implementation on a split of our collected data. We find that fine-tuning on single-speaker noisy speech improves average Word Error Rate (WER) by 2.81 (from 28.76 to 25.95) and fine-tuning on multi-speaker noisy speech improves average WER by 2.61 (from 28.76 to 26.15). We are thus able to adapt OpenAI Whisper to function reliably in noisy urban environments.
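The WER figures above follow the standard definition: word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal implementation for checking such numbers (a sketch, not the project's evaluation code):

```python
# Word Error Rate: word-level Levenshtein distance divided by reference
# length. Computed with standard dynamic programming.

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                    # delete all remaining ref words
    for j in range(len(h) + 1):
        d[0][j] = j                    # insert all remaining hyp words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(r)][len(h)] / len(r)

assert wer("the cat sat", "the cat sat") == 0.0
assert wer("the cat sat", "the bat sat") == 1 / 3   # one substitution
```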
