M.Sc. by Research at University of Edinburgh

I am Bita Asoodeh, an M.Sc. by Research student in the Institute for Computing Systems Architecture (ICSA), School of Informatics, The University of Edinburgh, UK (2025 - current), supervised by Prof. Antonio Barbalace and Prof. Amir Shaikhha. My research focuses on database management systems and compilers (JIT compilation).

Education

M.Sc. by Research, Computer Science, 2025 - current
University of Edinburgh, Edinburgh, UK
Supervisors: Prof. Antonio Barbalace, Prof. Amir Shaikhha

B.Sc., Computer Science
University of Isfahan (UI), Isfahan, Iran
Graduation date: July 2020

Publications

VLDB 2026 Under Revision

Unbiased Binning: Fairness-aware Attribute Representation

Discretizing raw features into bucketized attribute representations is a popular step before sharing a dataset. It is, however, evident that this step can cause significant bias in data and amplify unfairness in downstream tasks. In this paper, we address this issue by introducing the unbiased binning problem that, given an attribute to bucketize, finds its closest discretization to equal-size binning that satisfies group parity across different buckets. Defining a small set of boundary candidates, we prove that unbiased binning must select its boundaries from this set. We then develop an efficient dynamic programming algorithm on top of the boundary candidates to solve the unbiased binning problem. Finding an unbiased binning may sometimes result in a high price of fairness, or it may not even exist, especially when group values follow different distributions. Considering that a small bias in the group ratios may be tolerable in such settings, we introduce the ε-biased binning problem that bounds the group disparities across buckets to a small value ε. We first develop a dynamic programming solution DP that finds the optimal binning in quadratic time. The DP algorithm, while polynomial, does not scale to very large settings. Therefore, we propose a practically scalable algorithm, based on local search (LS), for ε-biased binning. The key component of the LS algorithm is a divide-and-conquer (D&C) algorithm that finds a near-optimal solution for the problem in near-linear time. We prove that D&C finds a valid solution for the problem unless none exists. The LS algorithm then initiates a local search, using the D&C solution as the upper bound, to find the optimal solution.

VLDB 2026 Under Review

A Portable Middleware for Plan-Based Adaptive Query Processing

Inaccurate cardinality estimation is a major performance bottleneck for many database management systems (DBMS). Plan-based adaptive query processing (AQP) has been widely studied to address this issue, but existing solutions are tightly coupled to specific database engines, making them difficult to reuse or extend. This paper presents AQPHub, an extensible framework for plan-based AQP that decouples AQP strategies from specific engines. This is achieved by providing a middleware that leverages a SQL-based interface, requiring minimal effort to port to a new DBMS. Users can easily extend AQPHub by integrating different DBMSs and/or designing different AQP strategies. AQPHub shows negligible overhead while preserving the benefits of the state-of-the-art plan-based AQP methods. Our demonstration scenarios enable attendees to interact through a web interface and explore five database systems on four different AQP strategies and integration levels.

EuroMLSys 2026 Accepted

Towards a Solution to the Management Scaling Paradox in Distributed LLM Inference

In disaggregated LLM serving, prefix caching avoids redundant prefill by reusing KV cache across nodes, but existing systems manage this cache entirely in user space—atop the OS kernel that already manages memory, page tables, and RDMA registration. Our instrumented profiling of the state-of-the-art LMCache + Mooncake stack reveals a management scaling paradox: user-space overhead from coordination RPCs, redundant memory copies, and transfer fragmentation grows to 79% of achievable time-to-first-token (TTFT) as the cache warms, nearly doubling latency when caching should help most. We present RMC (Remote Memory for Cache), an in-kernel framework that extends OS-provided shared memory—already used locally via /dev/shm in systems like vLLM—to cluster-wide scope. RMC exploits two KV cache properties—fixed per-token size and content-determined identity—to provide a content-addressed Partitioned Global Address Space where any node computes the address of any cached chunk without coordination. Across four representative workloads, RMC achieves 1.1–1.3× TTFT reduction over LMCache + Mooncake and up to 2.1× over full recompute.

Teaching Assistantship

Programming I, Fall 2017.
Software Engineering (UML), Spring 2020.
Operating Systems, Winter 2026.

Professional Service

Volunteer, Conference, Edinburgh, UK, April 27th–30th

Work Experience

Lecturer, C Programming Language Lab, Spring 2023
Lecturer, C++ Programming Language Lab, Fall 2023

Honors & Awards

M.Sc. by Research Scholarship Award, School of Informatics, University of Edinburgh, 2025

Selected Projects

PG-Schema Parser

This project aims to create a schema parser tailored for graph query languages, with a focus on the PG-Schema used by the Linked Data Benchmark Council (LDBC) dataset. The parser will be built with Scala's FastParse parser combinator library, which allows efficient and flexible parsing of schema text into structured objects. The main goal is to translate PG-Schema definitions directly into instances of the CyphSchema class found in LDBC.scala, simplifying data handling and manipulation. The parser is also meant to be reusable, so that it can be adapted to graph databases and schemas beyond LDBC.

Website Topic Discovery

The goal of this project was to recognize the subject of Persian websites. Implemented in Python, it used web-scraping packages such as Selenium to crawl websites from a pool of URLs. Each crawled website was then preprocessed with stop-word removal, stemming, and tokenization to extract its keywords and sentences. After associating each keyword with a TF-IDF score, we used word embeddings and NLP packages to develop a classifier that detects the website topic. Collecting labeled training data was a major challenge in this project; to address it, we considered techniques such as Fuzzy C-Means clustering and SVM over a pool of unlabeled documents, and used human intelligence to finalize the labels.

Machine Learning & Data Cleaning

Performed data cleaning and implemented machine learning class projects on several datasets, including "Character Font Images" (archive.ics.uci.edu/ml/datasets/Character+Font+Images) and "Cat/Dog classification" (www.kaggle.com/c/dogs-vs-cats).

Kaggle Kernels on GPU

Ran ML programs such as www.kaggle.com/lucassi/dogs-vs-cats-train-validation-and-evaluation on GPUs using CUDA.

Socket Programming

Implemented a client-server chat system using socket programming in Python.

C++ Games

Implemented simple games, including "Snake" and "Tetris", in C++.

Design Project

Conducted analysis and design of a hotel management system with UML, UI, and UX.

Skills

Programming Languages

C++, C, Python, Scala

DBMS Internals & Systems

PostgreSQL (source-level research fork)
Multi-engine setup & deployment (DuckDB, PostgreSQL, Umbra, MariaDB, openGauss)
JIT compilation pipelines

AI & Data Science

NumPy, pandas, OpenCV (cv2), scikit-learn, TensorFlow, CUDA

Modeling & Design

UML (Visual Paradigm), UI/UX (Adobe Illustrator)

Writing Tools

LaTeX