Luca Cazzanti’s Projects
AI, ML, Data Science for Personalization
Personalization refers to tailoring product offers, website content, and user experiences to meet the individual preferences, behaviors, and needs of each consumer. AI and machine learning can help businesses broaden their personalization efforts and run them at very large scale. I started as an individual contributor developing core reinforcement learning algorithms for the mobile phone vertical, and grew into leadership positions supporting multi-functional teams across core ML, data science, data engineering and MLOps. I’ve done this in many verticals, spanning mobile phone providers (Globys and Amplero), real estate (Zillow), brick-and-mortar retail (Amplero), e-commerce (Zulily), and FinTech (Curinos, which acquired Amplero). In my career, contextual multi-armed bandits have been the primary reinforcement learning algorithm powering my personalization work, with more traditional approaches like propensity modeling and clustering playing a supporting role. More recently, my teams have been incorporating deep embeddings, generative AI, and retrieval-augmented generation (RAG) into reinforcement learning. For more information, see a blog post on evaluating the performance of multi-armed bandits from my time at Zillow, co-authored with the brilliant then-intern Zhihan Xhiong, and my publications. (Image credit: Zillow)
Soccer Data Science, Analytics, and Coaching
In my spare time, I bring together my passion for the game of soccer with my data science expertise. I coach soccer, take on freelance soccer data science projects, volunteer in the soccer community, and advise soccer organizations seeking to innovate their operations with data-informed practices from the pitch to the front office. Examples:
- Soccer Data Scientist: Qatar 2022 Men’s WC, volunteer soccer data scientist for Morocco National Team. Delivered data pipelines for merging event and tracking data, produced passing network charts and heatmaps. Published a Python library to easily read and plot TRACAB heatmaps. Developed xG model for USL2 2023 National Champions Ballard F.C.
- Soccer Coach: USSF D and USC National Diploma. Head coach and overall soccer program coordinator at Whitman Middle School, Seattle, girls and boys teams (2022-present); Assistant coach, Ingraham High School, Seattle (girls JVC head coach 2024); boys U10 coach with Ballard Youth Soccer Club (2018-2019)
- Sports Analytics Career Mentor: interviewed and provided feedback to students in the Sports Analytics mock-interview program at Syracuse University.
- Volunteer: Board member for Emerald City F.C., a non-profit select soccer club in Seattle; serving as secretary, IT support and VEO cameras coordinator.
- Soccer Player: I’ve never played at the elite level, but after growing up with “street soccer” in Italy I’m still at it recreationally in the Greater Seattle Soccer League (GSSL).
Maritime Situational Awareness
Computational Maritime Situational Awareness (MSA) supports the maritime industry, governments, and international organizations with machine learning and data mining techniques for analyzing vessel traffic data available through the Automatic Identification System (AIS). A critical challenge is scaling computational MSA to large data sets. My colleagues and I at the CMRE addressed this challenge. For that work I prototyped a big data analytics platform for maritime intelligence, enabling maritime vessel traffic characterization and anomaly detection. I developed the data processing pipelines in Python, transforming raw AIS data from SQL databases into summary statistics, and designed the interactive Tableau dashboards in collaboration with the end users at the NATO Maritime Command (MARCOM). I then supervised junior scientists to scale my prototype to billions of data points by adopting parallel and distributed computing approaches. In 2020, this work and its extensions were recognized with the NATO Science and Technology Organization Excellence Award, the highest scientific achievement award given by the Science and Technology Office (STO) of the North Atlantic Treaty Organization. Collaborators: Maritime security researchers at the CMRE; Sponsors: NATO ACT
Generative Similarity-based Classification
For my Ph.D. I developed statistical learning architectures for inference based on general pairwise similarities between complex, heterogeneous objects. These similarities are not necessarily metrics, so they do not fit nicely in standard N-dimensional spaces. To address this problem, I developed similarity discriminant analysis (SDA), a generative framework for similarity-based classification, and investigated the performance of specific SDA implementations. Local SDA provided the best bias-variance tradeoff. This work benefited from the input of the faculty and fellow students at the University of Washington - Seattle. A special thanks to my advisor Prof. Maya Gupta. Grazie!
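To make the idea of generative similarity-based classification concrete, here is a minimal Python sketch. It is not the SDA toolbox code and it simplifies the framework: each class is summarized by a centroid object, and the similarity of a sample to each centroid is modeled with an exponential distribution (the maximum-entropy choice for a nonnegative quantity with a known mean). The toy `overlap` similarity, the data, and the `floor` smoothing constant are all assumptions made for illustration.

```python
import math

def overlap(a, b):
    """Toy similarity between two sets: number of shared items (not a metric)."""
    return len(a & b)

def centroid(samples, sim):
    """Pick the sample with the largest total similarity to its own class."""
    return max(samples, key=lambda c: sum(sim(c, x) for x in samples))

def fit(train, sim):
    """train maps each class label to a list of objects."""
    centroids = {g: centroid(xs, sim) for g, xs in train.items()}
    total = sum(len(xs) for xs in train.values())
    priors = {g: len(xs) / total for g, xs in train.items()}
    # means[g][h]: average similarity of class-g samples to the class-h centroid
    means = {
        g: {h: sum(sim(x, c) for x in xs) / len(xs) for h, c in centroids.items()}
        for g, xs in train.items()
    }
    return centroids, means, priors

def classify(x, model, sim, floor=1e-3):
    """Return the class with the highest log-posterior for object x."""
    centroids, means, priors = model

    def log_posterior(g):
        lp = math.log(priors[g])
        for h, c in centroids.items():
            m = max(means[g][h], floor)      # floor avoids a zero mean (assumption)
            lp += -math.log(m) - sim(x, c) / m  # log of an exponential density with mean m
        return lp

    return max(priors, key=log_posterior)

# Hypothetical training data: objects are sets of words
train = {
    "fruit": [{"apple", "banana"}, {"apple", "cherry"}, {"banana", "apple", "kiwi"}],
    "tool": [{"hammer", "saw"}, {"hammer", "drill"}, {"saw", "hammer", "wrench"}],
}
model = fit(train, overlap)
print(classify({"apple", "kiwi"}, model, overlap))
```

Note that the classifier only ever touches the objects through `sim`, so the same code works for any nonnegative pairwise similarity, whether or not it satisfies the properties of a metric.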
Matlab Toolbox for Similarity Discriminant Analysis (SDA)
A set of Matlab scripts for similarity discriminant analysis (SDA), including the standard SDA, local SDA, regularized local SDA, mixture SDA, and nnSDA classifiers. This is research-grade code, designed to test ideas and concepts. I have emphasized readability of the source code rather than speed and memory management. It comes with no guarantees, but I hope you will nonetheless find it useful. As examples of how to run the software, I have included the scripts I used to run the algorithms on benchmark datasets. This software has benefitted from other people’s helpful suggestions and bug-squashing skills. I want to thank in particular Prof. Maya Gupta of the Dept. EE, University of Washington, who was the original force behind the maximum entropy-based approach to estimating similarity distributions. For the theory of the SDA framework for similarity-based classification, see the publications.
Download
- Aug 9, 2011 - sda_20110809.zip: Includes all the previous features, plus multi-task regularized local SDA, and a Matlab implementation of pairwise local SDA.
- Mar. 08, 2010 - sda_20100308.zip: Includes mex files for 32-bit Linux (10 times faster than plain m-scripts), a README to get you started quickly, and a sample data set.
- Nov. 13, 2009 - sda_20091113.zip: Includes the regularized local SDA classifier, which is the state-of-the-art of SDA-type classifiers. Also includes code for local BDA.
- Jan. 29, 2009 - sda_20090129.zip: First release.
BibTeX - @MISC{CazzantiSdaToolbox2011, author = "L. Cazzanti", year = "2011", month = "August", title = "Similarity Discriminant Analysis Toolbox", url = "http://www.lucacazzanti.net/blog/", institution = "Applied Physics Laboratory - University of Washington, Seattle"}
Machine Learning to Assess Computational Protein Designs
While working on my Ph.D. I had the opportunity to work with Prof. David Baker on computational protein structure design. I used machine learning to rank the goodness of the candidate protein designs produced by his lab’s software, at the time called Rosetta. Of the many possible protein structure candidate predictions, typically only a few turn out to be worthy of further investigation, and biochemistry experts face the time-consuming task of having to assess each prediction manually. With machine learning algorithms one could streamline the assessment process by automatically pre-screening the candidate structures and filtering out the least likely ones. I worked on this short project in 2005; in 2024 Prof. David Baker was awarded the Nobel Prize in Chemistry for his continued work in computational protein structure design. Collaborators: Prof. David Baker, Prof. Maya Gupta; Sponsors: ONR
Robust Underwater Acoustic Communications
The underwater channel severely distorts acoustic communications waveforms in both time and frequency, corrupting the received data. Interference from simultaneous transmitters and extraneous sources poses further challenges. For this project, I design and test simulations that realistically model the transmission of acoustic communication waveforms in the underwater channel using the Sonar Simulation Toolkit (SST). In particular, I am working closely with BAE Systems to characterize the performance of a particular type of FSK modulation called differential frequency hopping (DFH) in the underwater channel. We conduct preliminary simulation studies, design algorithmic improvements based on these simulations, and verify the performance of DFH on real data collected during sea trials. By iterating this process we’ve been able to make DFH even more robust to the environmental effects and to multi-user interference. Collaborators: BAE Systems; Sponsors: Office of Naval Research (ONR)
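For readers unfamiliar with differential frequency hopping, the following Python sketch illustrates the core idea: the next transmit frequency depends on both the current frequency and the incoming data bits, so the receiver recovers the data from the transition between consecutive hops. The hop rule used here, `(current + 1 + symbol) % n_freqs`, is a hypothetical choice made for illustration, not the mapping studied in this project, and the decoder assumes a noiseless channel.

```python
def dfh_encode(bits, n_freqs=8, bits_per_hop=2, start=0):
    """Map a bit sequence to a sequence of frequency indices, one hop per symbol."""
    if len(bits) % bits_per_hop:
        raise ValueError("bit count must be a multiple of bits_per_hop")
    if 2 ** bits_per_hop >= n_freqs:
        raise ValueError("need more frequencies than symbols so every hop changes frequency")
    hops, current = [], start
    for i in range(0, len(bits), bits_per_hop):
        # Interpret the next group of bits as an integer symbol
        symbol = int("".join(str(b) for b in bits[i:i + bits_per_hop]), 2)
        # The new frequency is a function of the current frequency AND the data
        current = (current + 1 + symbol) % n_freqs
        hops.append(current)
    return hops

def dfh_decode(hops, n_freqs=8, bits_per_hop=2, start=0):
    """Recover the bits from the differences between consecutive frequencies."""
    bits, previous = [], start
    for freq in hops:
        symbol = (freq - previous - 1) % n_freqs
        bits.extend(int(b) for b in format(symbol, f"0{bits_per_hop}b"))
        previous = freq
    return bits

message = [1, 0, 0, 1, 1, 1, 0, 0]
hops = dfh_encode(message)
print(hops)               # frequency indices, one per pair of bits
print(dfh_decode(hops))   # matches the original message
```

Because only a subset of frequencies can legally follow any given frequency, a practical receiver can exploit that structure to reject interference and correct missed hops; this sketch omits that step.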
Insurgent Rhetoric Analysis with Natural Language Processing
I collaborated on this multi-disciplinary project that combined text mining and machine learning algorithms with political science expert knowledge. We mapped the rhetorical constructs used by insurgent groups in areas of world conflict to an abstract ideological space and assessed the groups’ propensities toward certain policies and actions. We researched ways to fuse data about the social interactions between the groups with rhetorical data such as press releases, speech transcripts, and web sites. We hypothesized that fusing these heterogeneous sources of data increases the robustness of our predictions and allows a richer understanding of how inter-group dynamics influence their rhetoric. We were also interested in developing automated methods to track the evolution over time of the ideological propensities of each group, which enables one to monitor, and perhaps predict, the political landscape in areas of political and social unrest. Collaborators: Dr. Mike Gabbay (PI); Sponsors: Office of Naval Research (ONR)
Radioactive Isotope Identification with Information-theoretic Spectral Similarity
For this project, I demonstrated a method to identify radioactive isotopes with low-complexity pattern recognition algorithms based on spectral similarity and information theory. Radioisotope spectra measured in realistic operating conditions, which include interference from naturally-occurring radiation, signal-attenuating shielding, and backscatter, could be identified by comparison to pristine simulated spectra. The developed method was robust across different types of environment and shielding, and applicable to different portable radiation detectors. The need for robust, portable, and easy-to-use radioisotope identification devices is increasing, driven by the competing demands for improved security and expedited commerce. In this context, the results of this research take on particular significance. Collaborators: Dr. Lane Owsley (PI), Dr. Jack McLaughlin; Sponsors: DITRA, Dept. of Energy.
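As an illustration of information-theoretic spectral matching, the Python sketch below treats each spectrum as a probability distribution over energy channels and picks the library template with the smallest Kullback-Leibler divergence from the measurement. The choice of KL divergence, the smoothing `floor`, the template names, and the toy 8-channel spectra are all assumptions made for this example; the text above does not specify which similarity measure the project used.

```python
import math

def normalize(counts, floor=1e-6):
    """Turn channel counts into a probability distribution, avoiding zeros."""
    smoothed = [c + floor for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same channels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def identify(measured, templates):
    """Return the best-matching template name and all divergence scores."""
    p = normalize(measured)
    scores = {name: kl_divergence(p, normalize(t)) for name, t in templates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical pristine templates (toy data, not real isotope spectra)
templates = {
    "isotope_A": [0, 1, 8, 1, 0, 0, 0, 0],
    "isotope_B": [0, 0, 0, 0, 1, 8, 1, 0],
}
# Toy measurement: the isotope_A peak sitting on a flat background
measured = [2, 3, 9, 3, 2, 2, 2, 2]

best, scores = identify(measured, templates)
print(best, scores)
```

The comparison costs one pass over the channels per template, which is the kind of low-complexity computation that suits a portable detector.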
Tracking Torpedoes in the Arctic
APL is the U.S. Navy’s premier provider of logistics for Arctic exercises, including tracking and recovering test torpedoes. In 2009 APL began transitioning its tracking system from an analog to a digital platform. As part of that transition, I used digital signal processing to design, implement, and test a timing pulse recovery algorithm to provide accurate timing offset measurements to the new digital tracker. I also worked closely with hardware engineers to transition the software to the final deployment hardware platform. The algorithm worked flawlessly during the 2009 U.S. Navy Arctic exercises. Because of this success, the digital system is now the primary tracker for Arctic exercises. Collaborators: Ocean Engineering Dept., Applied Physics Laboratory; Sponsors: U.S. Navy
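One common way to measure a timing offset is to slide a known pulse template along the received samples and pick the lag with the highest correlation. The Python sketch below shows that generic matched-filter idea only; it is an assumption for illustration, not a description of the algorithm deployed in the tracker, and the pulse shape and disturbance are made up.

```python
def estimate_offset(received, pulse):
    """Return the sample lag at which the pulse best matches the received signal."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(pulse) + 1):
        # Correlate the template with the window starting at this lag
        score = sum(r * p for r, p in zip(received[lag:], pulse))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical timing pulse and a received record with a small disturbance
pulse = [1, -1, 1, 1, -1]
true_offset = 7
received = [0.1 * (-1) ** i for i in range(20)]
for k, p in enumerate(pulse):
    received[true_offset + k] += p

lag = estimate_offset(received, pulse)
print(lag)
```

Dividing the estimated lag by the sampling rate converts it to a time offset in seconds.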
Spectral Analysis of Saxophone Playing Styles
Jazz and classical playing styles impart different characters to the notes played on a saxophone: jazz sounds rougher, classical cleaner. I used signal processing, audio spectral analysis and machine learning to support saxophonist Vanessa Hasbrook in her investigation of the determinants of jazz and classical sound. Collaborators: Dr. Vanessa Hasbrook, Prof. Maya Gupta; Sponsors: self-funded
Automatic Recognition of Music from Acoustic Waveform Fingerprinting
Cantametrix was a pioneer in music identification and categorization in the early days of downloadable music files, when metadata attached to MP3s was often incorrect, inconsistent, or purposefully obfuscated. My colleagues and I developed a pattern recognition system that identified digital music files directly from the acoustic waveforms without relying on metadata. I specifically worked on the digital signal processing algorithms that processed music files and extracted perceptually-relevant features that uniquely identified each song. The system correctly identified songs from a million-song database (a large number at the time!) independently of codec and encoding quality. The time-to-identification, comprising feature computation and database lookup, was a fraction of a second. Gracenote/Sony acquired Cantametrix and incorporated the technology into their ubiquitous MusicID (c) platform. The “Cantametrix fingerprint” lived on in popular consumer products such as Apple iTunes and WinAmp. An article in the Seattle Weekly describes that work.
Cell Phone Fraud Detection with Radio Frequency Fingerprinting
In the early days of mobile phones, the communication protocols were analog and unencrypted. Fraudsters would intercept the unencrypted analog waveforms and clone an innocent person’s phone number into their own. This let them make phone calls for free, while leaving the unfortunate subscriber with a huge bill. At Cellular Technical Services I worked on a pattern recognition system based on fuzzy logic and on radio-frequency fingerprinting that identified cloned phones and blocked them from making calls.
Blind Demodulation and Automatic Classification of Unknown Radio Signals
Blind demodulation techniques, that is, techniques that do not rely on a priori knowledge of signal parameters, are combined with modulation classification methods to automatically demodulate unknown signals. Modulation classification is achieved by automatic classifiers operating on features extracted from partially demodulated signals. We developed the Matlab Blind Demodulation Toolbox and demonstrated a prototype of the entire signal processing chain that correctly classified PSK, QAM, and FSK modulations. Collaborators: Dr. Keith Davidson, Dr. Derek Stanford, and Dr. Jill Goldschneider (Insightful Corp.); Dr. Jim Pitton (APL-UW); Sponsors: ARL, AFRL.