Professional - Aaron B. Brown
Aaron B. Brown, PhD
Senior Director, Product Management
Google, Inc.
345 Spear St.
San Francisco, CA 94105
Twitter: @aaron_b_brown
I am a Senior Director of Product Management at Google, where I've worked on a range of consumer and enterprise products spanning search, healthcare, developer tools, and infrastructure.
Prior to joining Google in March 2010, I managed advanced client engagements for content analytics, text analytics, and information access solutions within IBM's Information Management, Analytics and Optimization software services organization.
Previously, I ran strategy and product marketing for the Discovery, Analytics, and Search product group within IBM Enterprise Content Management software. Prior to joining the IBM Information Management Software team, I was a member of the IBM Software Group Strategic Alliance Development organization. And before that, I was a researcher at IBM's T.J. Watson Research Center, where I worked on new ways to make big computing systems more self-reliant and easier to administer, and helped develop aspects of the new field of services research.
For more detail on my professional background, see my LinkedIn page.
Research Work
Publications
Please see my publications page for a list of my recent publications.
Work at IBM Research
At IBM Research, I was a member of the Adaptive Systems Department at the IBM T.J. Watson Research Center in Hawthorne, NY. My research at IBM focused on measuring, understanding, and reducing the burdens faced by the human administrators of today's large enterprise IT systems.
My work focused in particular on the configuration complexity that permeates large IT systems, and on metrics to quantify it and benchmarks to evaluate it. I was also involved in IBM's Autonomic Computing initiative, where I contributed to the architecture for Autonomic Computing, designed benchmarks for Autonomic Computing capabilities, developed automated change management technology, and investigated analysis methods for human-driven IT processes.
My general research interests span a broad array of subdisciplines of computer science systems research, including system management, benchmarking, dependability, operating systems, storage, system architecture, clustered and distributed systems, network-delivered services, and the system-operator interface. I am particularly intrigued by the interactions between large-scale systems and the humans who maintain them, and by how those interactions are affected by computer system architecture.
While at IBM Research, I also served as one of its two Campus Relationship Managers for Harvard University, acting as a point of contact for Harvard students and professors in computing- or physical-science-related fields who wanted information about IBM Research or connections to its research community.
Work prior to joining IBM Research
Before joining IBM, I was a PhD student in the EECS Computer Science Division at the University of California, Berkeley. There, I was a founding member of the Recovery-Oriented Computing (ROC) Project with Professor David Patterson. The ROC project investigated novel techniques for building highly dependable Internet services, with an unconventional philosophy that accepts failures as inevitable and hence emphasizes recovery from failures rather than traditional failure avoidance. See the ROC project page for more information on the project.
My research in the ROC project focused on addressing failures caused by human interaction with large server systems. Human (operator) error is the single largest cause of failures in a wide range of server systems, yet most fault-tolerant systems and research simply ignore the human contribution, both negative and positive, to system dependability. I developed and evaluated a model, framework, and implementation of a system-level undo/redo facility for human operators. Such a facility provides a natural way to recover from human errors while also allowing operators to safely use trial-and-error experimentation when diagnosing and repairing systems. It also serves as a last-resort recovery tool capable of resolving a broad class of problems (including human, configuration, and software errors) even when the nature of the system corruption is unknown. System-level undo/redo works by marrying physical roll-back of all levels of system state with external tracking and replay of user/service interactions; it allows arbitrary repairs to be performed between roll-back and roll-forward, and automatically synthesizes an internally and externally consistent post-roll-forward state that integrates the repairs into the original history of user actions.
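To give a flavor of the rewind/repair/replay cycle, the control flow can be sketched roughly as follows. This is a minimal, hypothetical illustration: the class names, data structures, and the toy mail-filter scenario are my own assumptions for exposition, not the actual implementation described in the dissertation.

```python
# Hypothetical sketch of a system-level undo/redo ("rewind, repair, replay")
# cycle: physical rollback of system state plus logging and replay of
# external user/service interactions. Illustrative only.

import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class UndoableSystem:
    state: Dict[str, Any] = field(default_factory=dict)                # live system state
    snapshots: List[Dict[str, Any]] = field(default_factory=list)      # physical rollback points
    interaction_log: List[Callable[[Dict[str, Any]], None]] = field(default_factory=list)  # external interactions

    def checkpoint(self) -> int:
        """Record a physical rollback point; return its index."""
        self.snapshots.append(copy.deepcopy(self.state))
        return len(self.snapshots) - 1

    def record_interaction(self, op: Callable[[Dict[str, Any]], None]) -> None:
        """Apply an external user/service interaction and log it for later replay."""
        op(self.state)
        self.interaction_log.append(op)

    def rewind(self, snapshot_id: int) -> None:
        """Roll all system state back to an earlier checkpoint."""
        self.state = copy.deepcopy(self.snapshots[snapshot_id])

    def repair(self, fix: Callable[[Dict[str, Any]], None]) -> None:
        """Apply an arbitrary repair (config fix, software change, ...) to the rolled-back state."""
        fix(self.state)

    def replay(self) -> None:
        """Re-execute the logged interactions over the repaired state,
        synthesizing a consistent post-roll-forward state."""
        for op in self.interaction_log:
            op(self.state)


# Toy scenario: an operator installs a buggy filter upgrade, user work keeps
# arriving, and undo/redo recovers without losing that work.
svc = UndoableSystem(state={"filter": "v1", "inbox": []})
cp = svc.checkpoint()
svc.state["filter"] = "v2-buggy"                                        # operator error
svc.record_interaction(lambda s: s["inbox"].append(("msg1", s["filter"])))  # user work arrives
svc.rewind(cp)                                   # rewind: physical rollback of all state
svc.repair(lambda s: s.update(filter="v2-fixed"))  # repair: fix the root cause
svc.replay()                                     # replay: reintegrate user work under the repaired state
assert svc.state == {"filter": "v2-fixed", "inbox": [("msg1", "v2-fixed")]}
```

Note how the replayed message is processed under the repaired filter: the repair is integrated into the original history of user actions rather than discarding the work that arrived after the error.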
To help evaluate this work, I also developed human-aware dependability benchmarking methodologies that provide a way to measure progress in addressing human error. These benchmarks treat the human operator as a component of the system: the operator is subjected to a workload of typical maintenance tasks and evaluated indirectly through the effect of those tasks on the system's performance and availability.
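As a rough illustration of that indirect scoring, a benchmark run might be reduced to a single availability figure along the following lines. The field names and scoring formula here are illustrative assumptions, not the specific metrics defined in the dissertation.

```python
# Hypothetical sketch of scoring a human-aware dependability benchmark run:
# the operator's actions are never scored directly; only their indirect
# effect on delivered performance/availability counts.

from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One measurement interval taken while the operator works on a task."""
    offered_requests: int      # requests issued by the workload generator
    completed_requests: int    # requests the system answered correctly


def availability_score(samples: List[Sample]) -> float:
    """Fraction of offered work completed across the whole run, including
    any degradation caused by operator actions during maintenance tasks."""
    offered = sum(s.offered_requests for s in samples)
    completed = sum(s.completed_requests for s in samples)
    return completed / offered if offered else 1.0


# Example run: an operator misstep during a maintenance task briefly
# degrades throughput, and that degradation is what the benchmark measures.
run = [
    Sample(1000, 1000),   # steady state before the maintenance task
    Sample(1000, 640),    # operator misstep degrades service
    Sample(1000, 980),    # recovery in progress
    Sample(1000, 1000),   # back to steady state
]
print(f"run availability: {availability_score(run):.3f}")   # ~0.905
```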
Both the undo/redo system and the human-aware dependability benchmark methodology are described in depth in my PhD dissertation.
Before the ROC project, I worked on the UC Berkeley ISTORE project, which explored the integration of general-purpose computation into network-attached disks ganged into a large storage array; on the UC Berkeley IRAM project, which explored the potential benefits and consequences of integrating large quantities of DRAM directly onto the processor chip and developed vector-based processor architectures to suit that integration; on the Harvard VINO Operating System project; and on a methodology and benchmark suite for detailed application-aware operating system performance analysis (described in my undergraduate thesis).
Last updated: 24-Aug-2022