Scott Farrar, Ph.D.
Mountain View, CA 94040
OBJECTIVE: Apply my research and engineering skills to the design and construction of intelligent systems in a product-focused applied research role. Work Eligibility: US Citizen.
RELEVANT SKILLS:
Machine Learning, Data Mining, Pattern Recognition, Scientific Computation, Experimental Design and Analysis.
Python, Java, Pig, C, R, SQL, Protocol Buffers (Protobuf), JSON, XML, HTML, JavaScript, PostScript.
Linux, Hadoop, Map-Reduce, TreeNet, Decision Trees, Neural Networks.
WORK EXPERIENCE:
Distributed Systems Engineer. Apple, Inc. Cupertino, CA. 2011-present.
Architected Hadoop/Pig-based analytical solutions for wireless device data. Built a data analysis pipeline covering extraction from binary device format (Protobuf), data collation and metadata annotation (JSON), data cube aggregation (Hadoop, Pig), and schema generation for use by a relational database and front-end UI. Evangelized a common, automated processing approach to diverse datasets, allowing the team to meet aggressive deadlines on time and within budget.
Senior Research Engineer. Yahoo!, Inc. (formerly Inktomi). Sunnyvale, CA. 2003-2010.
Senior member of the Machine Learned Ranking (MLR) Team in the Applied Sciences group at Yahoo Search. Toolbar Feature Project Lead: Pioneered the first application of Yahoo Toolbar User Behavior Data to the problem of ranking web search results. Project facets included Yahoo Toolbar data log exploration, session segmentation, feature conception, extraction of relevant statistics, MLR model training and analysis, and productionization, including disk/memory budgeting and feature compression (TreeNet, GBDT). Performed experimental design for numerous relevance studies and created automated tools for measuring Discounted Cumulative Gain (DCG), statistical significance testing, comparative feature distribution visualization, result set data drill-down, etc. (Python, Java, R, HTML, XML). Performed exploratory data mining and statistics gathering on Yahoo's massively parallel cloud computing infrastructure (Hadoop, Map-Reduce, Python).
Project lead for News Vertical Search feature research. Proposed and analyzed novel user behavior-based feature types for ranking articles in Yahoo News: user click behavior on News Search results, plus Toolbar user behavior. Presented results and recommendations to internal Sciences group. Systematic study of region/language features for US-English MLR function, resulting in significant relevance improvement and simplified search engine configuration. Project management/release troubleshooting for multiple MLR algorithm deployments.
Team lead for Paid Inclusion (PI) Relevance Team. Proposed, developed, implemented, and deployed the first MLR algorithms for PI ranking (PIMLR) at Yahoo. Immediate relevance and revenue (> $3M/year) improvements; enabled PI to benefit from future long-term relevance gains via MLR methodology. End-to-end ownership of project: initial problem formulation, training set gathering, experimentation and modeling, active learning, extensive offline and online relevance testing, measurement of user click behavior, infrastructural modifications to the search engine, deployment, troubleshooting, documentation and training.
Pioneered rigorous web-PI content blending methodology based on Dilution Testing and parameter optimization. Allowed first systematic understanding of interaction between PI and other web content during blending, greatly aiding PI business decisions. Evangelized Dilution Testing as a general means for measuring Search Engine behavior, performance, and correctness. Proposed and evangelized new relevance metric (Rank Weighted Average) for Sponsored Search.
Software Engineer. QED Labs. San Jose, CA. 2002-2003.
Team member for a component-based software package to perform image analysis and 3D reconstruction of viruses from electron microscope photographs. Designed and implemented a system for dynamic application package activation. Proposed an object persistence layer using CORBA-based C++ introspection and a relational database backend (PostgreSQL). Refactored software to improve correctness and efficiency.
Data Mining Scientist. Digital Impact, Inc. San Mateo, CA. 1999-2001.
Lead developer of data mining and knowledge discovery technology. Designed and implemented high throughput database applications for dynamic targeting in electronic commerce (Java, Oracle 8i, SQL, PL/SQL, Unix). Technical lead for all project stages: initial client consultation, requirements gathering, technical design, project plan, engineering, QA testing, performance tuning, rollout to production environment, documentation of technology and process flow, and user training. Point person for architectural, design, feasibility, and data warehouse scalability questions.
Designed and implemented a content annotation and storage system for holding targeting metadata and hypertext content (XML, Oracle 8i). Analyzed performance of real-time production system database using SQL Trace. Aggressively optimized application queries using index hints, statistics, and application caching (PL/SQL, Java). Generated reports from ad hoc queries detailing important trends in client data sets. Presented algorithms and technology to both technical and non-technical audiences. Evaluated third-party technologies to answer buy-versus-build questions (Delano). Mentored junior engineers on various projects, instructing them in effective software engineering techniques.
OTHER WORK EXPERIENCE:
Research Assistant, Department of Cognitive Science. University of California, San Diego. La Jolla, CA. 1992-1999.
Investigated biologically plausible models of information processing in the sensorimotor cortex using machine learning techniques. Simulated multiple neural network architectures to analyze and solve bilateral coordination problems. Built robot motion planners using dynamic programming and gradient descent techniques to identify optimal movement trajectories. Optimized feedforward and recurrent neural networks to emulate the behavior of motion planners. Extensively analyzed neural network internal representations with custom visualization tools, and compared these to published experimental data. Published results in the journal Biological Cybernetics. Constructed all algorithms, simulations, and analytical and visualization tools from scratch (C++, Linux, Windows NT, PostScript).
Teaching Assistant, Department of Cognitive Science. University of California, San Diego. La Jolla, CA. 1992-1999.
Course Topics: Artificial Intelligence, Neurological Development, Cognitive Neuroscience: Functional Neurobiology and System Neurobiology, Modeling Cognitive Phenomena, Introduction to Computing.
Research Intern. Teleos Research. Palo Alto, CA. 1994.
Proposed and constructed an artificial sound localization system inspired by that of the barn owl. Assisted in the construction of an in-house visual tracking system.
Research Intern. Fuji-Xerox Palo Alto Laboratory. Palo Alto, CA. 1993.
Applied a Hidden Markov Model part-of-speech text-tagger to extract syntactic content from document images without optical character recognition.
Engineering Intern. Apple Computer. Cupertino, CA. 1992.
Contributed to the database design and testing of an experimental educational application.
Research Intern. Applied Materials Japan. Narita, Japan. 1991.
Designed a research project to study the properties of thin titanium films on silicon and silicon dioxide wafers using collimated Physical Vapor Deposition. Performed clean-room experiments using electron and optical microscopy and presented results detailing recommended process conditions for customer use.
Programmer. Stanford Linear Accelerator Center. Stanford, CA. 1990.
Implemented a user interface to display particle detector readings as part of a programming team.
EDUCATION, HONORS, MEMBERSHIPS:
Ph.D. University of California, San Diego. Cognitive Science. 1999.
Dissertation: Neural Network models of the brain mechanisms of bilateral coordination.
Ford Foundation Predoctoral Fellowship
M.S. University of California, San Diego. Cognitive Science. 1994.
B.S. Stanford University. Computer Science. 1992.
Association for Computing Machinery (ACM)
IEEE Computer Society
SELECTED PAPERS:
Farrar, DS and Zipser, D. (1999) Neural Network models of bilateral coordination. Biological Cybernetics. 80(3):215-225.
Sibun, P and Farrar, DS. (1994) Content characterization using word shape tokens. Proceedings of the 15th International Conference on Computational Linguistics. 686-690.