Cornell University
eCommons

METHODOLOGICAL ADVANCES IN REAL-WORLD COMPUTER VISION: ROBUSTNESS, EFFICIENCY AND HUMAN PREFERENCE ALIGNMENT

File(s)
Jiang_cornellgrad_0058F_15301.pdf (2.63 MB)
Permanent Link(s)
https://doi.org/10.7298/kez2-tc39
https://hdl.handle.net/1813/121027
Collections
Cornell Theses and Dissertations
Author
Jiang, Yixuan
Abstract

Computer vision, a core area of Artificial Intelligence (AI), enables machines to understand the visual world. With the emergence of deep neural networks, remarkable progress has been achieved in solving computer vision problems such as image classification, semantic segmentation and text-to-image generation. These advances have spurred growing interest in real-world applications, yet deploying such models introduces unaddressed challenges due to the complex requirements of practical scenarios. In this thesis, we propose methodologies to address three key concerns in bringing computer vision models into real-world applications: inference robustness, iteration efficiency, and human-aligned evaluation.
First, given that noise and corruption are prevalent in real-world data, it is important for practical computer vision models to be robust against input perturbations. To this end, we focus on adversarial training, a class of effective defense methods for enhancing model robustness against adversarial data. A primary challenge in adversarial training is the “robust generalization” problem, where models can be robust against the attacks seen during adversarial training while remaining vulnerable to unseen attacks. Focusing on a group of adversarial training methods, namely the geometry-aware methods, we observe that they always converge to a sharp minimum in the weight loss landscape. Motivated by this, we propose Geometry-Aware Weight Perturbation (GAWP), which injects weight perturbation on top of geometry-aware adversarial training to regularize the flatness of the loss landscape. Extensive results demonstrate that GAWP alleviates the robust generalization issue of geometry-aware methods and consistently improves robustness compared to existing weight perturbation strategies.
Another bottleneck that limits the performance of adversarial training is “robust overfitting”: after a certain stage of training, model robustness on the test dataset begins to degrade and keeps degrading. Effective remedies to this issue can be categorized into two classes: confidence calibration and flat minimum methods, both of which are commonly formulated as regularized adversarial training approaches. We show that these methods can be unified under a constrained optimization framework, revealing that flat minimum approaches can be interpreted as adaptive confidence calibration. To solve this constrained optimization problem, we propose a novel adversarial training framework based on a powerful nonlinear dynamical system technique called the Quotient Gradient System (QGS). Specifically, we introduce the QGS Adversarial Training (QGSAT) framework and develop two numerical methods termed the QGS method assisted by confidence calibration (QGSAT_CC) and the QGS method assisted by a flat minimum (QGSAT_FM). We theoretically prove that QGSAT enforces effective regularization by ensuring convergence to the feasible region, and achieves optimal regularization under convexity when the region is empty. Extensive numerical studies demonstrate that QGSAT mitigates robust overfitting and outperforms existing regularized adversarial training methods.
Second, we address the problem of training efficiency in model iteration. To reduce training costs and alleviate the burden of hyperparameter tuning in SGD, we propose the Conjugate Gradient with Quadratic line-search (CGQ) method. Specifically, CGQ employs a quadratic line-search strategy to adaptively determine the step size based on the current loss landscape, while the momentum factor is dynamically updated through the conjugate gradient parameter (in the spirit of the Polak–Ribière approach). We establish theoretical guarantees for convergence under strongly convex settings, and empirical results on image classification datasets demonstrate that CGQ achieves faster convergence and superior generalization performance compared to existing local solvers.
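The two ingredients named for CGQ (a step size from a quadratic line search and a momentum factor from a Polak–Ribière-style conjugate gradient parameter) can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the thesis's algorithm: it is a textbook deterministic nonlinear conjugate gradient loop, and every name in it (`quadratic_step`, `cg_quadratic_line_search`, the trial step of 1.0, the toy objective) is hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def quadratic_step(f: Callable[[Vector], float], x: Vector, d: Vector,
                   slope: float, trial: float = 1.0) -> float:
    """Step size from a parabola fitted to phi(a) = f(x + a * d).

    The parabola matches phi(0), phi'(0) = slope, and phi(trial).
    """
    phi0 = f(x)
    phi_trial = f(axpy(trial, d, x))
    curvature = (phi_trial - phi0 - slope * trial) / (trial * trial)
    if curvature <= 0.0:
        return trial  # the fitted parabola has no minimum; keep the trial step
    return -slope / (2.0 * curvature)


def cg_quadratic_line_search(f: Callable[[Vector], float],
                             grad: Callable[[Vector], Vector],
                             x0: Vector, iters: int = 50,
                             tol: float = 1e-20) -> Vector:
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        slope = dot(g, d)
        if slope >= 0.0:  # not a descent direction: restart along -gradient
            d = [-gi for gi in g]
            slope = dot(g, d)
        x = axpy(quadratic_step(f, x, d, slope), d, x)
        g_new = grad(x)
        # Polak-Ribiere parameter, clipped at zero (the "PR+" variant).
        diff = [a - b for a, b in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))
        d = axpy(beta, d, [-gi for gi in g_new])  # beta plays the momentum role
        g = g_new
    return x


# Toy strongly convex objective: 0.5 * x^T A x - b^T x, A = [[3, 1], [1, 2]], b = [1, 1].
def f(x: Vector) -> float:
    return 0.5 * (3 * x[0] ** 2 + 2 * x[0] * x[1] + 2 * x[1] ** 2) - x[0] - x[1]


def grad(x: Vector) -> Vector:
    return [3 * x[0] + x[1] - 1, x[0] + 2 * x[1] - 1]


solution = cg_quadratic_line_search(f, grad, [0.0, 0.0])
```

On this toy quadratic the fitted parabola is exact, so the loop reaches the minimizer (0.2, 0.4) in two steps. A stochastic, mini-batch setting such as neural network training would need additional safeguards that this sketch does not attempt.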
Finally, we establish a discriminative benchmark to enable cost-efficient and human preference–aligned evaluation of text-to-image diffusion models. Unlike existing metrics for generative models, our benchmark assesses the discriminative capability of pretrained text-to-image diffusion models through semantic segmentation on a multi-domain dataset. Comparing the benchmark results with modern human-aligned metrics and survey results, we observe a strong correlation among the ranking results. We highlight three main advantages of the proposed benchmark: 1) it relies solely on features from the evaluated model, avoiding bias introduced by third-party models; 2) it eliminates the substantial cost of constructing human preference datasets and training dedicated evaluation models; and 3) it achieves strong alignment with human judgments.
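The agreement between two rankings of the same models can be quantified with a rank correlation. The abstract does not state which statistic the thesis uses; the sketch below assumes Spearman's rank correlation without ties, and the scores in it are made-up placeholder numbers, not results from the thesis.

```python
from typing import List, Sequence


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank scores from 1 (lowest) to n (highest); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(a)
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical scores for four models (illustrative values only).
benchmark_scores = [0.61, 0.55, 0.70, 0.48]  # e.g. a segmentation-based score
human_scores = [0.72, 0.60, 0.81, 0.65]      # e.g. a human preference score

rho = spearman_rho(benchmark_scores, human_scores)
```

A value of rho near 1 means the two rankings order the models almost identically, which is the sense in which one metric is said to align with another.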

Description
153 pages
Date Issued
2025-12
Keywords
adversarial training
diffusion model
nonlinear dynamical system
optimization
Committee Chair
Chiang, Hsiao-Dong
Committee Member
Acharya, Jayadev
Reeves, Anthony
Degree Discipline
Electrical and Computer Engineering
Degree Name
Ph. D., Electrical and Computer Engineering
Degree Level
Doctor of Philosophy
Rights
Attribution 4.0 International
Rights URI
https://creativecommons.org/licenses/by/4.0/
Type
dissertation or thesis
