A searchable list of some of my publications is below. You can also access my publications from the sites listed under Other Publication Sites at the end of this page, and via my ORCID profile.
Publications:
Gong Zhang, Kihyuk Sohn, Meera Hahn, Humphrey Shi, Irfan Essa
FineStyle: Fine-grained Controllable Style Personalization for Text-to-image Models Proceedings Article
In: Advances in Neural Information Processing Systems (NeurIPS), 2024.
@inproceedings{2024-Zhang-FFCSPTM,
title = {FineStyle: Fine-grained Controllable Style Personalization for Text-to-image Models},
author = {Gong Zhang and Kihyuk Sohn and Meera Hahn and Humphrey Shi and Irfan Essa},
url = {https://neurips.cc/virtual/2024/poster/96863
https://openreview.net/forum?id=1SmXUGzrH8},
year = {2024},
date = {2024-12-11},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
abstract = {Few-shot fine-tuning of text-to-image (T2I) generation models enables people to create unique images in their own style using natural languages without requiring extensive prompt engineering. However, fine-tuning with only a handful, as little as one, of image-text paired data prevents fine-grained control of style attributes at generation. In this paper, we present FineStyle, a few-shot fine-tuning method that allows enhanced controllability for style personalized text-to-image generation. To overcome the lack of training data for fine-tuning, we propose a novel concept-oriented data scaling that amplifies the number of image-text pairs, each of which focuses on different concepts (e.g., objects) in the style reference image. We also identify the benefit of parameter-efficient adapter tuning of key and value kernels of cross-attention layers. Extensive experiments show the effectiveness of FineStyle at following fine-grained text prompts and delivering visual quality faithful to the specified style, measured by CLIP scores and human raters.
},
keywords = {computer vision, generative AI, generative media, machine learning, NeurIPS},
pubstate = {published},
tppubtype = {inproceedings}
}
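As a rough illustration of the parameter-efficient adapter tuning named in the FineStyle abstract (updating only the key and value kernels of cross-attention layers), here is a minimal PyTorch sketch. It is not the FineStyle implementation; the class, dimensions, and helper below are assumptions for illustration.

import torch
import torch.nn as nn

# Minimal sketch (not the FineStyle code): a cross-attention layer whose
# key/value projections are the only trainable parameters.
class CrossAttention(nn.Module):
    def __init__(self, dim_img=320, dim_txt=768):
        super().__init__()
        self.to_q = nn.Linear(dim_img, dim_img, bias=False)
        self.to_k = nn.Linear(dim_txt, dim_img, bias=False)  # tuned
        self.to_v = nn.Linear(dim_txt, dim_img, bias=False)  # tuned
        self.to_out = nn.Linear(dim_img, dim_img)

    def forward(self, x, text):
        q, k, v = self.to_q(x), self.to_k(text), self.to_v(text)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return self.to_out(attn @ v)

def mark_kv_trainable(model):
    """Freeze all weights, then re-enable only the key/value kernels."""
    for p in model.parameters():
        p.requires_grad = False
    for name, p in model.named_parameters():
        if "to_k" in name or "to_v" in name:
            p.requires_grad = True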
Dina Bashkirova, José Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa
MaskSketch: Unpaired Structure-guided Masked Image Generation Proceedings Article
In: IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.
@inproceedings{2023-Bashkirova-MUSMIG,
title = {MaskSketch: Unpaired Structure-guided Masked Image Generation},
author = { Dina Bashkirova and José Lezama and Kihyuk Sohn and Kate Saenko and Irfan Essa},
url = {https://arxiv.org/abs/2302.05496
https://openaccess.thecvf.com/content/CVPR2023/papers/Bashkirova_MaskSketch_Unpaired_Structure-Guided_Masked_Image_Generation_CVPR_2023_paper.pdf
https://openaccess.thecvf.com/content/CVPR2023/supplemental/Bashkirova_MaskSketch_Unpaired_Structure-Guided_CVPR_2023_supplemental.pdf},
doi = {10.48550/ARXIV.2302.05496},
year = {2023},
date = {2023-06-01},
urldate = {2023-06-01},
booktitle = {IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)},
abstract = {Recent conditional image generation methods produce images of remarkable diversity, fidelity and realism. However, the majority of these methods allow conditioning only on labels or text prompts, which limits their level of control over the generation result. In this paper, we introduce MaskSketch, an image generation method that allows spatial conditioning of the generation result using a guiding sketch as an extra conditioning signal during sampling. MaskSketch utilizes a pre-trained masked generative transformer, requiring no model training or paired supervision, and works with input sketches of different levels of abstraction. We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation. Our results show that MaskSketch achieves high image realism and fidelity to the guiding structure. Evaluated on standard benchmark datasets, MaskSketch outperforms state-of-the-art methods for sketch-to-image translation, as well as unpaired image-to-image translation approaches.},
keywords = {computer vision, CVPR, generative AI, generative media, google},
pubstate = {published},
tppubtype = {inproceedings}
}
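The structure guidance described in the MaskSketch abstract relies on the observation that self-attention maps encode scene layout and object shape. As a hedged sketch of that idea, the snippet below ranks candidate samples by how closely their self-attention maps match those of the guiding sketch; the cosine similarity used here is an illustrative stand-in, not the paper's exact structure distance.

import numpy as np

def structure_similarity(attn_sketch, attn_candidate):
    """Cosine similarity between flattened self-attention maps.

    Both inputs are arrays of shape (num_layers, num_tokens, num_tokens).
    """
    a = attn_sketch.reshape(-1)
    b = attn_candidate.reshape(-1)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def pick_best_candidate(attn_sketch, candidate_attns):
    """Return the index of the candidate whose structure best matches the sketch."""
    scores = [structure_similarity(attn_sketch, c) for c in candidate_attns]
    return int(np.argmax(scores))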
Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
MAGVIT: Masked Generative Video Transformer Proceedings Article
In: IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.
@inproceedings{2023-Yu-MMGVT,
title = {MAGVIT: Masked Generative Video Transformer},
author = {Lijun Yu and Yong Cheng and Kihyuk Sohn and José Lezama and Han Zhang and Huiwen Chang and Alexander G. Hauptmann and Ming-Hsuan Yang and Yuan Hao and Irfan Essa and Lu Jiang},
url = {https://arxiv.org/abs/2212.05199
https://magvit.cs.cmu.edu/
https://openaccess.thecvf.com/content/CVPR2023/papers/Yu_MAGVIT_Masked_Generative_Video_Transformer_CVPR_2023_paper.pdf
https://openaccess.thecvf.com/content/CVPR2023/supplemental/Yu_MAGVIT_Masked_Generative_CVPR_2023_supplemental.pdf},
doi = {10.48550/ARXIV.2212.05199},
year = {2023},
date = {2023-06-01},
urldate = {2023-06-01},
booktitle = {IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)},
abstract = {We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the quality, efficiency, and flexibility of MAGVIT. Our experiments show that (i) MAGVIT performs favorably against state-of-the-art approaches and establishes the best-published FVD on three video generation benchmarks, including the challenging Kinetics-600. (ii) MAGVIT outperforms existing methods in inference time by two orders of magnitude against diffusion models and by 60x against autoregressive models. (iii) A single MAGVIT model supports ten diverse generation tasks and generalizes across videos from different visual domains. The source code and trained models will be released to the public at this https URL.},
keywords = {computational video, computer vision, CVPR, generative AI, generative media, google},
pubstate = {published},
tppubtype = {inproceedings}
}
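Masked video token modeling, the training objective named in the MAGVIT abstract, can be illustrated with a generic sketch: quantize a clip into a (time, height, width) grid of token ids, replace a random subset with a mask token, and train the transformer to recover the original ids at those positions. The codebook size, mask ratio, and helper below are assumptions; MAGVIT's multi-task conditioning and embedding method are more involved.

import numpy as np

MASK_ID = 1024                      # assumed codebook of 1024 plus a [MASK] id
rng = np.random.default_rng(0)

def mask_video_tokens(tokens, mask_ratio=0.6):
    """tokens: int array of shape (T, H, W) produced by a 3D tokenizer."""
    flat = tokens.reshape(-1).copy()
    n_mask = int(mask_ratio * flat.size)
    idx = rng.choice(flat.size, size=n_mask, replace=False)
    targets = flat[idx].copy()      # what the transformer must predict
    flat[idx] = MASK_ID             # corrupt the input at masked positions
    return flat.reshape(tokens.shape), idx, targets

tokens = rng.integers(0, 1024, size=(4, 16, 16))   # toy 4x16x16 token grid
masked_tokens, positions, targets = mask_video_tokens(tokens)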
Kihyuk Sohn, Yuan Hao, José Lezama, Luisa Polania, Huiwen Chang, Han Zhang, Irfan Essa, Lu Jiang
Visual Prompt Tuning for Generative Transfer Learning Proceedings Article
In: IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2023.
@inproceedings{2022-Sohn-VPTGTL,
title = {Visual Prompt Tuning for Generative Transfer Learning},
author = {Kihyuk Sohn and Yuan Hao and José Lezama and Luisa Polania and Huiwen Chang and Han Zhang and Irfan Essa and Lu Jiang},
url = {https://arxiv.org/abs/2210.00990
https://openaccess.thecvf.com/content/CVPR2023/papers/Sohn_Visual_Prompt_Tuning_for_Generative_Transfer_Learning_CVPR_2023_paper.pdf
https://openaccess.thecvf.com/content/CVPR2023/supplemental/Sohn_Visual_Prompt_Tuning_CVPR_2023_supplemental.pdf},
doi = {10.48550/ARXIV.2210.00990},
year = {2023},
date = {2023-06-01},
urldate = {2023-06-01},
booktitle = {IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR)},
abstract = {Transferring knowledge from an image synthesis model trained on a large dataset is a promising direction for learning generative image models from various domains efficiently. While previous works have studied GAN models, we present a recipe for learning vision transformers by generative knowledge transfer. We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens to the autoregressive or non-autoregressive transformers. To adapt to a new domain, we employ prompt tuning, which prepends learnable tokens called prompt to the image token sequence, and introduce a new prompt design for our task. We study on a variety of visual domains, including visual task adaptation benchmark (Zhai et al., 2019), with varying amount of training images, and show effectiveness of knowledge transfer and a significantly better image generation quality over existing works.},
keywords = {computer vision, CVPR, generative AI, generative media, google},
pubstate = {published},
tppubtype = {inproceedings}
}
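The prompt tuning described in this abstract prepends a small set of learnable token embeddings to the image token sequence while the pretrained transformer stays frozen. A minimal PyTorch sketch under assumed dimensions, without the paper's specific prompt design:

import torch
import torch.nn as nn

class PromptedTokens(nn.Module):
    """Prepend learnable prompt embeddings to a frozen model's token sequence."""
    def __init__(self, num_prompts=16, dim=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, image_token_embeddings):
        # image_token_embeddings: (batch, seq_len, dim)
        batch = image_token_embeddings.shape[0]
        p = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([p, image_token_embeddings], dim=1)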
José Lezama, Tim Salimans, Lu Jiang, Huiwen Chang, Jonathan Ho, Irfan Essa
Discrete Predictor-Corrector Diffusion Models for Image Synthesis Proceedings Article
In: International Conference on Learning Representations (ICLR), 2023.
@inproceedings{2023-Lezama-DPDMIS,
title = {Discrete Predictor-Corrector Diffusion Models for Image Synthesis},
author = {José Lezama and Tim Salimans and Lu Jiang and Huiwen Chang and Jonathan Ho and Irfan Essa},
url = {https://openreview.net/forum?id=VM8batVBWvg},
year = {2023},
date = {2023-05-01},
urldate = {2023-05-01},
booktitle = {International Conference on Learning Representations (ICLR)},
abstract = {We introduce Discrete Predictor-Corrector diffusion models (DPC), extending predictor-corrector samplers in Gaussian diffusion models to the discrete case. Predictor-corrector samplers are a class of samplers for diffusion models, which improve on ancestral samplers by correcting the sampling distribution of intermediate diffusion states using MCMC methods. In DPC, the Langevin corrector, which does not have a direct counterpart in discrete space, is replaced with a discrete MCMC transition defined by a learned corrector kernel. The corrector kernel is trained to make the correction steps achieve asymptotic convergence, in distribution, to the correct marginal of the intermediate diffusion states. Equipped with DPC, we revisit recent transformer-based non-autoregressive generative models through the lens of discrete diffusion, and find that DPC can alleviate the compounding decoding error due to the parallel sampling of visual tokens. Our experiments show that DPC improves upon existing discrete latent space models for class-conditional image generation on ImageNet, and outperforms continuous diffusion models and GANs, according to standard metrics and user preference studies},
keywords = {computer vision, generative AI, generative media, google, ICLR, machine learning},
pubstate = {published},
tppubtype = {inproceedings}
}
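The sampling procedure described in the DPC abstract alternates a predictor step (an ordinary ancestral step of the discrete diffusion model) with a learned corrector step that moves the intermediate state toward the correct marginal. The loop below is schematic; predictor_step and corrector_step are placeholder callables, not the paper's interfaces.

def sample_dpc(init_tokens, predictor_step, corrector_step,
               num_steps=16, corrector_passes=1):
    """Schematic discrete predictor-corrector sampling loop."""
    tokens = init_tokens
    for t in reversed(range(num_steps)):
        # Predictor: one ancestral denoising step at diffusion time t.
        tokens = predictor_step(tokens, t)
        # Corrector: learned MCMC-style transition toward the time-t marginal.
        for _ in range(corrector_passes):
            tokens = corrector_step(tokens, t)
    return tokens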
José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa
Improved Masked Image Generation with Token-Critic Proceedings Article
In: European Conference on Computer Vision (ECCV), Springer, 2022, ISBN: 978-3-031-20050-2.
@inproceedings{2022-Lezama-IMIGWT,
title = {Improved Masked Image Generation with Token-Critic},
author = {José Lezama and Huiwen Chang and Lu Jiang and Irfan Essa},
url = {https://arxiv.org/abs/2209.04439
https://rdcu.be/c61MZ},
doi = {10.1007/978-3-031-20050-2_5},
isbn = {978-3-031-20050-2},
year = {2022},
date = {2022-10-28},
urldate = {2022-10-28},
booktitle = {European Conference on Computer Vision (ECCV)},
volume = {13683},
publisher = {Springer},
abstract = {Non-autoregressive generative transformers recently demonstrated impressive image generation performance, and orders of magnitude faster sampling than their autoregressive counterparts. However, optimal parallel sampling from the true joint distribution of visual tokens remains an open challenge. In this paper we introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer. Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer. During non-autoregressive iterative sampling, Token-Critic is used to select which tokens to accept and which to reject and resample. Coupled with Token-Critic, a state-of-the-art generative transformer significantly improves its performance, and outperforms recent diffusion models and GANs in terms of the trade-off between generated image quality and diversity, in the challenging class-conditional ImageNet generation.},
keywords = {computer vision, ECCV, generative AI, generative media, google},
pubstate = {published},
tppubtype = {inproceedings}
}
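The Token-Critic sampling loop summarized in the abstract can be sketched as follows: the generator fills every masked position, the critic scores each token, the least plausible tokens are re-masked, and the process repeats with a shrinking re-mask fraction. The callables and schedule below are assumptions for illustration.

import numpy as np

def sample_with_critic(generator, critic, seq_len, mask_id, num_steps=8):
    """generator(tokens) fills masked positions; critic(tokens) scores each token."""
    tokens = np.full(seq_len, mask_id, dtype=np.int64)
    for step in range(num_steps):
        tokens = generator(tokens)              # propose values for all masks
        scores = critic(tokens)                 # per-token plausibility scores
        frac = 1.0 - (step + 1) / num_steps     # assumed linear re-mask schedule
        n_remask = int(frac * seq_len)
        if n_remask == 0:
            break                               # final iteration: keep everything
        worst = np.argsort(scores)[:n_remask]   # reject the least plausible tokens
        tokens[worst] = mask_id                 # they will be resampled next round
    return tokens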
Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa
BLT: Bidirectional Layout Transformer for Controllable Layout Generation Proceedings Article
In: European Conference on Computer Vision (ECCV), 2022, ISBN: 978-3-031-19789-5.
@inproceedings{2022-Kong-BLTCLG,
title = {BLT: Bidirectional Layout Transformer for Controllable Layout Generation},
author = {Xiang Kong and Lu Jiang and Huiwen Chang and Han Zhang and Yuan Hao and Haifeng Gong and Irfan Essa},
url = {https://arxiv.org/abs/2112.05112
https://rdcu.be/c61AE},
doi = {10.1007/978-3-031-19790-1_29},
isbn = {978-3-031-19789-5},
year = {2022},
date = {2022-10-25},
urldate = {2022-10-25},
booktitle = {European Conference on Computer Vision (ECCV)},
volume = {13677},
abstract = {Creating visual layouts is a critical step in graphic design. Automatic generation of such layouts is essential for scalable and diverse visual designs. To advance conditional layout generation, we introduce BLT, a bidirectional layout transformer. BLT differs from previous work on transformers in adopting non-autoregressive transformers. In training, BLT learns to predict the masked attributes by attending to surrounding attributes in two directions. During inference, BLT first generates a draft layout from the input and then iteratively refines it into a high-quality layout by masking out low-confident attributes. The masks generated in both training and inference are controlled by a new hierarchical sampling policy. We verify the proposed model on six benchmarks of diverse design tasks. Experimental results demonstrate two benefits compared to the state-of-the-art layout transformer models. First, our model empowers layout transformers to fulfill controllable layout generation. Second, it achieves up to 10x speedup in generating a layout at inference time than the layout transformer baseline. Code is released at https://shawnkx.github.io/blt.},
keywords = {computer vision, ECCV, generative AI, generative media, google, vision transformer},
pubstate = {published},
tppubtype = {inproceedings}
}
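A rough illustration of the layout representation implied by the BLT abstract: each element contributes a handful of discrete attributes (category, position, size) to one flat sequence, and conditional generation masks the attributes to be predicted. The encoding below is an assumption for illustration, not BLT's exact scheme or its hierarchical sampling policy.

MASK = -1   # placeholder id for attributes the model must generate

def layout_to_sequence(elements):
    """elements: list of dicts with integer fields category, x, y, w, h."""
    seq = []
    for e in elements:
        seq.extend([e["category"], e["x"], e["y"], e["w"], e["h"]])
    return seq

def mask_geometry(seq, attrs_per_element=5):
    """Keep categories, mask x/y/w/h so the model fills in the geometry."""
    out = list(seq)
    for i in range(0, len(out), attrs_per_element):
        out[i + 1:i + attrs_per_element] = [MASK] * (attrs_per_element - 1)
    return out

layout = [{"category": 3, "x": 10, "y": 4, "w": 40, "h": 12}]
print(mask_geometry(layout_to_sequence(layout)))   # [3, -1, -1, -1, -1]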
Peggy Chi, Tao Dong, Christian Frueh, Brian Colonna, Vivek Kwatra, Irfan Essa
Synthesis-Assisted Video Prototyping From a Document Proceedings Article
In: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pp. 1–10, 2022.
@inproceedings{2022-Chi-SVPFD,
title = {Synthesis-Assisted Video Prototyping From a Document},
author = {Peggy Chi and Tao Dong and Christian Frueh and Brian Colonna and Vivek Kwatra and Irfan Essa},
url = {https://research.google/pubs/pub51631/
https://dl.acm.org/doi/abs/10.1145/3526113.3545676},
doi = {10.1145/3526113.3545676},
year = {2022},
date = {2022-10-01},
urldate = {2022-10-01},
booktitle = {Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology},
pages = {1--10},
abstract = {Video productions commonly start with a script, especially for talking head videos that feature a speaker narrating to the camera. When the source materials come from a written document -- such as a web tutorial, it takes iterations to refine content from a text article to a spoken dialogue, while considering visual compositions in each scene. We propose Doc2Video, a video prototyping approach that converts a document to interactive scripting with a preview of synthetic talking head videos. Our pipeline decomposes a source document into a series of scenes, each automatically creating a synthesized video of a virtual instructor. Designed for a specific domain -- programming cookbooks, we apply visual elements from the source document, such as a keyword, a code snippet or a screenshot, in suitable layouts. Users edit narration sentences, break or combine sections, and modify visuals to prototype a video in our Editing UI. We evaluated our pipeline with public programming cookbooks. Feedback from professional creators shows that our method provided a reasonable starting point to engage them in interactive scripting for a narrated instructional video.},
keywords = {computational video, generative media, google, human-computer interaction, UIST, video editing},
pubstate = {published},
tppubtype = {inproceedings}
}
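The pipeline described in this abstract decomposes a source document into scenes, each pairing editable narration with visual elements placed in a layout. A minimal sketch of such a scene representation follows; the field names are hypothetical, not the Doc2Video schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class VisualElement:
    kind: str           # e.g. "keyword", "code_snippet", "screenshot"
    content: str
    layout_slot: str    # e.g. "left_panel", "full_screen"

@dataclass
class Scene:
    narration: str                          # sentence(s) for the synthesized instructor
    visuals: List[VisualElement] = field(default_factory=list)

def document_to_scenes(sections):
    """Turn (heading, body) pairs into one draft scene per section."""
    scenes = []
    for heading, body in sections:
        visuals = [VisualElement(kind="keyword", content=heading, layout_slot="left_panel")]
        scenes.append(Scene(narration=body, visuals=visuals))
    return scenes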
Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa
Text as Neural Operator: Image Manipulation by Text Instruction Proceedings Article
In: ACM International Conference on Multimedia (ACM-MM), ACM Press, 2021.
@inproceedings{2021-Zhang-TNOIMTI,
title = {Text as Neural Operator: Image Manipulation by Text Instruction},
author = {Tianhao Zhang and Hung-Yu Tseng and Lu Jiang and Weilong Yang and Honglak Lee and Irfan Essa},
url = {https://dl.acm.org/doi/10.1145/3474085.3475343
https://arxiv.org/abs/2008.04556},
doi = {10.1145/3474085.3475343},
year = {2021},
date = {2021-10-01},
urldate = {2021-10-01},
booktitle = {ACM International Conference on Multimedia (ACM-MM)},
publisher = {ACM Press},
abstract = {In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision community. The input to conditional image generation has evolved from image-only to multimodality. In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects. The inputs of the task are multimodal including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image. We propose a GAN-based method to tackle this problem. The key idea is to treat text as neural operators to locally modify the image feature. We show that the proposed model performs favorably against recent strong baselines on three public datasets. Specifically, it generates images of greater fidelity and semantic relevance, and when used as an image query, leads to better retrieval performance.},
keywords = {computer vision, generative media, google, multimedia},
pubstate = {published},
tppubtype = {inproceedings}
}
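The "text as neural operator" idea above uses the instruction embedding to modify image features locally. One common way to realize text conditioning of features is a predicted per-channel scale and shift (FiLM-style modulation); the sketch below uses that as an illustrative stand-in, not the paper's exact operator.

import torch
import torch.nn as nn

class TextOperator(nn.Module):
    """Illustrative FiLM-style modulation of image features by a text embedding."""
    def __init__(self, text_dim=512, feat_channels=256):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, feat_channels)
        self.to_shift = nn.Linear(text_dim, feat_channels)

    def forward(self, image_feat, text_emb):
        # image_feat: (batch, channels, h, w); text_emb: (batch, text_dim)
        scale = self.to_scale(text_emb).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(text_emb).unsqueeze(-1).unsqueeze(-1)
        return image_feat * (1 + scale) + shift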
Hsin-Ying Lee, Lu Jiang, Irfan Essa, Madison Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang
Neural Design Network: Graphic Layout Generation with Constraints Proceedings Article
In: Proceedings of European Conference on Computer Vision (ECCV), 2020.
@inproceedings{2020-Lee-NDNGLGWC,
title = {Neural Design Network: Graphic Layout Generation with Constraints},
author = {Hsin-Ying Lee and Lu Jiang and Irfan Essa and Madison Le and Haifeng Gong and Ming-Hsuan Yang and Weilong Yang},
url = {https://arxiv.org/abs/1912.09421
https://rdcu.be/c7sqw},
doi = {10.1007/978-3-030-58580-8_29},
year = {2020},
date = {2020-08-01},
urldate = {2020-08-01},
booktitle = {Proceedings of European Conference on Computer Vision (ECCV)},
keywords = {computer vision, content creation, ECCV, generative media, google},
pubstate = {published},
tppubtype = {inproceedings}
}
Other Publication Sites
A few more sites that aggregate research publications: Academia.edu, Bibsonomy, CiteULike, Mendeley.
Copyright/About
[Please see the Copyright Statement that may apply to the content listed here.]
This list of publications is produced by using the teachPress plugin for WordPress.