A searchable list of some of my publications is below. You can also access my publications from the following sites.
My ORCID is
Publications:
Yi-Hao Peng, Peggy Chi, Anjuli Kannan, Meredith Morris, Irfan Essa
Slide Gestalt: Automatic Structure Extraction in Slide Decks for Non-Visual Access Proceedings Article
In: ACM CHI Conference on Human Factors in Computing Systems, 2023.
Tags: accessibility, CHI, google, human-computer interaction
@inproceedings{2023-Peng-SGASESDNA,
title = {Slide Gestalt: Automatic Structure Extraction in Slide Decks for Non-Visual Access},
author = {Yi-Hao Peng and Peggy Chi and Anjuli Kannan and Meredith Morris and Irfan Essa},
url = {https://research.google/pubs/pub52182/
https://dl.acm.org/doi/fullHtml/10.1145/3544548.3580921
https://doi.org/10.1145/3544548.3580921
https://www.youtube.com/watch?v=pK08aMRx4qo},
year = {2023},
date = {2023-04-23},
urldate = {2023-04-23},
booktitle = {ACM CHI Conference on Human Factors in Computing Systems},
abstract = {Presentation slides commonly use visual patterns for structural navigation, such as titles, dividers, and build slides. However, screen readers do not capture such intention, making it time-consuming and less accessible for blind and visually impaired (BVI) users to linearly consume slides with repeated content. We present Slide Gestalt, an automatic approach that identifies the hierarchical structure in a slide deck. Slide Gestalt computes the visual and textual correspondences between slides to generate hierarchical groupings. Readers can navigate the slide deck from the higher-level section overview to the lower-level description of a slide group or individual elements interactively with our UI. We derived slide consumption and authoring practices from interviews with BVI readers and sighted creators and an analysis of 100 decks. We ran our pipeline on 50 real-world slide decks and a large dataset. Feedback from eight BVI participants showed that Slide Gestalt helped navigate a slide deck by anchoring content more efficiently, compared to using accessible slides.},
keywords = {accessibility, CHI, google, human-computer interaction},
pubstate = {published},
tppubtype = {inproceedings}
}
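The grouping step described in the abstract above, computing correspondences between slides to form hierarchical groups, can be illustrated with a minimal sketch. The Jaccard word-overlap measure, the 0.6 threshold, and the greedy merge of consecutive slides are simplifying assumptions for illustration, not the features or algorithm used in the paper.

from dataclasses import dataclass

@dataclass
class Slide:
    title: str
    body: str

def text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a stand-in for the paper's visual and
    textual correspondence features (illustrative only)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def group_slides(slides: list[Slide], threshold: float = 0.6) -> list[list[int]]:
    """Greedily merge consecutive slides that share most of their content
    (e.g. build slides), yielding groups of slide indices."""
    groups: list[list[int]] = []
    for i, slide in enumerate(slides):
        if groups:
            prev = slides[groups[-1][-1]]
            if text_similarity(prev.title + " " + prev.body,
                               slide.title + " " + slide.body) >= threshold:
                groups[-1].append(i)
                continue
        groups.append([i])
    return groups

if __name__ == "__main__":
    deck = [
        Slide("Results", "accuracy 0.81"),
        Slide("Results", "accuracy 0.81 latency 40ms"),  # build slide
        Slide("Conclusion", "future work"),
    ]
    print(group_slides(deck))  # [[0, 1], [2]]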
Peggy Chi, Tao Dong, Christian Frueh, Brian Colonna, Vivek Kwatra, Irfan Essa
Synthesis-Assisted Video Prototyping From a Document Proceedings Article
In: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, pp. 1–10, 2022.
Tags: computational video, generative media, google, human-computer interaction, UIST, video editing
@inproceedings{2022-Chi-SVPFD,
title = {Synthesis-Assisted Video Prototyping From a Document},
author = {Peggy Chi and Tao Dong and Christian Frueh and Brian Colonna and Vivek Kwatra and Irfan Essa},
url = {https://research.google/pubs/pub51631/
https://dl.acm.org/doi/abs/10.1145/3526113.3545676},
doi = {10.1145/3526113.3545676},
year = {2022},
date = {2022-10-01},
urldate = {2022-10-01},
booktitle = {Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology},
pages = {1--10},
abstract = {Video productions commonly start with a script, especially for talking head videos that feature a speaker narrating to the camera. When the source materials come from a written document -- such as a web tutorial, it takes iterations to refine content from a text article to a spoken dialogue, while considering visual compositions in each scene. We propose Doc2Video, a video prototyping approach that converts a document to interactive scripting with a preview of synthetic talking head videos. Our pipeline decomposes a source document into a series of scenes, each automatically creating a synthesized video of a virtual instructor. Designed for a specific domain -- programming cookbooks, we apply visual elements from the source document, such as a keyword, a code snippet or a screenshot, in suitable layouts. Users edit narration sentences, break or combine sections, and modify visuals to prototype a video in our Editing UI. We evaluated our pipeline with public programming cookbooks. Feedback from professional creators shows that our method provided a reasonable starting point to engage them in interactive scripting for a narrated instructional video.},
keywords = {computational video, generative media, google, human-computer interaction, UIST, video editing},
pubstate = {published},
tppubtype = {inproceedings}
}
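The scene decomposition that Doc2Video performs on a programming cookbook can be approximated by pairing each prose paragraph with the code block or screenshot that follows it. The Markdown-splitting heuristics and the Scene fields below are assumptions made for this sketch, not the paper's actual pipeline.

import re
from dataclasses import dataclass

@dataclass
class Scene:
    narration: str  # text a synthesized instructor would speak
    visual: str     # "code", "screenshot", or "keyword"
    asset: str      # code snippet, image path, or highlighted term

def document_to_scenes(markdown: str) -> list[Scene]:
    """Split a cookbook-style Markdown document into candidate scenes: each
    prose paragraph becomes narration, and the code block or image that
    follows it (if any) becomes the scene's visual."""
    blocks = [b.strip() for b in markdown.split("\n\n") if b.strip()]
    scenes, i = [], 0
    while i < len(blocks):
        narration = blocks[i]
        visual, asset = "keyword", narration.split()[0]
        if i + 1 < len(blocks):
            nxt = blocks[i + 1]
            image = re.match(r"!\[.*\]\((.+)\)", nxt)
            if nxt.startswith("```"):
                visual, asset = "code", nxt.strip("`\n")
                i += 1
            elif image:
                visual, asset = "screenshot", image.group(1)
                i += 1
        scenes.append(Scene(narration, visual, asset))
        i += 1
    return scenes

if __name__ == "__main__":
    doc = "Install the client library.\n\n```\npip install example-client\n```\n\nRun the query and inspect the result."
    for scene in document_to_scenes(doc):
        print(scene.visual, "|", scene.narration)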
Peggy Chi, Nathan Frey, Katrina Panovich, Irfan Essa
Automatic Instructional Video Creation from a Markdown-Formatted Tutorial Proceedings Article
In: ACM Symposium on User Interface Software and Technology (UIST), ACM Press, 2021.
Tags: google, human-computer interaction, UIST, video editing
@inproceedings{2021-Chi-AIVCFMT,
title = {Automatic Instructional Video Creation from a Markdown-Formatted Tutorial},
author = {Peggy Chi and Nathan Frey and Katrina Panovich and Irfan Essa},
url = {https://doi.org/10.1145/3472749.3474778
https://research.google/pubs/pub50745/
https://youtu.be/WmrZ7PUjyuM},
doi = {10.1145/3472749.3474778},
year = {2021},
date = {2021-10-01},
urldate = {2021-10-01},
booktitle = {ACM Symposium on User Interface Software and Technology (UIST)},
publisher = {ACM Press},
abstract = {We introduce HowToCut, an automatic approach that converts a Markdown-formatted tutorial into an interactive video that presents the visual instructions with a synthesized voiceover for narration. HowToCut extracts instructional content from a multimedia document that describes a step-by-step procedure. Our method selects and converts text instructions to a voiceover. It makes automatic editing decisions to align the narration with edited visual assets, including step images, videos, and text overlays. We derive our video editing strategies from an analysis of 125 web tutorials and apply Computer Vision techniques to the assets. To enable viewers to interactively navigate the tutorial, HowToCut's conversational UI presents instructions in multiple formats upon user commands. We evaluated our automatically-generated video tutorials through user studies (N=20) and validated the video quality via an online survey (N=93). The evaluation shows that our method was able to effectively create informative and useful instructional videos from a web tutorial document for both reviewing and following.},
keywords = {google, human-computer interaction, UIST, video editing},
pubstate = {published},
tppubtype = {inproceedings}
}
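A rough sketch of the narration-to-visual alignment that HowToCut automates: size each shot to the time needed to speak a step's narration and lay the shots out sequentially. The assumed speaking rate and minimum shot length are invented parameters; the actual system derives its editing decisions from the tutorial analysis described in the abstract.

from dataclasses import dataclass

WORDS_PER_SECOND = 2.5  # assumed speaking rate for the synthetic voiceover

@dataclass
class Step:
    narration: str
    asset: str          # e.g. a step image or clip referenced by the tutorial

@dataclass
class TimelineEvent:
    start: float
    end: float
    asset: str
    narration: str

def build_timeline(steps: list[Step], min_shot: float = 2.0) -> list[TimelineEvent]:
    """Lay out each step sequentially, sizing its shot to the time needed to
    speak the narration (a crude stand-in for HowToCut's alignment)."""
    timeline, t = [], 0.0
    for step in steps:
        duration = max(min_shot, len(step.narration.split()) / WORDS_PER_SECOND)
        timeline.append(TimelineEvent(t, t + duration, step.asset, step.narration))
        t += duration
    return timeline

if __name__ == "__main__":
    steps = [
        Step("Install the package with pip.", "step1.png"),
        Step("Open the configuration file and set the API key.", "step2.png"),
    ]
    for ev in build_timeline(steps):
        print(f"{ev.start:5.1f}-{ev.end:5.1f}s  {ev.asset}")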
Anh Truong, Peggy Chi, David Salesin, Irfan Essa, Maneesh Agrawala
Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos Proceedings Article
In: ACM CHI Conference on Human Factors in Computing Systems, 2021.
Tags: CHI, computational video, google, human-computer interaction, video summarization
@inproceedings{2021-Truong-AGTHTFIMV,
title = {Automatic Generation of Two-Level Hierarchical Tutorials from Instructional Makeup Videos},
author = {Anh Truong and Peggy Chi and David Salesin and Irfan Essa and Maneesh Agrawala},
url = {https://dl.acm.org/doi/10.1145/3411764.3445721
https://research.google/pubs/pub50007/
http://anhtruong.org/makeup_breakdown/},
doi = {10.1145/3411764.3445721},
year = {2021},
date = {2021-05-01},
urldate = {2021-05-01},
booktitle = {ACM CHI Conference on Human Factors in Computing Systems},
abstract = {We present a multi-modal approach for automatically generating hierarchical tutorials from instructional makeup videos. Our approach is inspired by prior research in cognitive psychology, which suggests that people mentally segment procedural tasks into event hierarchies, where coarse-grained events focus on objects while fine-grained events focus on actions. In the instructional makeup domain, we find that objects correspond to facial parts while fine-grained steps correspond to actions on those facial parts. Given an input instructional makeup video, we apply a set of heuristics that combine computer vision techniques with transcript text analysis to automatically identify the fine-level action steps and group these steps by facial part to form the coarse-level events. We provide a voice-enabled, mixed-media UI to visualize the resulting hierarchy and allow users to efficiently navigate the tutorial (e.g., skip ahead, return to previous steps) at their own pace. Users can navigate the hierarchy at both the facial-part and action-step levels using click-based interactions and voice commands. We demonstrate the effectiveness of segmentation algorithms and the resulting mixed-media UI on a variety of input makeup videos. A user study shows that users prefer following instructional makeup videos in our mixed-media format to the standard video UI and that they find our format much easier to navigate.},
keywords = {CHI, computational video, google, human-computer interaction, video summarization},
pubstate = {published},
tppubtype = {inproceedings}
}
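The two-level hierarchy described in the abstract above, fine-grained action steps grouped into coarse facial-part events, can be sketched with simple transcript keyword matching. The facial-part lexicon and the consecutive-grouping rule below are stand-ins for the paper's combined computer vision and transcript heuristics.

from itertools import groupby

# Illustrative facial-part lexicon; the paper's segmentation is only loosely
# approximated by this keyword match.
FACIAL_PARTS = {
    "eyes": ["eye", "eyelid", "lash", "brow"],
    "lips": ["lip", "lipstick", "gloss"],
    "cheeks": ["cheek", "blush", "contour"],
    "skin": ["foundation", "primer", "concealer", "skin"],
}

def facial_part(sentence: str) -> str:
    """Assign a transcript sentence (a fine-grained action step) to a facial part."""
    s = sentence.lower()
    for part, keywords in FACIAL_PARTS.items():
        if any(k in s for k in keywords):
            return part
    return "other"

def two_level_hierarchy(steps: list[str]) -> list[tuple[str, list[str]]]:
    """Group consecutive action steps by facial part: coarse events on top,
    fine-grained steps underneath."""
    labelled = [(facial_part(s), s) for s in steps]
    return [(part, [s for _, s in grp])
            for part, grp in groupby(labelled, key=lambda x: x[0])]

if __name__ == "__main__":
    steps = [
        "Apply primer all over the skin.",
        "Blend foundation with a sponge.",
        "Line the upper lash line.",
        "Finish with a red lipstick.",
    ]
    for part, actions in two_level_hierarchy(steps):
        print(part, "->", actions)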
Peggy Chi, Zheng Sun, Katrina Panovich, Irfan Essa
Automatic Video Creation From a Web Page Proceedings Article
In: Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, pp. 279–292, ACM, 2020.
Tags: computational video, google, human-computer interaction, UIST, video editing
@inproceedings{2020-Chi-AVCFP,
title = {Automatic Video Creation From a Web Page},
author = {Peggy Chi and Zheng Sun and Katrina Panovich and Irfan Essa},
url = {https://dl.acm.org/doi/abs/10.1145/3379337.3415814
https://research.google/pubs/pub49618/
https://ai.googleblog.com/2020/10/experimenting-with-automatic-video.html
https://www.youtube.com/watch?v=3yFYc-Wet8k},
doi = {10.1145/3379337.3415814},
year = {2020},
date = {2020-10-01},
urldate = {2020-10-01},
booktitle = {Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology},
pages = {279--292},
organization = {ACM},
abstract = {Creating marketing videos from scratch can be challenging, especially when designing for multiple platforms with different viewing criteria. We present URL2Video, an automatic approach that converts a web page into a short video given temporal and visual constraints. URL2Video captures quality materials and design styles extracted from a web page, including fonts, colors, and layouts. Using constraint programming, URL2Video's design engine organizes the visual assets into a sequence of shots and renders to a video with user-specified aspect ratio and duration. Creators can review the video composition, modify constraints, and generate video variation through a user interface. We learned the design process from designers and compared our automatically generated results with their creation through interviews and an online survey. The evaluation shows that URL2Video effectively extracted design elements from a web page and supported designers by bootstrapping the video creation process.},
keywords = {computational video, google, human-computer interaction, UIST, video editing},
pubstate = {published},
tppubtype = {inproceedings}
}
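URL2Video's design engine uses constraint programming to fit the extracted assets into a shot sequence matching a user-specified duration. The greedy budget allocation below is a deliberately simplified stand-in for that solve; the asset names and priority scores are invented for illustration.

from dataclasses import dataclass

@dataclass
class Asset:
    name: str        # e.g. headline text, hero image, logo from the page
    priority: float  # assumed importance score from the design engine
    min_shot: float  # shortest shot (seconds) that can present this asset

def plan_shots(assets: list[Asset], total_duration: float) -> list[tuple[str, float]]:
    """Greedy simplification of the constraint solve: pick assets by priority
    until the duration budget is spent, then spread any leftover time evenly
    across the chosen shots."""
    chosen, used = [], 0.0
    for asset in sorted(assets, key=lambda a: -a.priority):
        if used + asset.min_shot <= total_duration:
            chosen.append(asset)
            used += asset.min_shot
    slack = (total_duration - used) / len(chosen) if chosen else 0.0
    return [(a.name, a.min_shot + slack) for a in chosen]

if __name__ == "__main__":
    assets = [
        Asset("headline", 1.0, 2.0),
        Asset("hero_image", 0.9, 3.0),
        Asset("feature_list", 0.6, 4.0),
        Asset("logo_outro", 0.8, 1.5),
    ]
    for name, secs in plan_shots(assets, total_duration=8.0):
        print(f"{name}: {secs:.1f}s")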
Peggy Chi, Irfan Essa
Interactive Visual Description of a Web Page for Smart Speakers Proceedings Article
In: Proceedings of ACM CHI Workshop, CUI@CHI: Mapping Grand Challenges for the Conversational User Interface Community, Honolulu, Hawaii, USA, 2020.
Tags: accessibility, CHI, google, human-computer interaction
@inproceedings{2020-Chi-IVDPSS,
title = {Interactive Visual Description of a Web Page for Smart Speakers},
author = {Peggy Chi and Irfan Essa},
url = {https://research.google/pubs/pub49441/
http://www.speechinteraction.org/CHI2020/programme.html},
year = {2020},
date = {2020-05-01},
urldate = {2020-05-01},
booktitle = {Proceedings of ACM CHI Workshop, CUI@CHI: Mapping Grand Challenges for the Conversational User Interface Community},
address = {Honolulu, Hawaii, USA},
abstract = {Smart speakers are becoming ubiquitous for accessing lightweight information using speech. While these devices are powerful for question answering and service operations using voice commands, it is challenging to navigate content of rich formats–including web pages–that are consumed by mainstream computing devices. We conducted a comparative study with 12 participants that suggests and motivates the use of a narrative voice output of a web page as being easier to follow and comprehend than a conventional screen reader. We are developing a tool that automatically narrates web documents based on their visual structures with interactive prompts. We discuss the design challenges for a conversational agent to intelligently select content for a more personalized experience, where we hope to contribute to the CUI workshop and form a discussion for future research.
},
keywords = {accessibility, CHI, google, human-computer interaction},
pubstate = {published},
tppubtype = {inproceedings}
}
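A minimal sketch of the structure-driven narration this workshop paper proposes, assuming a page's visual structure can be proxied by its HTML headings; the h1-h3 outline heuristic and the prompt wording are illustrative, not the tool's actual behavior.

from html.parser import HTMLParser

class OutlineParser(HTMLParser):
    """Collect heading text (h1-h3) as a rough proxy for the page's visual structure."""
    def __init__(self):
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])

    def handle_data(self, data):
        if self._level and data.strip():
            self.headings.append((self._level, data.strip()))
            self._level = None

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._level = None

def narrate_overview(html: str) -> str:
    """Turn the heading outline into a spoken overview plus a follow-up prompt,
    in the spirit of the narrative voice output described above."""
    parser = OutlineParser()
    parser.feed(html)
    sections = [text for level, text in parser.headings if level > 1]
    title = next((t for lvl, t in parser.headings if lvl == 1), "this page")
    return (f"{title} has {len(sections)} sections: "
            + ", ".join(sections)
            + ". Which section would you like to hear?")

if __name__ == "__main__":
    page = "<h1>Make a pizza</h1><h2>Dough</h2><p>...</p><h2>Toppings</h2><h2>Baking</h2>"
    print(narrate_overview(page))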
W. Rogers, I. Essa, A. Fisk
Designing a Technology Coach Journal Article
In: Ergonomics in Design, Journal of the Human Factors and Ergonomics Society, vol. 15, no. 3, pp. 17–23, 2007.
Tags: aging-in-place, aware home, human-computer interaction
@article{2007-Rogers-DTC,
title = {Designing a Technology Coach},
author = {W. Rogers and I. Essa and A. Fisk},
url = {https://doi.org/10.1177/1064804607015003},
doi = {10.1177/1064804607015003},
year = {2007},
date = {2007-07-01},
urldate = {2007-07-01},
journal = {Ergonomics in Design, Journal of the Human Factors and Ergonomics Society},
volume = {15},
number = {3},
pages = {17--23},
abstract = {Technology in the home environment has the potential to support older adults in a variety of ways. We took an interdisciplinary approach (human factors/ergonomics and computer science) to develop a technology “coach” that could support older adults in learning to use a medical device. Our system provided a computer vision system to track the use of a blood glucose meter and provide users with feedback if they made an error. This research could support the development of an in-home personal assistant to coach individuals in a variety of tasks necessary for independent living.
},
keywords = {aging-in-place, aware home, human-computer interaction},
pubstate = {published},
tppubtype = {article}
}
Irfan Essa, Gregory Abowd, Aaron Bobick, Elizabeth Mynatt, Wendy Rogers
Building an Aware Home: Technologies for the way we may live Proceedings Article
In: Proceedings of First International Workshop on Man-Machine Symbiosis, Kyoto, Japan, 2002.
Tags: aging-in-place, computational health, human-computer interaction
@inproceedings{2002-Essa-BAHTL,
title = {Building an Aware Home: Technologies for the way we may live},
author = {Irfan Essa and Gregory Abowd and Aaron Bobick and Elizabeth Mynatt and Wendy Rogers},
year = {2002},
date = {2002-01-01},
urldate = {2002-01-01},
booktitle = {Proceedings of First International Workshop on Man-Machine Symbiosis},
address = {Kyoto, Japan},
keywords = {aging-in-place, computational health, human-computer interaction},
pubstate = {published},
tppubtype = {inproceedings}
}
Alex Pentland, Stan Sclaroff, Trevor Darrell, Irfan Essa, Ali Azarbayejani, Thad Starner
Visually guided interaction and animation Proceedings Article
In: Asilomar Conference on Signals, Systems, and Computers, vol. 2, pp. 1287–1291, Pacific Grove, CA, 1994.
Tags: computer animation, human-computer interaction, multimodal interfaces
@inproceedings{1994-Pentland-VGIA,
title = {Visually guided interaction and animation},
author = {Alex Pentland and Stan Sclaroff and Trevor Darrell and Irfan Essa and Ali Azarbayejani and Thad Starner},
year = {1994},
date = {1994-01-01},
urldate = {1994-01-01},
booktitle = {Asilomar Conference on Signals, Systems, and Computers},
volume = {2},
pages = {1287--1291},
address = {Pacific Grove, CA},
keywords = {computer animation, human-computer interaction, multimodal interfaces},
pubstate = {published},
tppubtype = {inproceedings}
}
Other Publication Sites
A few more sites that aggregate research publications: Academia.edu, Bibsonomy, CiteULike, Mendeley.
Copyright/About
[Please see the Copyright Statement that may apply to the content listed here.]
This list of publications is produced using the teachPress plugin for WordPress.