1. Introduction

  2. Related Work

    2.1 Semantic Typographic Logo Design

    2.2 Generative Model for Computational Design

    2.3 Graphic Design Authoring Tool

  3. Formative Study

    3.1 General Workflow and Challenges

    3.2 Concerns in Generative Model Involvement

    3.3 Design Space of Semantic Typography Work

  4. Design Consideration

  5. TypeDance and 5.1 Ideation

    5.2 Selection

    5.3 Generation

    5.4 Evaluation

    5.5 Iteration

  6. Interface Walkthrough and 6.1 Pre-generation stage

    6.2 Generation stage

    6.3 Post-generation stage

  7. Evaluation and 7.1 Baseline Comparison

    7.2 User Study

    7.3 Results Analysis

    7.4 Limitation

  8. Discussion

    8.1 Personalized Design: Intent-aware Collaboration with AI

    8.2 Incorporating Design Knowledge into Creativity Support Tools

    8.3 Mix-User Oriented Design Workflow

  9. Conclusion and References

2.1 Semantic Typographic Logo Design

Semantic typographic logos are a harmonious integration of typeface and imagery, in which the imagery is visually illustrated through the typeface [23, 43, 48]. Compared with plain wordmarks [50, 52] and pictorial logos [22, 30], a semantic typographic logo offers a more cohesive way to encode both word and graphic content and strengthens the association between them. The capacity to embody rich symbolism and expressiveness has led to the increasing adoption of semantic typographic logos across various scenarios, such as cultural promotion [3], commercial branding [15], and personal identity [28]. Extensive research has explored how typefaces can be designed to reinforce semantic meaning at varying levels of granularity. Some studies subdivide a typeface into a series of skeletal strokes via user-guided [38] or automatic segmentation [4], and then apply structural stylization to each stroke and junction separately. In contrast, recent studies [23, 48, 59] have shifted their focus from stroke-level stylization to individual-letter stylization using predefined templates. For instance, Tendulkar et al. [48] replaced letters with clipart icons that are relevant to the imagery and visually resemble the corresponding letter. Another approach, demonstrated by Xu et al. [55], compresses multiple letters and arranges them into a predetermined semantic shape. This approach was further enhanced by Zou et al. [66], who proposed an automatic framework that supports the placement, packing, and deformation of compact calligrams.

While prior research has extensively investigated semantic typographic logos across different typeface design granularities, two key issues persist: 1) these models are built for typefaces at a specific granularity, limiting their applicability, and 2) little is known about the mapping relationship between typeface and imagery. These works typically employ a simple approach in which one typeface is paired with one specific piece of imagery. To explore the design space, we collect a real-world corpus, analyze typeface granularity and type-imagery mapping, and instantiate these design principles in TypeDance. We then propose a unified framework based on a diffusion model to support flexible blending between imagery and typefaces at different granularities.

2.2 Generative Model for Computational Design

Generative techniques have garnered considerable attention in computational design. Recent advances in aligning the semantic meaning of image and text pairs have made natural language a valuable tool that bridges the gap between humans and creativity [29, 39]. Numerous studies have exploited this semantic consistency to retrieve relevant images from a corpus using natural-language statements, which can then serve as design materials for generating new designs [12, 64]. While earlier studies relied on retrieval from limited corpora and predefined templates, more recent research has proposed text-to-image diffusion models [40, 47] that surpass mainstream GAN [18] and autoregressive [41] models. However, plain text-guided generation relies heavily on well-crafted prompts, leading to unstable results and little user control. To address this issue and enhance user customization, recent advances have introduced image-based conditions for controllable manipulation, including depth maps [42] and edge maps [63]. Some generative models focused on font stylization support only letter-level generation [23] and require collecting images containing the specific imagery to fine-tune the model [47].
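As a concrete illustration of this retrieval paradigm (a minimal sketch, not part of TypeDance or any cited system), the snippet below ranks a small, hypothetical corpus of design materials against a natural-language query by text-image embedding similarity, using a CLIP-style encoder from the open_clip library; the model choice and file names are placeholder assumptions.

```python
# Illustrative sketch: text-to-image retrieval via CLIP-style embeddings.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

corpus = ["wave.png", "leaf.png", "gear.png"]  # hypothetical design-material corpus
images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in corpus])
text = tokenizer(["a curling ocean wave"])     # natural-language query

with torch.no_grad():
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(text)

# Normalize so the dot product becomes cosine similarity.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
scores = (txt_emb @ img_emb.T).squeeze(0)      # one similarity score per corpus image

best_match = corpus[int(scores.argmax())]      # most semantically relevant design material
```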

While prior works have demonstrated impressive generative ability in creating complex structures and meaningful semantics, ensuring the readability of both the typeface and the imagery remains a daunting task. In particular, the text condition lacks sufficient restrictions to capture all user intentions, while the image condition is overly rigid and cannot accommodate additional information. To tackle this challenge, Mou et al. [34] proposed combining multiple conditions to improve controllability. Similarly, Vistylist [45] disentangles the design space, enabling generation with a combination of user-intended design factors. TypeDance builds upon these efforts by providing several design priors that reflect the characteristics of semantic typographic logos. These design priors, extracted from user-provided images, serve as guidance that users can select and incorporate into their designs. With support for both text and image conditions, TypeDance empowers users with flexible control, enabling personalized and distinctive design outcomes.
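To make the combination of text and image conditions concrete, the sketch below conditions a Stable Diffusion model on the edge map of a rasterized glyph via a ControlNet, so that the prompt supplies the intended imagery while the edge condition keeps the letter form legible. This is an illustrative assumption built from off-the-shelf diffusers components, not TypeDance's actual pipeline; the model identifiers and file names are placeholders.

```python
# Illustrative sketch: text prompt + edge-map condition for typeface-preserving generation.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Hypothetical input: a rasterized glyph whose edges encode the typeface skeleton.
glyph = np.array(Image.open("letter_S.png").convert("RGB"))
edges = cv2.Canny(glyph, 100, 200)
edge_cond = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The text condition carries the imagery; the edge condition constrains the letter form.
result = pipe(
    "a logo of curling ocean waves, flat vector style",
    image=edge_cond,
    num_inference_steps=30,
).images[0]
result.save("blended_logo.png")
```

In such a setup, the relative strength of the image condition trades off typeface legibility against imagery richness, which is precisely the balance that motivates an explicit evaluation of readability.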

2.3 Graphic Design Authoring Tool

A significant body of work has developed authoring tools to facilitate graphic design, which can be broadly divided into two primary categories: ideation tools and creation tools. In the domain of ideation, several studies [24, 26, 56] have proposed interfaces aimed at inspiring ideas and facilitating the exploration of design materials. For example, MetaMap [24] employed a mindmap-like structure encompassing three design dimensions to stimulate users and encourage them to generate a wide range of unique and varied ideas. Regarding the creation process, as Xiao et al. [54] identified, mainstream works follow a two-stage pipeline that retrieves examples and adapts them as design materials [62] or style-transfer references [45]. More recently, researchers have sought to blend existing design materials into novel designs. Some researchers [13, 64] spatially composite semantically related icons to generate compound designs in a resourceful manner. Similarly, Zhang et al. [61] demonstrated that compositing coherent imagery elements can create an ornamental typeface with wide conceptual coverage. Chilton et al. [8, 9] further explored the potential of blending through similar-shape substitution; for instance, they showed that the “Starbucks logo” can take the place of the “sun,” as both have a circular shape.

However, spatial composition and shape substitution encounter challenges with the complexity of semantic typographic logos, in which typeface and imagery need to be spatially fused as a whole despite the absence of shape similarity. In this work, TypeDance utilizes diffusion models to incorporate imagery detail while preserving the salient representation of the typeface, enabling a more natural blend. Additionally, TypeDance integrates both ideation and creation functions. To ensure the readability of both the typeface and the imagery in semantic typographic logos, an evaluation component is further incorporated, enhancing the faithfulness of the design process.

Authors:

(1) SHISHI XIAO, The Hong Kong University of Science and Technology (Guangzhou), China;

(2) LIANGWEI WANG, The Hong Kong University of Science and Technology (Guangzhou), China;

(3) XIAOJUAN MA, The Hong Kong University of Science and Technology, China;

(4) WEI ZENG, The Hong Kong University of Science and Technology (Guangzhou), China.


This paper is available on arxiv under ATTRIBUTION-NONCOMMERCIAL-SHAREALIKE 4.0 INTERNATIONAL license.