vqgan_imagenet_f16_16384,Revisiting Pre-trained Models

VQGAN-f16-16384

Model Description

This is a Flax/JAX implementation of VQGAN, which learns a codebook of context-rich visual parts by combining convolutional methods with transformers. It was introduced in Taming Transformers for High-Resolution Image Synthesis (CVPR paper).

The model allows the encoding of images as a fixed-length sequence of tokens taken from the codebook.

This version of the model uses a reduction factor f=16 and a vocabulary of 16,384 tokens.

As an example of how the reduction factor works, a 256x256 image is encoded as a sequence of 256 tokens: (256/16) * (256/16) = 16 * 16 = 256. A 512x512 image yields a sequence of 1024 tokens: (512/16) * (512/16) = 32 * 32 = 1024.
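
The arithmetic above can be sketched in a few lines of Python. The function name `token_count` and its error handling are illustrative assumptions, not part of the model's API; it only assumes that each side of the image is divided by the reduction factor f.

```python
def token_count(height: int, width: int, f: int = 16) -> int:
    """Return the number of codebook tokens for an image of the given size.

    Hypothetical helper: each spatial side is reduced by the factor f,
    so the token grid is (height / f) by (width / f).
    """
    if height % f != 0 or width % f != 0:
        raise ValueError("height and width are assumed to be multiples of f")
    return (height // f) * (width // f)


print(token_count(256, 256))  # 16 * 16 = 256
print(token_count(512, 512))  # 32 * 32 = 1024
```

Because the sequence length grows with the image area, doubling both sides of the image quadruples the number of tokens.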

This model was ported to JAX using a checkpoint trained on ImageNet.

How to Use

The checkpoint can be loaded using Suraj Patil's implementation of VQModel.
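
A minimal loading sketch follows. It assumes that the `vqgan-jax` package is installed, that `VQModel` is importable from `vqgan_jax.modeling_flax_vqgan`, and that the checkpoint is published on the Hugging Face Hub under the repository id shown; none of these details are stated on this page, so verify them before relying on the code. The helper name `load_vqgan` is hypothetical.

```python
# Assumed Hub repository id for this checkpoint; not confirmed by this page.
DEFAULT_REPO_ID = "dalle-mini/vqgan_imagenet_f16_16384"


def load_vqgan(repo_id: str = DEFAULT_REPO_ID):
    """Load the VQGAN checkpoint with the VQModel class.

    The import is done lazily inside the function so that this module
    can be imported even when vqgan-jax is not installed.
    """
    # Assumed import path from the vqgan-jax package.
    from vqgan_jax.modeling_flax_vqgan import VQModel

    return VQModel.from_pretrained(repo_id)


# Usage (downloads the weights on first call, so it is left commented out):
# model = load_vqgan()
```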

Other

This model can be used as part of the implementation of DALL·E mini. Our report contains more details on how to leverage it in an image encoding / generation pipeline.

Published: 2023-07-16 21:28:03
Permalink: https://yunkanjia.com/ruanjianyuanma/t1689514083198.html
