
"""PyTorch FLAVA model."""

import collections
import math
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Optional, Union

import torch
import torch.utils.checkpoint
from torch import nn

from ...activations import ACT2FN
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling
from ...modeling_utils import PreTrainedModel, find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import ModelOutput, auto_docstring, logging, torch_int
from .configuration_flava import (
    FlavaConfig,
    FlavaImageCodebookConfig,
    FlavaImageConfig,
    FlavaMultimodalConfig,
    FlavaTextConfig,
)


logger = logging.get_logger(__name__)

_CHECKPOINT_FOR_CODEBOOK_DOC = "facebook/flava-image-codebook"
LOGIT_SCALE_CLAMP_MIN = 0
LOGIT_SCALE_CLAMP_MAX = 4.6052

FlavaPossibleConfigs = Union[FlavaTextConfig, FlavaImageConfig, FlavaMultimodalConfig]


@dataclass
@auto_docstring(
    custom_intro="""
    Output from FlavaModel containing embeddings and outputs from individual encoders.

    Note that `image_embeddings` and `text_embeddings` returned are similar to pooled output returned from a
    transformer. If you want embeddings for contrastive loss or retrieval use a FLAVA model's `image_projection` and
    `text_projection` layers on `image_embeddings` and `text_embeddings` respectively.
    """
)
class FlavaModelOutput(ModelOutput):
    r"""
    image_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `pixel_values` are present):
        The image embeddings which are basically the pooled output of [`FlavaImageModel`].
    image_output (`BaseModelOutputWithPooling`, *optional*, returned when `pixel_values` are present):
        The output of the [`FlavaImageModel`].
    text_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `input_ids` are present):
        The text embeddings which are basically the pooled output of [`FlavaTextModel`].
    text_output (`BaseModelOutputWithPooling`, *optional*, returned when `input_ids` are present):
        The output of the [`FlavaTextModel`].
    multimodal_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `input_ids` and `pixel_values` are present and `skip_multimodal_encoder` is `None` or `False`):
        The multimodal embeddings which are basically the pooled output of [`FlavaTextModel`].
    multimodal_output (`BaseModelOutputWithPooling`, returned when `input_ids` and `pixel_values` are present and `skip_multimodal_encoder` is `None` or `False`):
        The output of the [`FlavaMultimodalModel`].
    """

    image_embeddings: Optional[torch.FloatTensor] = None
    image_output: Optional[BaseModelOutputWithPooling] = None
    text_embeddings: Optional[torch.FloatTensor] = None
    text_output: Optional[BaseModelOutputWithPooling] = None
    multimodal_embeddings: Optional[torch.FloatTensor] = None
    multimodal_output: Optional[BaseModelOutputWithPooling] = None

    def to_tuple(self) -> tuple[Any]:
        return tuple(
            self[k] if k not in ["text_output", "image_output", "multimodal_output"] else getattr(self, k).to_tuple()
            for k in self.keys()
        )


@dataclass
@auto_docstring(
    custom_intro="""
    Class representing pretraining losses from FLAVA model
    """
)
class FlavaLosses(ModelOutput):
    r"""
    mim (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `mim_labels` and `pixel_values` are present, `input_ids_masked` is absent and `mim_weight` > 0.):
        Masked Image Modeling loss as used in BeIT calculated only for unimodal image data.
    mlm (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `mlm_labels` and `input_ids_masked` are present, `pixel_values` is absent and `mlm_weight` > 0.):
        Masked Language Modeling loss as used in BERT calculated only for unimodal text data.
    itm (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `itm_labels`, `input_ids_masked`, `pixel_values` are present and `itm_weight` > 0.):
        Image Text Matching (ITM) loss calculated for paired image-text data. Note that ITM loss is calculated on
        masked pairs in FLAVA.
    global_contrastive (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `input_ids` and `pixel_values` are present and `global_contrastive_weight` > 0.):
        Contrastive loss for image-text similarity similar to CLIP but calculated globally for paired image-text
        data. This is calculated on unmasked images and texts.
    mmm_image (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `mim_labels`, `pixel_values` and `input_ids_masked` are present and `mmm_image_weight` > 0.):
        Masked Multimodal Modeling loss's image component calculated on paired image-text data.
    mmm_text (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `mlm_labels`, `pixel_values` and `input_ids_masked` are present and `mmm_text_weight` > 0.):
        Masked Multimodal Modeling loss's text component calculated on paired image-text data.
    """

    mim: Optional[torch.FloatTensor] = None
    mlm: Optional[torch.FloatTensor] = None
    itm: Optional[torch.FloatTensor] = None
    global_contrastive: Optional[torch.FloatTensor] = None
    mmm_image: Optional[torch.FloatTensor] = None
    mmm_text: Optional[torch.FloatTensor] = None

    def all_none(self) -> bool:
        all_none = True
        for v in self.values():
            if v is not None:
                all_none = False
                break
        return all_none


@dataclass
@auto_docstring(
    custom_intro="""
    Output from FlavaForPreTraining containing embeddings, and outputs from individual encoders.

    Note that `image_embeddings` and `text_embeddings` returned are similar to pooled output returned from a
    transformer. If you want embeddings for contrastive loss or retrieval use a FLAVA model's `image_projection` and
    `text_projection` layers on `image_embeddings` and `text_embeddings` respectively.
    """
)
class FlavaForPreTrainingOutput(ModelOutput):
    r"""
    loss (`torch.FloatTensor`, *optional*, returned when `return_loss` is True):
        Total loss calculated for this model.
    loss_info (`FlavaLosses`):
        Detailed info for FLAVA Pretraining losses. Check `FlavaLosses` class description for the information on
        the keys.
    image_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `pixel_values` are present):
        The image embeddings which are basically the pooled output of [`FlavaImageModel`].
    image_output (`BaseModelOutputWithPooling`, *optional*, returned when `pixel_values` are present):
        The output of the [`FlavaImageModel`].
    text_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `input_ids` are present):
        The text embeddings which are basically the pooled output of [`FlavaTextModel`].
    text_output (`BaseModelOutputWithPooling`, *optional*, returned when `input_ids` are present):
        The output of the [`FlavaTextModel`].
    multimodal_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `input_ids` and `pixel_values` are present and `skip_unmasked_multimodal_encoder` is `None` or `False`):
        The multimodal embeddings which are basically the pooled output of [`FlavaTextModel`].
    multimodal_output (`BaseModelOutputWithPooling`, returned when `input_ids` and `pixel_values` are present and `skip_unmasked_multimodal_encoder` is `None` or `False`):
        The output of the [`FlavaMultimodalModel`].
    image_masked_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `pixel_values` are present):
        The image embeddings which are basically the pooled output of [`FlavaImageModel`]. Uses `bool_masked_pos`
        to create masked images.
    image_masked_output (`BaseModelOutputWithPooling`, *optional*, returned when `pixel_values` are present):
        The output of the [`FlavaImageModel`]. Uses `bool_masked_pos` to create masked images.
    text_masked_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `input_ids_masked` are present):
        The text embeddings which are basically the pooled output of [`FlavaTextModel`].
    text_masked_output (`BaseModelOutputWithPooling`, *optional*, returned when `input_ids_masked` are present):
        The output of the [`FlavaTextModel`].
    multimodal_masked_embeddings (`torch.FloatTensor` of shape `(batch_size, output_dim)`, *optional*, returned when `input_ids` and `pixel_values` are present):
        The multimodal embeddings which are basically the pooled output of [`FlavaTextModel`].
    multimodal_masked_output (`BaseModelOutputWithPooling`, *optional*, returned when `input_ids_masked` and `pixel_values` are present):
        The output of the [`FlavaMultimodalModel`].
    mim_logits (`torch.FloatTensor` of shape `(batch_size, num_image_patches, image_vocab_size)` or of shape `(total_masked_patches, image_vocab_size)` , *optional*, returned when `pixel_values` are present and `input_ids_masked` are not):
        The logits for MIM unimodal loss. Uses `bool_masked_pos` to get masked patches. The flattened output is
        returned when `bool_masked_pos` has some of the patches masked.
    mlm_logits (`torch.FloatTensor` of shape `(batch_size, text_seq_length, text_vocab_size)` or of shape `(total_masked_seq_length, text_vocab_size)`, *optional*, returned when `input_ids_masked` are present and `pixel_values` are not):
        The logits for MLM unimodal loss. The flattened output is returned when `input_ids_masked` has some of
        the tokens masked.
    itm_logits (`torch.FloatTensor` of shape `(batch_size, 2)`, *optional*, returned when `input_ids_masked` and `pixel_values` are present):
        The logits for ITM loss. Note that ITM loss is calculated on masked pairs in FLAVA.
    contrastive_logits_per_image (`torch.FloatTensor` of shape `(image_batch_size, text_batch_size)`):
        The scaled dot product scores between `image_embeddings` and `text_embeddings` but passed through FLAVA's
        `image_projection` and `text_projection` layers respectively. This represents the image-text similarity
        scores. This is calculated on unmasked images and texts.
    contrastive_logits_per_text (`torch.FloatTensor` of shape `(text_batch_size, image_batch_size)`):
        The scaled dot product scores between `text_embeddings` and `image_embeddings` but passed through FLAVA's
        `text_projection` and `image_projection` layers respectively. This is calculated on unmasked images and
        texts.
    mmm_image_logits (`torch.FloatTensor` of shape `(batch_size, num_image_patches, image_vocab_size)` or of shape `(total_masked_patches, image_vocab_size)`, *optional*, returned when `pixel_values` and `input_ids_masked` are present):
        The logits for MMM image multimodal loss. Uses `bool_masked_pos` to get masked patches. The flattened
        output is returned when `bool_masked_pos` has some of the patches masked.
    mmm_text_logits (`torch.FloatTensor` of shape `(batch_size, text_seq_length, text_vocab_size)` or of shape `(total_masked_seq_length, text_vocab_size)`, *optional*, returned when `pixel_values` and `input_ids_masked` are present):
        The logits for MMM text multimodal loss. The flattened output is returned when `input_ids_masked` has
        some of the tokens masked.
    """

    loss: Optional[torch.FloatTensor] = None
    loss_info: FlavaLosses = None
    image_embeddings: Optional[torch.FloatTensor] = None
    image_output: Optional[BaseModelOutputWithPooling] = None
    text_embeddings: Optional[torch.FloatTensor] = None
    text_output: Optional[BaseModelOutputWithPooling] = None
    multimodal_embeddings: Optional[torch.FloatTensor] = None
    multimodal_output: Optional[BaseModelOutputWithPooling] = None
    image_masked_embeddings: Optional[torch.FloatTensor] = None
    image_masked_output: Optional[BaseModelOutputWithPooling] = None
    text_masked_embeddings: Optional[torch.FloatTensor] = None
    text_masked_output: Optional[BaseModelOutputWithPooling] = None
    multimodal_masked_embeddings: Optional[torch.FloatTensor] = None
    multimodal_masked_output: Optional[BaseModelOutputWithPooling] = None
    mim_logits: Optional[torch.FloatTensor] = None
    mlm_logits: Optional[torch.FloatTensor] = None
    itm_logits: Optional[torch.FloatTensor] = None
    contrastive_logits_per_image: Optional[torch.FloatTensor] = None
    contrastive_logits_per_text: Optional[torch.FloatTensor] = None
    mmm_image_logits: Optional[torch.FloatTensor] = None
    mmm_text_logits: Optional[torch.FloatTensor] = None

    def to_tuple(self) -> tuple[Any]:
        transformer_outputs = [
            "text_output",
            "image_output",
            "multimodal_output",
            "text_masked_output",
            "image_masked_output",
            "multimodal_masked_output",
        ]
        return tuple(self[k] if k not in transformer_outputs else getattr(self, k).to_tuple() for k in self.keys())

class FlavaImageEmbeddings(nn.Module):
    """
    Construct the CLS token, position and patch embeddings. Optionally, also the mask token.
    """

    def __init__(self, config: FlavaImageConfig, use_mask_token: bool = False) -> None:
        super().__init__()

        use_mask_token = use_mask_token or config.mask_token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size)) if use_mask_token else None
        self.patch_embeddings = PatchEmbeddings(
            image_size=config.image_size,
            patch_size=config.patch_size,
            num_channels=config.num_channels,
            embed_dim=config.hidden_size,
        )
        num_patches = self.patch_embeddings.num_patches
        self.position_embeddings = nn.Parameter(torch.zeros(1, num_patches + 1, config.hidden_size))
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.patch_size = config.patch_size
        self.config = config

    def interpolate_pos_encoding(self, embeddings: torch.Tensor, height: int, width: int) -> torch.Tensor:
        """
        This method allows to interpolate the pre-trained position encodings, to be able to use the model on higher resolution
        images. This method is also adapted to support torch.jit tracing.

        Adapted from:
        - https://github.com/facebookresearch/dino/blob/de9ee3df6cf39fac952ab558447af1fa1365362a/vision_transformer.py#L174-L194, and
        - https://github.com/facebookresearch/dinov2/blob/e1277af2ba9496fbadf7aec6eba56e8d882d1e35/dinov2/models/vision_transformer.py#L179-L211
        """

        num_patches = embeddings.shape[1] - 1
        num_positions = self.position_embeddings.shape[1] - 1

        # always interpolate when tracing to ensure the exported model works for dynamic input shapes
        if not torch.jit.is_tracing() and num_patches == num_positions and height == width:
            return self.position_embeddings

        class_pos_embed = self.position_embeddings[:, :1]
        patch_pos_embed = self.position_embeddings[:, 1:]

        dim = embeddings.shape[-1]

        new_height = height // self.patch_size
        new_width = width // self.patch_size

        sqrt_num_positions = torch_int(num_positions**0.5)
        patch_pos_embed = patch_pos_embed.reshape(1, sqrt_num_positions, sqrt_num_positions, dim)
        patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)

        patch_pos_embed = nn.functional.interpolate(
            patch_pos_embed,
            size=(new_height, new_width),
            mode="bicubic",
            align_corners=False,
        )

        patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)

        return torch.cat((class_pos_embed, patch_pos_embed), dim=1)

    def forward(
        self,
        pixel_values: torch.Tensor,
        bool_masked_pos: Optional[torch.BoolTensor] = None,
        interpolate_pos_encoding: bool = False,
    ) -> torch.Tensor:
        batch_size, num_channels, height, width = pixel_values.shape
        embeddings = self.patch_embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)

        batch_size, seq_len, _ = embeddings.size()
        if bool_masked_pos is not None:
            mask_tokens = self.mask_token.expand(batch_size, seq_len, -1)
            # B X H X W = B X HW
            if bool_masked_pos.dim() == 3:
                bool_masked_pos = bool_masked_pos.view(bool_masked_pos.size(0), -1)
            # replace the masked visual tokens by mask_tokens
            mask = bool_masked_pos.unsqueeze(-1).type_as(mask_tokens)
            embeddings = embeddings * (1.0 - mask) + mask_tokens * mask

        # add the [CLS] token to the embedded patch tokens
        cls_tokens = self.cls_token.expand(batch_size, -1, -1)
        embeddings = torch.cat((cls_tokens, embeddings), dim=1)

        # add positional encoding to each token
        if interpolate_pos_encoding:
            embeddings = embeddings + self.interpolate_pos_encoding(embeddings, height, width)
        else:
            embeddings = embeddings + self.position_embeddings

        embeddings = self.dropout(embeddings)

        return embeddings


class PatchEmbeddings(nn.Module):
    """
    Image to Patch Embedding.
    """

    def __init__(
        self,
        image_size: int = 224,
        patch_size: Union[int, tuple[int, int]] = 16,
        num_channels: int = 3,
        embed_dim: int = 768,
    ):
        super().__init__()
        if not isinstance(image_size, collections.abc.Iterable):
            image_size = (image_size, image_size)
        if not isinstance(patch_size, collections.abc.Iterable):
            patch_size = (patch_size, patch_size)
        num_patches = (image_size[1] // patch_size[1]) * (image_size[0] // patch_size[0])
        self.image_size = image_size
        self.patch_size = patch_size
        self.num_patches = num_patches

        self.projection = nn.Conv2d(num_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, pixel_values: torch.Tensor, interpolate_pos_encoding: bool = False) -> torch.Tensor:
        batch_size, num_channels, height, width = pixel_values.shape
        if not interpolate_pos_encoding:
            if height != self.image_size[0] or width != self.image_size[1]:
                raise ValueError(
                    f"Input image size ({height}*{width}) doesn't match model"
                    f" ({self.image_size[0]}*{self.image_size[1]})."
                )
        x = self.projection(pixel_values).flatten(2).transpose(1, 2)
        return x


class FlavaTextEmbeddings(nn.Module):
    """Construct the embeddings from word, position and token_type embeddings."""

    def __init__(self, config):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)

        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
        # any TensorFlow checkpoint file
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        # position_ids (1, len position emb) is contiguous in memory and exported when serialized
        self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
        self.register_buffer(
            "position_ids", torch.arange(config.max_position_embeddings).expand((1, -1)), persistent=False
        )
        self.register_buffer(
            "token_type_ids", torch.zeros(self.position_ids.size(), dtype=torch.long), persistent=False
        )

    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
    ):
        input_shape = input_ids.size()
        seq_length = input_shape[1]

        if position_ids is None:
            position_ids = self.position_ids[:, :seq_length]

        # Setting the token_type_ids to the registered buffer in constructor where it is all zeros, which usually
        # occurs when it's auto-generated; the registered buffer helps users when tracing the model without passing
        # token_type_ids
        if token_type_ids is None:
            if hasattr(self, "token_type_ids"):
                buffered_token_type_ids = self.token_type_ids[:, :seq_length]
                buffered_token_type_ids_expanded = buffered_token_type_ids.expand(input_shape[0], seq_length)
                token_type_ids = buffered_token_type_ids_expanded
            else:
                token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device)

        inputs_embeds = self.word_embeddings(input_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)

        embeddings = inputs_embeds + token_type_embeddings
        if self.position_embedding_type == "absolute":
            position_embeddings = self.position_embeddings(position_ids)
            embeddings += position_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings


class FlavaSelfAttention(nn.Module):
    def __init__(self, config: FlavaPossibleConfigs) -> None:
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                f"The hidden size {config.hidden_size} is not a multiple of the number of attention "
                f"heads {config.num_attention_heads}."
            )

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config.hidden_size, self.all_head_size, bias=config.qkv_bias)
        self.key = nn.Linear(config.hidden_size, self.all_head_size, bias=config.qkv_bias)
        self.value = nn.Linear(config.hidden_size, self.all_head_size, bias=config.qkv_bias)

        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: bool = False,
    ) -> Union[tuple[torch.Tensor, torch.Tensor], tuple[torch.Tensor]]:
        batch_size, seq_len, _ = hidden_states.shape
        query_layer = (
            self.query(hidden_states)
            .view(batch_size, -1, self.num_attention_heads, self.attention_head_size)
            .transpose(1, 2)
        )
        key_layer = (
            self.key(hidden_states)
            .view(batch_size, -1, self.num_attention_heads, self.attention_head_size)
            .transpose(1, 2)
        )
        value_layer = (
            self.value(hidden_states)
            .view(batch_size, -1, self.num_attention_heads, self.attention_head_size)
            .transpose(1, 2)
        )

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
        attention_scores = attention_scores / math.sqrt(self.attention_head_size)
        if attention_mask is not None:
            # Apply the attention mask (precomputed for all layers in the forward() call)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = nn.functional.softmax(attention_scores, dim=-1)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = torch.matmul(attention_probs, value_layer)

        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.view(*new_context_layer_shape)

        outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

        return outputs
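
# Shape walkthrough for the attention block above (an illustrative sketch,
# assuming hidden_size=768 and 12 heads, i.e. a head size of 64):
#
#     x = torch.randn(2, 197, 768)              # (batch, seq, hidden)
#     attn = FlavaSelfAttention(FlavaImageConfig())
#     (context,) = attn(x)                      # context: (2, 197, 768)
#
# Internally each head works on (2, 12, 197, 64) tensors, and the raw scores
# are scaled by 1/sqrt(64) before the softmax.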

class FlavaSelfOutput(nn.Module):
    """
    The residual connection is defined in FlavaLayer (same as ViTLayer) instead of here (as is the case with other
    models), due to the layernorm applied before each block.
    """

    def __init__(self, config: FlavaPossibleConfigs) -> None:
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)

        return hidden_states


class FlavaAttention(nn.Module):
    def __init__(self, config: FlavaPossibleConfigs) -> None:
        super().__init__()
        self.attention = FlavaSelfAttention(config)
        self.output = FlavaSelfOutput(config)
        self.pruned_heads = set()

    def prune_heads(self, heads: set[int]) -> None:
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.attention.num_attention_heads, self.attention.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.attention.query = prune_linear_layer(self.attention.query, index)
        self.attention.key = prune_linear_layer(self.attention.key, index)
        self.attention.value = prune_linear_layer(self.attention.value, index)
        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)

        # Update hyper params and store pruned heads
        self.attention.num_attention_heads = self.attention.num_attention_heads - len(heads)
        self.attention.all_head_size = self.attention.attention_head_size * self.attention.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: bool = False,
    ) -> Union[tuple[torch.Tensor, torch.Tensor], tuple[torch.Tensor]]:
        self_outputs = self.attention(
            hidden_states, attention_mask=attention_mask, head_mask=head_mask, output_attentions=output_attentions
        )

        attention_output = self.output(self_outputs[0], hidden_states)

        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs


class FlavaIntermediate(nn.Module):
    def __init__(self, config: FlavaPossibleConfigs) -> None:
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)

        return hidden_states


class FlavaOutput(nn.Module):
    def __init__(self, config: FlavaPossibleConfigs) -> None:
        super().__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)

        hidden_states = hidden_states + input_tensor

        return hidden_states


class FlavaLayer(GradientCheckpointingLayer):
    """This corresponds to the Block class in the timm implementation."""

    def __init__(self, config: FlavaPossibleConfigs) -> None:
        super().__init__()
        self.chunk_size_feed_forward = config.chunk_size_feed_forward
        self.seq_len_dim = 1
        self.attention = FlavaAttention(config)
        self.intermediate = FlavaIntermediate(config)
        self.output = FlavaOutput(config)

        self.layernorm_before = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.layernorm_after = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: bool = False,
    ) -> Union[tuple[torch.Tensor, torch.Tensor], tuple[torch.Tensor]]:
        self_attention_outputs = self.attention(
            self.layernorm_before(hidden_states),  # in ViT, layernorm is applied before self-attention
            attention_mask=attention_mask,
            head_mask=head_mask,
            output_attentions=output_attentions,
        )
        attention_output = self_attention_outputs[0]
        outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights

        # first residual connection
        hidden_states = attention_output + hidden_states

        # in ViT, layernorm is also applied after self-attention
        layer_output = self.layernorm_after(hidden_states)
        layer_output = self.intermediate(layer_output)

        # second residual connection is done here
        layer_output = self.output(layer_output, hidden_states)

        outputs = (layer_output,) + outputs

        return outputs


class FlavaEncoder(nn.Module):
    def __init__(self, config: FlavaConfig) -> None:
        super().__init__()
        self.config = config
        self.layer = nn.ModuleList([FlavaLayer(config) for _ in range(config.num_hidden_layers)])
        self.gradient_checkpointing = False

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: bool = False,
        output_hidden_states: bool = False,
        return_dict: bool = True,
    ) -> Union[tuple, BaseModelOutput]:
        all_hidden_states = () if output_hidden_states else None
        all_self_attentions = () if output_attentions else None

        for i, layer_module in enumerate(self.layer):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)

            layer_head_mask = head_mask[i] if head_mask is not None else None
            layer_outputs = layer_module(hidden_states, attention_mask, layer_head_mask, output_attentions)

            hidden_states = layer_outputs[0]

            if output_attentions:
                all_self_attentions = all_self_attentions + (layer_outputs[1],)

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if not return_dict:
            return tuple(v for v in [hidden_states, all_hidden_states, all_self_attentions] if v is not None)
        return BaseModelOutput(
            last_hidden_state=hidden_states, hidden_states=all_hidden_states, attentions=all_self_attentions
        )


class FlavaPooler(nn.Module):
    def __init__(self, config: FlavaPossibleConfigs):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states: torch.Tensor):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output


@auto_docstring
class FlavaPreTrainedModel(PreTrainedModel):
    config: FlavaConfig
    base_model_prefix = "flava"
    supports_gradient_checkpointing = True

    def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
        """Initialize the weights"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)
        elif isinstance(module, FlavaMaskedPredictionHead):
            module.bias.data.zero_()
        elif isinstance(module, FlavaImageEmbeddings):
            module.cls_token.data.zero_()
            module.position_embeddings.data.zero_()
            if module.mask_token is not None:
                module.mask_token.data.zero_()
        elif isinstance(module, FlavaMultimodalModel):
            if module.use_cls_token:
                module.cls_token.data.zero_()
        elif isinstance(module, FlavaModel):
            module.logit_scale.data.fill_(self.config.logit_scale_init_value)


@auto_docstring
class FlavaImageModel(FlavaPreTrainedModel):
    config: FlavaImageConfig
    base_model_prefix = "flava.image_model"
    main_input_name = "pixel_values"

    def __init__(self, config: FlavaImageConfig, add_pooling_layer: bool = True):
        r"""
        add_pooling_layer (bool, *optional*, defaults to `True`):
            Whether to add a pooling layer
        """
        super().__init__(config)
        self.config = config

        self.embeddings = FlavaImageEmbeddings(config)
        self.encoder = FlavaEncoder(config)

        self.layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.pooler = FlavaPooler(config) if add_pooling_layer else None

        self.post_init()

    def get_input_embeddings(self) -> nn.Module:
        return self.embeddings.patch_embeddings

    def set_input_embeddings(self, value: nn.Module):
        self.embeddings.patch_embeddings = value

    def _prune_heads(self, heads_to_prune: dict[int, list[int]]) -> None:
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layer[layer].attention.prune_heads(heads)

    @auto_docstring
    def forward(
        self,
        pixel_values: Optional[torch.Tensor] = None,
        bool_masked_pos: Optional[torch.BoolTensor] = None,
        interpolate_pos_encoding: Optional[bool] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[tuple, BaseModelOutputWithPooling]:
        r"""
        bool_masked_pos (`torch.BoolTensor` of shape `(batch_size, image_num_patches)`):
            Boolean masked positions. Indicates which patches are masked (1) and which aren't (0).
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if pixel_values is None:
            raise ValueError("You have to specify pixel_values")

        # Prepare head mask if needed
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        embedding_output = self.embeddings(
            pixel_values, bool_masked_pos=bool_masked_pos, interpolate_pos_encoding=interpolate_pos_encoding
        )

        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=attention_mask,
            head_mask=head_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]
        sequence_output = self.layernorm(sequence_output)
        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None

        if not return_dict:
            return (sequence_output, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPooling(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
        )

@auto_docstring
class FlavaTextModel(FlavaPreTrainedModel):
    config: FlavaTextConfig
    base_model_prefix = "flava.text_model"

    def __init__(self, config: FlavaTextConfig, add_pooling_layer: bool = True):
        r"""
        add_pooling_layer (bool, *optional*, defaults to `True`):
            Whether to add a pooling layer
        """
        super().__init__(config)
        self.config = config

        self.embeddings = FlavaTextEmbeddings(config)
        self.encoder = FlavaEncoder(config)

        self.layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.pooler = FlavaPooler(config) if add_pooling_layer else None

        self.post_init()

    def get_input_embeddings(self) -> nn.Module:
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value: nn.Module):
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune: dict[int, list[int]]) -> None:
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layer[layer].attention.prune_heads(heads)

    @auto_docstring
    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[tuple, BaseModelOutputWithPooling]:
        r"""
        input_ids (`torch.LongTensor` of shape `(batch_size, text_seq_length)`):
            Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`]. See
            [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details. [What are input
            IDs?](../glossary#input-ids)
        token_type_ids (`torch.LongTensor` of shape `(batch_size, text_seq_length)`, *optional*):
            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
            1]`:
            - 0 corresponds to a *sentence A* token,
            - 1 corresponds to a *sentence B* token.
            [What are token type IDs?](../glossary#token-type-ids)
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if input_ids is None:
            raise ValueError("You have to specify input_ids")

        input_shape = input_ids.size()

        if attention_mask is None:
            attention_mask = torch.ones(input_shape, device=input_ids.device)

        # Prepare head mask if needed
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
        extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
            attention_mask, input_shape, input_ids.device
        )

        embedding_output = self.embeddings(
            input_ids=input_ids, token_type_ids=token_type_ids, position_ids=position_ids
        )

        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]
        sequence_output = self.layernorm(sequence_output)
        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None

        if not return_dict:
            return (sequence_output, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPooling(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
        )

@auto_docstring
class FlavaMultimodalModel(FlavaPreTrainedModel):
    config: FlavaMultimodalConfig
    base_model_prefix = "flava.multimodal_model"
    main_input_name = "hidden_states"

    def __init__(self, config: FlavaMultimodalConfig, add_pooling_layer: bool = True):
        r"""
        add_pooling_layer (bool, *optional*, defaults to `True`):
            Whether to add a pooling layer
        """
        super().__init__(config)
        self.config = config
        self.use_cls_token = self.config.use_cls_token
        if self.use_cls_token:
            self.cls_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size))

        self.encoder = FlavaEncoder(config)

        self.layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.pooler = FlavaPooler(config) if add_pooling_layer else None

        self.post_init()

    def _prune_heads(self, heads_to_prune: dict[int, list[int]]) -> None:
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layer[layer].attention.prune_heads(heads)

    @auto_docstring
    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> Union[tuple, BaseModelOutputWithPooling]:
        r"""
        hidden_states (`torch.FloatTensor` of shape `(batch_size, image_num_patches + text_seq_len, hidden_size)`):
            The concatenated hidden states of unimodal encoders.
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        batch_size, seq_length, _ = hidden_states.size()

        if self.use_cls_token:
            cls_tokens = self.cls_token.expand(batch_size, -1, -1)
            hidden_states = torch.cat((cls_tokens, hidden_states), dim=1)
            seq_length += 1

        if attention_mask is None:
            attention_mask = torch.ones((batch_size, seq_length), device=hidden_states.device)

        # Prepare head mask if needed
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
        extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
            attention_mask, (batch_size, seq_length), hidden_states.device
        )

        encoder_outputs = self.encoder(
            hidden_states,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]
        sequence_output = self.layernorm(sequence_output)
        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None

        if not return_dict:
            return (sequence_output, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPooling(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
        )


@auto_docstring
class FlavaModel(FlavaPreTrainedModel):
    config: FlavaConfig

    def __init__(self, config: FlavaConfig):
        super().__init__(config)

        if not isinstance(config.text_config, FlavaTextConfig):
            raise TypeError(
                "config.text_config is expected to be of type FlavaTextConfig but is of type"
                f" {type(config.text_config)}."
            )

        if not isinstance(config.image_config, FlavaImageConfig):
            raise TypeError(
                "config.image_config is expected to be of type FlavaImageConfig but is of type"
                f" {type(config.image_config)}."
            )

        if not isinstance(config.multimodal_config, FlavaMultimodalConfig):
            raise TypeError(
                "config.multimodal_config is expected to be of type FlavaMultimodalConfig but "
                + f"is of type {type(config.multimodal_config)}."
            )

        text_config = config.text_config
        image_config = config.image_config
        multimodal_config = config.multimodal_config

        self.projection_dim = config.projection_dim
        self.text_hidden_size = text_config.hidden_size
        self.image_hidden_size = image_config.hidden_size
        self.mm_hidden_size = multimodal_config.hidden_size

        self.text_model = FlavaTextModel(text_config)
        self.image_model = FlavaImageModel(image_config)
        self.multimodal_model = FlavaMultimodalModel(multimodal_config)

        self.image_projection = nn.Linear(self.image_hidden_size, self.projection_dim)
        self.text_projection = nn.Linear(self.text_hidden_size, self.projection_dim)
        self.logit_scale = nn.Parameter(torch.tensor(self.config.logit_scale_init_value))

        self.image_to_mm_projection = nn.Linear(self.image_hidden_size, self.mm_hidden_size)
        self.text_to_mm_projection = nn.Linear(self.text_hidden_size, self.mm_hidden_size)
        # Initialize weights and apply final processing
        self.post_init()

    @auto_docstring
    def get_text_features(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> torch.FloatTensor:
        r"""
        input_ids (`torch.LongTensor` of shape `(batch_size, text_seq_length)`):
            Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`]. See
            [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details. [What are input
            IDs?](../glossary#input-ids)
        token_type_ids (`torch.LongTensor` of shape `(batch_size, text_seq_length)`, *optional*):
            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
            1]`:
            - 0 corresponds to a *sentence A* token,
            - 1 corresponds to a *sentence B* token.
            [What are token type IDs?](../glossary#token-type-ids)

        Returns:
            text_features (`torch.FloatTensor` of shape `(batch_size, output_dim)`): The text embeddings obtained by
            applying the projection layer to the pooled output of [`FlavaTextModel`].

        Examples:

        ```python
        >>> from transformers import AutoProcessor, FlavaModel

        >>> model = FlavaModel.from_pretrained("facebook/flava-full")
        >>> processor = AutoProcessor.from_pretrained("facebook/flava-full")

        >>> inputs = processor(
        ...     text=["a photo of a cat", "a photo of a dog"], max_length=77, padding="max_length", return_tensors="pt"
        ... )
        >>> text_features = model.get_text_features(**inputs)
        ```
        """
        text_outputs = self.text_model(
            input_ids=input_ids,
            token_type_ids=token_type_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = text_outputs[0]  # last_hidden_state
        text_features = self.text_projection(pooled_output)

        return text_features

    @auto_docstring
    def get_image_features(
        self,
        pixel_values: Optional[torch.Tensor] = None,
        bool_masked_pos: Optional[torch.BoolTensor] = None,
        interpolate_pos_encoding: Optional[bool] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ) -> torch.FloatTensor:
        r"""
        bool_masked_pos (`torch.BoolTensor` of shape `(batch_size, image_num_patches)`):
            Boolean masked positions. Indicates which patches are masked (1) and which aren't (0).

        Returns:
            image_features (`torch.FloatTensor` of shape `(batch_size, output_dim`): The image embeddings obtained by
            applying the projection layer to the pooled output of [`FlavaImageModel`].

        Examples:

        ```python
        >>> from PIL import Image
        >>> import requests
        >>> from transformers import AutoProcessor, FlavaModel

        >>> model = FlavaModel.from_pretrained("facebook/flava-full")
        >>> processor = AutoProcessor.from_pretrained("facebook/flava-full")

        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)

        >>> inputs = processor(images=image, return_tensors="pt")

        >>> image_features = model.get_image_features(**inputs)
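        >>> # Hedged note: image_features covers every patch position of the image
        >>> # encoder's output; FLAVA's contrastive pathway only uses the [CLS]
        >>> # slot at index 0.
        >>> pooled_image_features = image_features[:, 0]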
        ```
        """
        image_outputs = self.image_model(
            pixel_values=pixel_values,
            bool_masked_pos=bool_masked_pos,
            attention_mask=attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            interpolate_pos_encoding=interpolate_pos_encoding,
            return_dict=return_dict,
        )

        pooled_output = image_outputs[0]  # last_hidden_state
        image_features = self.image_projection(pooled_output)

        return image_features

    @auto_docstring
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        pixel_values: Optional[torch.FloatTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        bool_masked_pos: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        image_attention_mask: Optional[torch.Tensor] = None,
        skip_multimodal_encoder: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: bool = True,
        return_dict: Optional[bool] = None,
    ) -> Union[tuple, FlavaModelOutput]:
        r"""
        input_ids (`torch.LongTensor` of shape `(batch_size, image_num_patches + text_seq_len)`):
            Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`]. See
            [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details. [What are input
            IDs?](../glossary#input-ids)
        token_type_ids (`torch.LongTensor` of shape `(batch_size, image_num_patches + text_seq_len)`, *optional*):
            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
            1]`:
            - 0 corresponds to a *sentence A* token,
            - 1 corresponds to a *sentence B* token.
            [What are token type IDs?](../glossary#token-type-ids)
        bool_masked_pos (`torch.BoolTensor` of shape `(batch_size, image_num_patches)`):
            Boolean masked positions. Indicates which patches are masked (1) and which aren't (0).
        image_attention_mask (`torch.Tensor` of shape `(batch_size, image_num_patches)`, *optional*):
            Mask to avoid performing attention on padding pixel values for image inputs. Mask values selected in `[0, 1]`:
            - 1 for pixel values that are real (i.e., **not masked**),
            - 0 for pixel values that are padding (i.e., **masked**).
        skip_multimodal_encoder (*bool*, *optional*):
            Skip any calculations for multimodal encoder. Useful if multimodal encoding is not going to be used.

        Examples:

        ```python
        >>> from PIL import Image
        >>> import requests
        >>> from transformers import AutoProcessor, FlavaModel

        >>> model = FlavaModel.from_pretrained("facebook/flava-full")
        >>> processor = AutoProcessor.from_pretrained("facebook/flava-full")

        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)

        >>> inputs = processor(text=["a photo of a cat"], images=image, return_tensors="pt", padding=True)

        >>> outputs = model(**inputs)

        >>> image_embeddings = outputs.image_embeddings
        >>> text_embeddings = outputs.text_embeddings
        >>> multimodal_embeddings = outputs.multimodal_embeddings

        >>> outputs.image_embeddings.shape
        torch.Size([1, 197, 768])

        >>> text_embeddings.shape
        torch.Size([1, 7, 768])

        >>> multimodal_embeddings.shape
        torch.Size([1, 205, 768])
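        >>> # 205 = 1 multimodal [CLS] token + 197 projected image positions
        >>> # + 7 projected text positions, i.e. the multimodal encoder runs over
        >>> # the concatenated unimodal sequences.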
        ```
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        if not output_hidden_states:
            raise ValueError("FLAVA model requires hidden states to work. Please set `output_hidden_states=True`")

        image_embeddings = None
        image_states = None
        image_mm_projection = None
        image_output = None
        if pixel_values is not None:
            image_output = self.image_model(
                pixel_values=pixel_values,
                bool_masked_pos=bool_masked_pos,
                attention_mask=image_attention_mask,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
                return_dict=return_dict,
            )
            image_embeddings, image_states = image_output[0], image_output[2]
            # Note that these states don't use final layernorm in the transformer model
            image_mm_projection = self.image_to_mm_projection(image_states[-1])

        text_embeddings = None
        text_states = None
        text_mm_projection = None
        text_output = None
        if input_ids is not None:
            text_output = self.text_model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                position_ids=position_ids,
                token_type_ids=token_type_ids,
                output_attentions=output_attentions,
                output_hidden_states=output_hidden_states,
                return_dict=return_dict,
            )
            text_embeddings, text_states = text_output[0], text_output[2]
            # Note that these states don't use final layernorm in the transformer model
            text_mm_projection = self.text_to_mm_projection(text_states[-1])

        multimodal_embeddings = None
        multimodal_output = None
        if image_mm_projection is not None and text_mm_projection is not None and not skip_multimodal_encoder:
            if attention_mask is not None:
                batch_size, seq_len, _ = image_mm_projection.shape
                if self.multimodal_model.use_cls_token:
                    seq_len += 1
                attention_mask_image = torch.ones(batch_size, seq_len, device=image_mm_projection.device)
                attention_multimodal = torch.cat([attention_mask_image, attention_mask], dim=1)
            else:
                attention_multimodal = None
            multimodal_input = torch.cat([image_mm_projection, text_mm_projection], dim=1)
            multimodal_output = self.multimodal_model(
                multimodal_input, attention_mask=attention_multimodal, return_dict=return_dict
            )
            multimodal_embeddings = multimodal_output[0]

        if not return_dict:
            return (
                image_embeddings,
                image_output,
                text_embeddings,
                text_output,
                multimodal_embeddings,
                multimodal_output,
            )

        return FlavaModelOutput(
            image_embeddings=image_embeddings,
            image_output=image_output,
            text_embeddings=text_embeddings,
            text_output=text_output,
            multimodal_embeddings=multimodal_embeddings,
            multimodal_output=multimodal_output,
        )


class FlavaImageCodebookResPath(nn.Module):
    def __init__(self, in_size: int, out_size: int, **kwargs):
        super().__init__()
        hid_size = out_size // 4

        path = OrderedDict()
        path["relu_1"] = nn.ReLU()
        path["conv_1"] = nn.Conv2d(in_size, hid_size, kernel_size=3, padding=1)
        path["relu_2"] = nn.ReLU()
        path["conv_2"] = nn.Conv2d(hid_size, hid_size, kernel_size=3, padding=1)
        path["relu_3"] = nn.ReLU()
        path["conv_3"] = nn.Conv2d(hid_size, hid_size, kernel_size=3, padding=1)
        path["relu_4"] = nn.ReLU()
        path["conv_4"] = nn.Conv2d(hid_size, out_size, kernel_size=1, padding=0)

        self.path = nn.Sequential(path)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.path(x)


class FlavaImageCodebookBlock(nn.Module):
    def __init__(self, in_size: int, out_size: int, num_layers: int, **kwargs):
        super().__init__()

        self.post_gain = 1 / (num_layers**2)

        if in_size != out_size:
            self.id_path = nn.Conv2d(in_size, out_size, kernel_size=1, padding=0)
        else:
            self.id_path = nn.Identity()

        self.res_path = FlavaImageCodebookResPath(in_size, out_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.id_path(x) + self.post_gain * self.res_path(x)


class FlavaImageCodebookLayerGroup(nn.Module):
    def __init__(self, num_blocks: int, num_layers: int, in_size: int, out_size: int, use_pool: bool = True):
        super().__init__()
        blocks = OrderedDict()
        for num in range(num_blocks):
            if num == 0:
                blocks[f"block_{num + 1}"] = FlavaImageCodebookBlock(in_size, out_size, num_layers)
            else:
                blocks[f"block_{num + 1}"] = FlavaImageCodebookBlock(out_size, out_size, num_layers)

        if use_pool:
            blocks["pool"] = nn.MaxPool2d(kernel_size=2)

        self.group = nn.Sequential(blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.group(x)
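# A minimal shape sketch of the encoder assembled below (assumption: default
# FlavaImageCodebookConfig with hidden_size=256, num_groups=4,
# num_blocks_per_group=2, vocab_size=8192, and 112x112 codebook crops from the
# processor; names here are illustrative, not part of the API):
#
#   x = torch.randn(1, 3, 112, 112)      # codebook pixel values
#   # input conv (7x7, pad 3) -> (1, 256, 112, 112)
#   # group_1 (pooled)        -> (1, 256, 56, 56)
#   # group_2 (pooled)        -> (1, 512, 28, 28)
#   # group_3 (pooled)        -> (1, 1024, 14, 14)
#   # group_4 (no pooling)    -> (1, 2048, 14, 14)
#   # output conv (1x1)       -> (1, 8192, 14, 14)  one logit per codebook entry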
@auto_docstring(
    custom_intro="""
    The FLAVA's image codebook model inspired from DALL-E's original encoder. Outputs raw hidden states and can be used
    to generate image tokens for an image based on DALL-E's vocab. Used to generate labels for MIM. Use
    `get_codebook_indices` to get image tokens for an image.
    """
)
class FlavaImageCodebook(FlavaPreTrainedModel):
    base_model_prefix = ""
    config: FlavaImageCodebookConfig
    main_input_name = "pixel_values"
    supports_gradient_checkpointing = False

    def __init__(self, config: FlavaImageCodebookConfig, **kwargs: Any):
        super().__init__(config)

        self.config = config
        self.num_groups = config.num_groups
        self.input_channels = config.input_channels
        self.num_blocks_per_group = config.num_blocks_per_group
        self.hidden_size = config.hidden_size
        self.vocab_size = config.vocab_size

        num_layers = self.num_groups * self.num_blocks_per_group

        output_blocks = OrderedDict()
        output_blocks["relu"] = nn.ReLU()
        output_blocks["conv"] = nn.Conv2d(8 * self.hidden_size, self.vocab_size, kernel_size=1, padding=0)

        blocks = OrderedDict()
        blocks["input"] = nn.Conv2d(self.input_channels, 1 * self.hidden_size, kernel_size=7, padding=3)
        blocks["group_1"] = FlavaImageCodebookLayerGroup(
            self.num_blocks_per_group, num_layers, 1 * self.hidden_size, 1 * self.hidden_size
        )
        blocks["group_2"] = FlavaImageCodebookLayerGroup(
            self.num_blocks_per_group, num_layers, 1 * self.hidden_size, 2 * self.hidden_size
        )
        blocks["group_3"] = FlavaImageCodebookLayerGroup(
            self.num_blocks_per_group, num_layers, 2 * self.hidden_size, 4 * self.hidden_size
        )
        blocks["group_4"] = FlavaImageCodebookLayerGroup(
            self.num_blocks_per_group, num_layers, 4 * self.hidden_size, 8 * self.hidden_size, use_pool=False
        )
        blocks["output"] = nn.Sequential(output_blocks)

        self.blocks = nn.Sequential(blocks)

        # Initialize weights and apply final processing
        self.post_init()

        if self.config.freeze:
            for param in self.parameters():
                param.requires_grad = False

    def get_codebook_indices(self, pixel_values: torch.Tensor) -> torch.Tensor:
        """
        Args:
            pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
                Pixel values. Codebook pixel values can be obtained using [`AutoImageProcessor`] by passing
                `return_codebook_pixels=True`. See [`FlavaImageProcessor.__call__`] for details.

        Examples:
        ```python
        >>> from PIL import Image
        >>> import requests
        >>> from transformers import AutoImageProcessor, FlavaImageCodebook

        >>> model = FlavaImageCodebook.from_pretrained("facebook/flava-image-codebook")
        >>> image_processor = AutoImageProcessor.from_pretrained("facebook/flava-image-codebook")

        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)

        >>> inputs = image_processor([image], return_codebook_pixels=True, return_tensors="pt")
        >>> inputs = dict(pixel_values=inputs.codebook_pixel_values)

        >>> outputs = model.get_codebook_indices(**inputs)
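        >>> # Hedged note: with the default config this is a (1, 14, 14) grid of
        >>> # discrete token ids (argmax over the 8192-way codebook logits), which
        >>> # FlavaForPreTraining consumes as masked-image-modeling labels.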
        ```
        """
        z_logits = self.blocks(pixel_values)
        return torch.argmax(z_logits, axis=1)

    def get_codebook_probs(self, pixel_values: torch.Tensor) -> torch.Tensor:
        z_logits = self.blocks(pixel_values)
        return nn.Softmax(dim=1)(z_logits)

    def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
        """
        Args:
            pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
                Pixel values. Codebook pixel values can be obtained using [`AutoImageProcessor`] by passing
                `return_codebook_pixels=True`. See [`FlavaImageProcessor.__call__`] for details.

        Examples:

        ```python
        >>> from PIL import Image
        >>> import requests
        >>> from transformers import AutoImageProcessor, FlavaImageCodebook

        >>> model = FlavaImageCodebook.from_pretrained("facebook/flava-image-codebook")
        >>> image_processor = AutoImageProcessor.from_pretrained("facebook/flava-image-codebook")
        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)

        >>> inputs = image_processor([image], return_codebook_pixels=True, return_tensors="pt")
        >>> inputs = dict(pixel_values=inputs.codebook_pixel_values)

        >>> outputs = model(**inputs)
        >>> print(outputs.shape)
        (1, 196)
        ```
        r  zinput shape z
 is not 4dr   z
input has z channels but model built for )r
  r  r   r   r   r  )r,   r   s     r-   r   zFlavaImageCodebook.forward  s    9 :V8V WCC_B` a		6 |!!"a'|L,>,>+?zJKKa D$7$77z,*<*<Q*?)@@^_c_r_r^stuu{{<((r4   )r5   r6   r7   rf  r   r;   r  rg  r   ri   r9   r   r  r  r:   r   r   r   s   @r-   r  r  r  s     $$$O&+#*,(*, *,X. .%,, .8+u|| + + )E$5$5  )%,,  )r4   r  c                   $     e Zd Z fdZd Z xZS )FlavaPredictionHeadTransformc                 h   t         |           t        j                  |j                  |j                        | _        t        |j                  t              rt        |j                     | _
        n|j                  | _
        t        j                  |j                  |j                        | _        y )Nr   )rh   ri   r   r   rm   r  r   r  r  r
   transform_act_fnr   r   r   s     r-   ri   z%FlavaPredictionHeadTransform.__init__  s{    YYv1163E3EF
f''-$*6+<+<$=D!$*$5$5D!f&8&8f>S>STr4   c                 l    | j                  |      }| j                  |      }| j                  |      }|S r]   )r  r  r   r"  s     r-   r   z$FlavaPredictionHeadTransform.forward  s4    

=1--m<}5r4   r5   r6   r7   ri   r   r   r   s   @r-   r  r    s    Ur4   r  c                   ,     e Zd Zd fd	Zd Zd Z xZS )r_  c                 |   t         |           || _        t        |      | _        t        j                  |j                  |j                  d      | _	        t        j                  t        j                  |j                              | _        ||| j                  _        | j                  | j                  _        y )NFr   )rh   ri   ra   r  	transformr   r   rm   r   decoderrk   r9   rl   r   rY  )r,   ra   rY  rv   s      r-   ri   z"FlavaMaskedPredictionHead.__init__   s    5f=yy!3!3V5F5FUSLLV->->!?@	"(DLL !IIr4   c                 :    | j                   | j                  _         y r]   )r   r  r3   s    r-   _tie_weightsz&FlavaMaskedPredictionHead._tie_weights  s     IIr4   c                 J    | j                  |      }| j                  |      }|S r]   )r  r  r  s     r-   r   z!FlavaMaskedPredictionHead.forward  s"    NN1LLOr4   r]   )r5   r6   r7   ri   r  r   r   r   s   @r-   r_  r_    s    
&&r4   r_  c                   $     e Zd Z fdZd Z xZS )FlavaITMHeadc                     t         |           || _        t        |      | _        t        j                  |j                  d      | _        y )Nr|   )	rh   ri   ra   rJ  rp  r   r   rm   seq_relationshipr   s     r-   ri   zFlavaITMHead.__init__  s:    !&) "		&*<*<a @r4   c                 J    | j                  |      }| j                  |      }|S r]   )rp  r#  r  s     r-   r   zFlavaITMHead.forward  s$    KKN!!!$r4   r  r   s   @r-   r!  r!    s    Ar4   r!  c                   $     e Zd Z fdZd Z xZS )FlavaGlobalContrastiveHeadc                 R    t         |           || _        |j                  | _        y r]   )rh   ri   ra   global_backprop_contrastiver   s     r-   ri   z#FlavaGlobalContrastiveHead.__init__#  s#    +1+M+M(r4   c                     t        j                  |      }t         j                  j                         rt         j                  j	                         s8t        j
                  |j                  d      |j                        }|g}|g}n{|j                  d      }t         j                  j                         }	| j                  rgt         j                  j                  j                  j                  |      }t         j                  j                  j                  j                  |      }nt        |	      D 
cg c]  }
t        j                  |       }}
t        |	      D 
cg c]  }
t        j                  |       }}
t         j                  j                  ||       t         j                  j                  ||       |t         j                  j                         z  t        j
                  ||j                        z   }t        j                   |      }t        j                   |      }t        j"                  ||j%                  dd            |z  }t        j"                  ||j%                  dd            |z  }|||fS c c}
w c c}
w )Nr   r  r   )r9   expdistributedis_availableis_initializedr   r~   r   get_world_sizer(  r   r   
all_gatherr7  
zeros_likeget_rankr   r   r   )r,   r   r    rc  temperaturelabelsimage_embeddings_alltext_embeddings_alllocal_batch_size
world_sizer   logits_per_imagelogits_per_texts                r-   r   z"FlavaGlobalContrastiveHead.forward(  s   ii,  --/u7H7H7W7W7Y\\"2"7"7":CSCZCZ[F$4#5 #2"3/44Q7**99;J// (-'8'8';';'F'F'Q'QRb'c$&+&7&7&:&:&E&E&P&PQ`&a#SXYcSd'ea(8(8(I'e$'eSXYcSd&eau'7'78H'I&e#&e!!,,-ACST!!,,-@/R%(9(9(B(B(DDu|| )9)@)@H F  %yy)=>#ii(;< <<(8:M:W:WXY[\:]^all,,8L8V8VWXZ[8\]`kk&88 (f&es   9J$Jr  r   s   @r-   r&  r&  "  s    N
9r4   r&  zk
@auto_docstring(
    custom_intro="""
    The FLAVA model for pretraining which outputs losses, embeddings, logits and transformer outputs.
    """
)
class FlavaForPreTraining(FlavaPreTrainedModel):
    # Those are linked to xxx.bias
    _tied_weights_keys = [
        "mmm_text_head.decoder.bias",
        "mmm_image_head.decoder.bias",
        "mlm_head.decoder.bias",
        "mim_head.decoder.bias",
    ]

    def __init__(self, config: FlavaConfig, image_codebook: Optional[nn.Module] = None):
        r"""
        image_codebook ([`nn.Module`]):
            If passed, the image codebook will be set to this. Otherwise, it will be initialized using the
            image_codebook_config defined in the config first as the first parameter.
        """
        super().__init__(config)
        self.flava = FlavaModel(config)

        self.image_codebook = image_codebook
        if self.image_codebook is None and config.init_codebook:
            self.image_codebook = FlavaImageCodebook(config.image_codebook_config)

        # Leverage the text and image encoder configs to create the masked
        # heads, since those configs carry the right vocab sizes
        self.mim_head = FlavaMaskedPredictionHead(config.image_config)
        self.mlm_head = FlavaMaskedPredictionHead(config.text_config)
        self.itm_head = FlavaITMHead(config)
        self.mmm_image_head = FlavaMaskedPredictionHead(config.image_config)
        self.mmm_text_head = FlavaMaskedPredictionHead(config.text_config)
        self.global_contrastive_head = FlavaGlobalContrastiveHead(config)

        self.image_vocab_size = config.image_config.vocab_size
        self.text_vocab_size = config.text_config.vocab_size
        self.mlm_weight = config.mlm_weight
        self.mim_weight = config.mim_weight
        self.global_contrastive_weight = config.global_contrastive_weight
        self.ce_ignore_index = config.ce_ignore_index
        self.itm_weight = config.itm_weight
        self.mmm_image_weight = config.mmm_image_weight
        self.mmm_text_weight = config.mmm_text_weight
        self.skip_unmasked_multimodal_encoder = config.skip_unmasked_multimodal_encoder

        # Initialize weights and apply final processing
        self.post_init()

    def _resize_to_2d(self, x: torch.Tensor):
        if x.dim() > 2:
            x = x.view(x.size(0), -1)
        return x

    @auto_docstring
    def forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        input_ids_masked: Optional[torch.LongTensor] = None,
        pixel_values: Optional[torch.FloatTensor] = None,
        codebook_pixel_values: Optional[torch.FloatTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        token_type_ids: Optional[torch.Tensor] = None,
        bool_masked_pos: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        image_attention_mask: Optional[torch.Tensor] = None,
        skip_unmasked_multimodal_encoder: Optional[bool] = None,
        mlm_labels: Optional[torch.Tensor] = None,
        mim_labels: Optional[torch.Tensor] = None,
        itm_labels: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: bool = True,
        return_dict: Optional[bool] = None,
        return_loss: Optional[bool] = None,
    ) -> Union[tuple[torch.Tensor], FlavaForPreTrainingOutput]:
        r"""
        input_ids (`torch.LongTensor` of shape `(batch_size, text_seq_len)`):
            Indices of input sequence tokens in the vocabulary. Indices can be obtained using [`AutoTokenizer`]. See
            [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details. [What are input
            IDs?](../glossary#input-ids)
        input_ids_masked (`torch.LongTensor` of shape `(batch_size, text_seq_len)`):
            Indices of input sequence tokens in the vocabulary. These ones are the masked version of the original task
            to be used with MLM. Indices can be obtained using [`AutoTokenizer`] along with
            [`DataCollatorForMaskedLanguageModeling`]. See [`PreTrainedTokenizer.encode`] and
            [`PreTrainedTokenizer.__call__`] for details. [What are input IDs?](../glossary#input-ids)
        codebook_pixel_values (`torch.FloatTensor` of shape `(batch_size, num_image_patches, patch_size, patch_size, 3)`, *optional*):
            Pixel values for image patches that are used to compute the image codebook labels for masked image modeling.
        token_type_ids (`torch.LongTensor` of shape `(batch_size, text_seq_len)`, *optional*):
            Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
            1]`:
            - 0 corresponds to a *sentence A* token,
            - 1 corresponds to a *sentence B* token.
            [What are token type IDs?](../glossary#token-type-ids)
        bool_masked_pos (`torch.BoolTensor` of shape `(batch_size, image_num_patches)`):
            Boolean masked positions. Indicates which patches are masked (1) and which aren't (0).
        image_attention_mask (`torch.FloatTensor` of shape `(batch_size, image_num_patches)`, *optional*):
            Mask to avoid performing attention on padding token indices specifically for images. Mask values selected
            in `[0, 1]`:
            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.
            [What are attention masks?](../glossary#attention-mask)
        skip_unmasked_multimodal_encoder (*bool*, *optional*):
            Skip any calculations for multimodal encoder for unmasked inputs. FLAVA pretraining doesn't need unmasked
            multimodal embeddings or outputs as of now.
        mlm_labels (`torch.LongTensor` of shape `(batch_size, text_seq_len)`, *optional*):
            Labels for computing the left-to-right language and multimodal masked modeling loss (next word prediction).
            Indices should be in `[-100, 0, ..., text_config.vocab_size - 1]` (see `input_ids` docstring). Tokens with
            indices set to `-100` are ignored (masked), the loss is only computed for the tokens with labels in `[0,
            ..., text_config.vocab_size - 1]`.
        mim_labels (`torch.LongTensor` of shape `(batch_size, image_num_patches)`, *optional*):
            Labels for computing the image and multimodal masked modeling loss. Indices should be in `[-100, 0, ...,
            image_config.vocab_size - 1]`. Tokens with indices set to `-100` are ignored (masked), the loss is only
            computed for the tokens with labels in `[0, ..., image_config.vocab_size - 1]`. If not passed, they are
            generated automatically using the image codebook assigned to the model. By default, it uses
            [`FlavaImageCodebook`]. See [`FlavaImageCodebook`] to understand how to generate mim_labels.
        itm_labels (`torch.LongTensor` of shape `(batch_size, 1)`, *optional*):
            Labels for computing the image-text matching loss. 0 means the pairs don't match and 1 means they match.
            The pairs with 0 will be skipped for calculation of MMM and global contrastive losses as well.
        return_loss (`bool`, *optional*, default to None):
            Whether to return calculated loss or not.

        Examples:
        ```python
        >>> from PIL import Image
        >>> import requests
        >>> from transformers import FlavaForPreTraining, AutoProcessor

        >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        >>> image = Image.open(requests.get(url, stream=True).raw)

        >>> model = FlavaForPreTraining.from_pretrained("facebook/flava-full")
        >>> processor = AutoProcessor.from_pretrained("facebook/flava-full")

        >>> text = ["a photo of a cat"]

        >>> inputs = processor(
        ...     images=[image],
        ...     text=text,
        ...     return_masks=True,
        ...     return_codebook_pixels=True,
        ...     padding=True,
        ...     max_length=77,
        ...     return_tensors="pt",
        ... )


        >>> output = model(**inputs)
        ```
        Nz`input_ids_masked` isn't passed which means MLM loss won't be calculated correctlySetting it to `input_ids` so that model can work. Please pass it if this is unintentional. This is usually OKAY if you are doing inference on unmasked text...T)
r   r   r   r   r   r  r  r   r;  r<  )	r   r   r   r   r  r   r   r;  r<  z`return_loss` is set to True but the image codebook is not initialized and no `mim_labels`  have been passed. Reinstantiate the model with `init_codebook` set to True or pass in your custom `mim_labels`z`codebook_pixel_value` are required to generate `mim_labels` if loss is expected. Call `AutoProcessor` with `return_codebook_pixels` set to Truer   r   r{   r|   r   )r?   r@   rA   rB   rC   rD   c              3   (   K   | ]
  }||nd  y wrO  r<   )r*   rL   s     r-   r.   z.FlavaForPreTraining.forward.<locals>.<genexpr>  s     _T%5T1<_s   c              3   &   K   | ]	  }||  y wr]   r<   )r*   r   s     r-   r.   z.FlavaForPreTraining.forward.<locals>.<genexpr>  s     8qai8r?  rL   rM   r   r   r    r!   r"   r#   rN   rO   rP   rQ   rR   rS   rT   rU   rV   rW   rX   rY   rZ   r<   ):ra   r  rW  rO  loggerwarningrT  r   r    r"   r<  RuntimeErrorr   r  rI  rQ  rK  ner~   r@  r   r   cross_entropyr   rF  rH  rA  rG  rL  rB  r9   whereanynewrM  rC  rN  rD  rJ  r  	normalizer  rc  rZ  clamp_LOGIT_SCALE_CLAMP_MINLOGIT_SCALE_CLAMP_MAXrE  r>   rG   sumrF   r   r)   r!   r#   r1   rK   )6r,   r   rR  r   rS  r   r   r   r   r  rO  rT  rU  rV  r   r;  r<  rW  flava_outputflava_masked_outputpos_maskr   r    rN   rP   rR   
total_lossmim_lossmlm_lossmmm_text_lossmmm_image_lossgc_lossitm_lossrT   rU   rZ   rY   rV   r8  r9  sequence_for_imagemasked_tokensmim_labels_filteredsequence_for_textmlm_labels_filtered	pos_pairs	end_indextext_embeddingimage_embedding	gc_labelsgc_loss_imagegc_loss_textflava_lossesr  s6                                                         r-   r   zFlavaForPreTraining.forward  s
   ~ &1%<k$++B]B]%0%<k$++BYBY 0; -66 	) #	(=NN?
  )zz%))%!5 %E/!5 " 
  #jj&%))!5+/!5 ) 

 '88&66"5"F"F!4!D!D':'P'P$aee
eXee=e>eGV^GKK
KZK/4D:>>
>% #.2N2Z!k&&.&; 
 )0$Y  "00EEF[\
 ??Q#:#FKgKo!8%!//
;
"&"4"4_"E7;7K7K
?--d34%7JOOA<N;N;PRS8S%T" *d.B.B C&0&?#%7q8H%I"!]]+=>
!}}::"D,A,ABDWD\D\]_D` H /H!]]+=>
 ??Q#9#EJfJn 6%!//
;
$5a*//!:L9L9NPQ6Q$R! *d.B.B C&0&?#$5mQ6F$G!!]]+<=
!}}::"D,@,@ACVC[C[\^C_ H /H!]]+<=
 ??Q#?#K'CDJ%&MM!,	 ;;y}}	9==RVQWCXY!}}:::zRH/H/;3OPX3Y0)!+H!5J)!+H!5J&5h&?O (38M8MPQ8Q!=/44Q7!;I!3Aq1y=7H!4K!L%!//
;
"&"4"4_"E7;7K7K
?--d34 *d.B.B C&0&?#%7q8H%I"#'#6#67I#J %']]%@%@(--b$2G2GHJ]JbJbceJf&N #d&;&;;N#'#6#67I#J  (38L8Lq8P < 1!6L6Q6QRS6T5T5VXY2Y Z%!//
;
 *d.B.B C&0&?#$5mQ6F$G!"&"4"45F"G$&MM$?$?',,R1E1EFH[H`H`acHd%M "T%9%99M"&"4"45F"G 'O,GDLjLjmnLn!ZZ771a8PQN]]44^4LN"jj99:J1aQR7:STO mm55o25NOJJ""''../DF[\;?;W;W1G1G<8oy
 ##3H#= "1(";%h/	 " ; ;<Li X!}}::?IV(<71<4999"&$"
 |446_I\I\I^__J 8D8Q8Q8]))224cg7C7O7O7[((113ae22=I=[=[=g..779mq'?R?_?_?k#0099;qu&>Q>]>]>i#//88:os,&88D $55>>@   +F. <#8#8#:   8F888( 

"
 .
 &22	

 ,
 %00
 #/"D"D
 +<<
 %<
 !4 @ @
 $:
  3>>
 *F
 &9%J%J
 "
  "!
" "#
$ *:%
& )8'
( .)
* ,+
 	
r4   r]   )NNNNNNNNNNNNNNTNN)r5   r6   r7   _tied_weights_keysr   r   r   r  ri   r9   r   rQ  r   r  r:   rI   r   r1   rK   r   r   r   s   @r-   r;  r;  J  s   !{ !HRYY<O !Fu|| 
  157;48=A151526377;;?-1-1-1,0%)&*&*%k
E,,-k
 #5#3#34k
 u001	k

  ((9(9:k
 !.k
 !.k
 "%,,/k
 u//0k
 'u||4k
 +34.k
 U\\*k
 U\\*k
 U\\*k
 $D>k
  #!k
" d^#k
$ d^%k
& 
uU\\"$==	>'k
 k
r4   r;  )r;  r  ri  rb  r`  rS  r  )Hr8   r   r   r   dataclassesr   typingr   r   r   r9   torch.utils.checkpointr   activationsr
   modeling_layersr   modeling_outputsr   r   modeling_utilsr   r   r   utilsr   r   r   r   configuration_flavar   r   r   r   r   
get_loggerr5   r[  r
  re  rf  r   r   r>   rK   r  r`   ro   r   r   r  r
  r  r%  r)  r4  rJ  rS  ri  r  r`  rb  r  r  r  r  r  r_  r!  r&  r;  __all__r<   r4   r-   <module>r     s2      # ! ' '    ! 9 K c c D D  
		H	%>   _.>@UUV  
{ 
 
< 
+  D Wt Wt Wtx_299 _H!bii !H6")) 6rF FRbii $'RYY 'T		 ""))  ++ +\'
299 '
T"))  N? N ND ]
* ]
 ]
@ m
) m
 m
` \
/ \
 \
~ h
% h
 h
V			 *Cbii C"299 ( r)- r)r)j299 "		 ,
299 
%9 %9P 
]
. ]

]
@r4   