This is an experimental project. Feel free to send feedback!
Thesis Tide ranks papers by their relevance to specific fields, with the goal of making it easier to find the papers that matter most in each one. It uses AI to analyze the content of papers and rank them!
Learning to understand dynamic 3D scenes from imagery is crucial for applications ranging from robotics to scene reconstruction. Yet, unlike other problems where large-scale supervised training has en...
Useful Fields:
This study seeks to automate camera movement control for filming existing subjects into attractive videos, contrasting with the creation of non-existent content by directly generating the pixels. We s...
Useful Fields:
Existing text-to-image (T2I) diffusion models face several limitations, including large model sizes, slow runtime, and low-quality generation on mobile devices. This paper aims to address all of these...
Useful Fields:
Significant achievements in personalization of diffusion models have been witnessed. Conventional tuning-free methods mostly encode multiple reference images by averaging their image embeddings as the...
Useful Fields:
Tactile sensing is crucial for robots aiming to achieve human-level dexterity. Among tactile-dependent skills, tactile-based object tracking serves as the cornerstone for many tasks, including manipul...
Useful Fields:
Vision-Language Models (VLMs) have shown promising capabilities in handling various multimodal tasks, yet they struggle in long-context scenarios, particularly in tasks involving videos, high-resoluti...
Useful Fields:
We measure the out-of-plane shear modulus of few-layer graphene (FLG) by a blister test. During the test, we employed a monolayer molybdenum disulfide (MoS2) membrane stacked onto FLG wells to facilit...
Useful Fields:
We introduce a novel approach to enhance the capabilities of text-to-image models by incorporating a graph-based RAG. Our system dynamically retrieves detailed character information and relational dat...
Useful Fields:
Large Vision-Language Models (VLMs) have been extended to understand both images and videos. Visual token compression is leveraged to reduce the considerable token length of visual inputs. To meet the...
Useful Fields:
Rectified flow models have emerged as a dominant approach in image generation, showcasing impressive capabilities in high-quality image synthesis. However, despite their effectiveness in visual genera...
Useful Fields:
In this work, we develop machine learning techniques to study nonperturbative scattering amplitudes. We focus on the two-to-two scattering amplitude of identical scalar particles, setting the double d...
Useful Fields:
We formulate a field theoretic description for $d$-dimensional interacting nodal semimetals, featuring dispersion that scales with the linear ($n$th) power of momentum along $d_L...
Useful Fields:
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. Recent dynamic view synthesis methods leverage powerful 4D representations, like feature g...
Useful Fields:
Image tokenizers map images to sequences of discrete tokens, and are a crucial component of autoregressive transformer-based image generation. The tokens are typically associated with spatial location...
Useful Fields:
Given that visual foundation models (VFMs) are trained on extensive datasets but often limited to 2D images, a natural question arises: how well do they understand the 3D world? With the differences i...
Useful Fields:
Graphical User Interface (GUI) agents hold great potential for automating complex tasks across diverse digital environments, from web applications to desktop software. However, the development of such...
Useful Fields:
The remarkable success of Large Language Models (LLMs) has extended to the multimodal domain, achieving outstanding performance in image understanding and generation. Recent efforts to develop unified...
Useful Fields:
Multimodal Large Language Models (MLLMs) have achieved impressive results on various vision tasks, leveraging recent advancements in large language models. However, a critical question remains unaddre...
Useful Fields:
Video temporal grounding aims to localize relevant temporal boundaries in a video given a textual prompt. Recent work has focused on enabling Video LLMs to perform video temporal grounding via next-to...
Useful Fields:
Video generation models (VGMs) have received extensive attention recently and serve as promising candidates for general-purpose large vision models. While they can only generate short videos each time...
Useful Fields: