The Challenge
Driven by a deep passion for science, our client endeavoured to build a proof of concept, “RefStudio”, with generative AI to streamline the process of writing research papers, catering to more than just the scientific community.
The Approach
Functioning as an open-source text editor akin to an Integrated “Development” Environment (IDE), RefStudio incorporates Large Language Models (LLMs) to offer the following features:
- Manage references by uploading papers to the app
- Utilise a chat interface for questions about the content of uploaded reference papers
- Enable autocomplete and content suggestions while writing
- Streamline peer collaboration in document editing using a Conflict-Free Replicated Data Type (CRDT) protocol within a Git-like interface, making it easy to track changes (viewing additions or removals)
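To illustrate the CRDT idea behind the collaboration feature, here is a minimal sketch of one of the simplest CRDTs, a last-writer-wins map keyed by paragraph. This write-up does not say which CRDT RefStudio uses, so the `Entry` type, the `merge` function, and the paragraph keys are all hypothetical and serve only to show the key property: replicas converge regardless of merge order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One version of a paragraph; (timestamp, replica) orders concurrent writes."""
    timestamp: int
    replica: str
    text: str


def merge(a: dict[str, Entry], b: dict[str, Entry]) -> dict[str, Entry]:
    """Merge two replicas, keeping the entry with the highest (timestamp, replica) per key."""
    merged = dict(a)
    for key, entry in b.items():
        current = merged.get(key)
        if current is None or (entry.timestamp, entry.replica) > (current.timestamp, current.replica):
            merged[key] = entry
    return merged


# Two peers edit the same document independently (illustrative data).
alice = {"p1": Entry(1, "alice", "Intro draft"), "p2": Entry(3, "alice", "Methods v2")}
bob = {"p2": Entry(2, "bob", "Methods v1"), "p3": Entry(1, "bob", "Results")}

# Merging in either order yields the same document state.
converged = merge(alice, bob)
```

A production editor would typically use a sequence CRDT that merges at character level rather than replacing whole paragraphs; the sketch only demonstrates order-independent convergence.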
Unify the text editor for desktop and web deployment
One of the client's goals was to build a desktop application for RefStudio with Tauri because it is optimised for performance, follows modern web development principles, boasts an active community, and is well documented. We implemented a Python backend through a sidecar pattern natively supported by Tauri. This backend uses GROBID to parse PDFs and send the references back to the frontend, which exposes this functionality to the user.
As the project progressed, the client also wanted RefStudio to work on the web. Our team faced a few difficulties since we couldn’t use Tauri’s constructs, but we overcame them by updating the Python backend to also act as an HTTP server and by implementing a unified codebase for client/server communication. We also implemented additional APIs for filesystem interactions so that listing, reading, and writing files within the application could take full advantage of the unified codebase.
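One way to picture a backend that serves both the desktop sidecar and the web is a single dispatch function with two thin entry points. The sketch below uses only the Python standard library and is not RefStudio's actual code: the command names, the in-memory `PROJECT_FILES` store, and the `run_sidecar` argument convention are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical stand-in for the real project filesystem.
PROJECT_FILES = {"notes.md": "# Notes", "refs.bib": ""}


def list_files(payload: dict) -> dict:
    return {"files": sorted(PROJECT_FILES)}


def read_file(payload: dict) -> dict:
    return {"content": PROJECT_FILES[payload["path"]]}


COMMANDS = {"list_files": list_files, "read_file": read_file}


def dispatch(command: str, payload: dict) -> dict:
    """Shared logic: both transports end up here."""
    handler = COMMANDS.get(command)
    if handler is None:
        return {"error": f"unknown command: {command}"}
    return handler(payload)


def run_sidecar(argv: list[str]) -> str:
    """Desktop path (assumed convention): command name plus a JSON payload as arguments."""
    command, raw_payload = argv
    return json.dumps(dispatch(command, json.loads(raw_payload)))


class ApiHandler(BaseHTTPRequestHandler):
    """Web path: POST /<command> with a JSON body reaches the same dispatch()."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(dispatch(self.path.strip("/"), payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

`ApiHandler` could be served with `http.server.HTTPServer(("127.0.0.1", 8000), ApiHandler).serve_forever()`, while the desktop shell would call `run_sidecar` with the arguments it receives. Because both paths share `dispatch`, the frontend only needs to swap the transport.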
Assist and chat
The context window in the chat feature is key for a seamless user experience. When posing a question to the GenAI model, providing context is crucial to ensure an accurate output response. In RefStudio, the model accesses the references (PDF documents) the user has uploaded to the system to respond effectively.
The length of the prompt directly affects both the cost and how the AI sifts through information: the bigger the prompt, the more expensive it is. A longer question in the chat prompts the model to select specific parts of the user's uploaded references and explore the details of the question more thoroughly. To do so, the GenAI model selects excerpts from across the entirety of the uploaded references.
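The trade-off between prompt size and cost can be sketched as a budgeted selection: excerpts are added in order of relevance until a token limit is reached. This is an illustrative sketch, not RefStudio's implementation; the `pack_excerpts` function, the prompt layout, and the whitespace-based token estimate are assumptions (real tokenizers count differently).

```python
def pack_excerpts(question: str, ranked_excerpts: list[str], max_tokens: int) -> str:
    """Build a prompt from the question plus as many ranked excerpts as fit the budget."""

    def rough_tokens(text: str) -> int:
        # Crude estimate: one token per whitespace-separated word.
        return len(text.split())

    used = rough_tokens(question)
    chosen = []
    for excerpt in ranked_excerpts:  # assumed to be sorted most relevant first
        cost = rough_tokens(excerpt)
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a shorter one may still fit
        chosen.append(excerpt)
        used += cost

    context = "\n".join(f"- {excerpt}" for excerpt in chosen)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

A smaller `max_tokens` lowers the cost per question at the risk of leaving out a relevant excerpt, which is why the ranking step matters.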
Instead of using the traditional retrieval-augmented generation (RAG) method, which involves transforming text into vectors and storing them in a database, we opted for a lighter-weight approach. This approach still follows the RAG principle but offers the following advantages:
- It allowed fast progress on the PoC because it removed the extra infrastructure complexity of integrating a vector database
- It was more cost-effective, since we didn’t have to host a database or call an AI model to embed text
- It was also faster: with no API calls, the ranking was done entirely locally by the BM25Ranker algorithm
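To show what local ranking without embeddings looks like, here is a minimal sketch of BM25 scoring in pure Python. It implements one common variant of the Okapi BM25 formula and is not the BM25Ranker implementation mentioned above; the function names, the whitespace tokenizer, and the default `k1` and `b` values are assumptions.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer for illustration: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query: str, documents: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    if not documents:
        return []
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents in which each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


def top_excerpts(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring documents, best first."""
    scores = bm25_scores(query, documents)
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [documents[i] for i in ranked[:k]]
```

Everything here runs in-process on term counts, so no embedding call or vector store is required. The trade-off is that BM25 matches exact terms only and does not capture synonyms the way embeddings can.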
Build an IDE-like application
The client wanted a word processor akin to an Integrated “Development” Environment (IDE). An IDE allows engineers to streamline code development because it combines functions like editing, building, and testing software within a user-friendly interface that enhances developer productivity. The “word processor” we built applies the same idea to writing.
We implemented UI elements for the left and right collapsible sidebars, a footer showing information about the project, and a command palette for quick executions. These elements are common constructs in IDEs, which allowed us to adapt the app to the web environment.
The program’s main editor area consists of two editor panes that provide the convenience of tabbed browsing for different types of content. In addition, we implemented comprehensive support for displaying multiple types of content in the main editor area, ensuring a versatile and efficient editing experience. The editor’s main features include:
- Files that can be edited using a clean and streamlined editor
- PDF files that the user uploads or gets via Semantic Scholar search
- Reference details that show metadata extracted from an uploaded PDF or retrieved via Semantic Scholar
- Reference table for managing the project references, including individual and bulk edits of reference metadata
- Text file viewer to display text content like markdown and bib files
Implement a clean and streamlined editor using Tiptap
After careful consideration of various text editor libraries, we adopted Tiptap and ProseMirror. Tiptap enabled our team to easily create React components for RefStudio, while the logic is coded through ProseMirror.
The client wanted us to build an editor that supports block browsing, block indentation, block drag-and-drop, collapsible sections, etc., but Tiptap lacked extensions for these particular features. As a solution, we developed our own extensions by studying ProseMirror's API.
We also wanted to integrate sentence completion directly into RefStudio to provide a seamless user experience. We achieved this by implementing a new custom extension to display suggestions and enable the user to cycle between them without leaving the editor area. The extension also handles the asynchronicity of the API call and displays an error message in case something goes wrong.
Support local and remote LLM providers
To provide a top-quality feedback experience, RefStudio supports both non-streaming and streaming modes.
In non-streaming mode, users input their text and the model responds with suggestions or completions only after the entire response has been generated. In streaming mode, the model provides real-time feedback instead of waiting for the entire response to be returned, allowing for a more interactive chat experience. The current version uses non-streaming for rewrite and text completion, and streaming for the chat interaction.
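The difference between the two modes can be sketched with a Python generator. The `fake_model_tokens` function is a hypothetical stand-in for a real LLM provider and simply echoes the prompt word by word; only the calling pattern is the point.

```python
from typing import Iterator


def fake_model_tokens(prompt: str) -> Iterator[str]:
    """Hypothetical provider: produces the reply one chunk at a time."""
    for word in f"Echo: {prompt}".split():
        yield word + " "


def complete(prompt: str) -> str:
    """Non-streaming: wait for every chunk, then return the whole reply."""
    return "".join(fake_model_tokens(prompt)).strip()


def stream(prompt: str) -> Iterator[str]:
    """Streaming: pass each chunk to the caller as soon as it is produced."""
    yield from fake_model_tokens(prompt)


# A chat UI would append each chunk to the visible message as it arrives.
for chunk in stream("hello world"):
    print(chunk, end="", flush=True)
print()
```

Both functions return the same text overall. Streaming changes when the user sees it, which suits chat, while a rewrite or completion is usually only useful once it is whole.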
In addition to integrating with OpenAI, RefStudio also offers support for local AI models. Ollama, a local language model provider, can be configured via the application settings, providing users with the option to use AI capabilities even without an internet connection.
Product Design
For us, it was critical to transform the engineering-heavy prototype, which had some obvious usability issues, into a visually appealing and functional tool that would meet the agreed accessibility requirements and reflect solid UI/UX considerations. We also worked on a logo redesign that contributed to a more well-rounded and mature brand impression for RefStudio.
Accessibility
We agreed with RefStudio that the tool should at a minimum achieve WCAG 2.1 AA standards, focusing on specific aspects like contrast, readable text, different font sizes, and colours. We also implemented dark mode, not only because it adheres to modern UX best practices but also because it can successfully cut glare and reduce some blue light, both of which increase visual comfort.
Design tokens
Design tokens were key to a unified theming system that covered more than just colour, including typography and spacing. This flexibility made it easy to create themes, including the user-friendly dark mode, that can be applied with a single click.
Product Design and Engineering collaboration
One of our primary objectives was to achieve a harmonious balance between functionality and a sleek, distraction-free interface. We focused on creating a highly usable component library that seamlessly integrated with RefStudio's core functionalities, resulting in a clean and intuitive design.
We prioritised familiarity and well-known design patterns and used established UI conventions as we recognised the importance of maintaining an intuitive design, not only for ease of use but also for improved user adoption.
RefStudio logo
The redesign of the logo was another noteworthy aspect of our team's work. Moving from a non-scalable, non-vector-based logo, we revamped it to ensure scalability and consistency. Subtle changes in font and colour, including specific shades of black and white, aligned the logo with the refined visual identity of RefStudio.
The Deliverables
Closing the Engagement
We successfully implemented all MVP features within the designated time, presenting a robust proof of concept that lays a strong foundation for future iterations. YLD's proactive approach enabled teams to build a user-friendly and easily maintainable open-source writing environment.
The editor in RefStudio employs Tiptap and ProseMirror, which reflects our dedication to delivering a comprehensive solution. The custom extensions we built to address specific limitations demonstrate our proficiency in innovating and tailoring solutions to unique requirements. Support for both non-streaming and streaming modes, alongside the integration of local AI models through Ollama, highlights our commitment to providing a top-tier user experience, ensuring accessibility even in the absence of an internet connection.
YLD’s impact on this project extends beyond mere milestones, demonstrating our commitment to adaptability, innovation, and surpassing expectations. RefStudio stands as proof that GenAI capabilities can help shape the future of scientific writing.