
LLM & RAG Intern

1 open position

Role Overview

We are seeking a highly motivated AI/ML Intern with a focus on Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). You will work at the frontier of generative AI, helping us explore, build, and optimize systems that bridge the gap between static models and dynamic, private data.

This role is ideal for a student looking to apply theoretical knowledge to a production-grade environment and build a significant portfolio piece.

Key Responsibilities

Architect & Optimize RAG Pipelines: Improve retrieval accuracy using vector databases (e.g., Pinecone, Milvus, or Weaviate).

LLM Implementation: Experiment with and fine-tune prompts and workflows using models like GPT-4, Claude, or Llama 3.

Data Engineering: Assist in the cleaning, chunking, and embedding of proprietary datasets.

Evaluation: Implement benchmarking frameworks to measure hallucinations, faithfulness, and relevancy of model outputs.

Qualifications

Academic Standing: Graduate student (Master's/PhD) in Computer Science, Data Science, or a related technical field.

GPA: 3.5 or higher preferred.

Technical Skills: Proficiency in Python and experience with AI frameworks (e.g., LangChain, LlamaIndex, PyTorch, or TensorFlow).

Domain Knowledge: A solid understanding of transformer architectures and the mechanics of RAG.

Soft Skills: A research-oriented mindset with the ability to troubleshoot complex, non-deterministic systems.

What You Will Gain

Mentorship: Direct access to several chief AI architects and weekly 1-on-1 growth sessions.

Portfolio Impact: Significant contribution to a live AI project that you can showcase to future employers.

Flexibility: We respect your academic schedule and offer flexible working hours.

Future Opportunities: Top performers will be given priority for full-time, paid openings.

Novi Sad, Serbia
Intern

Principal Data Architect

1 open position

Role Overview

As a Principal Data Architect, you will be the primary visionary for our global data strategy. You will tackle the "unsolved" problems of autonomous vehicle data: how to efficiently store, index, and query petabytes of high-dimensional, multi-modal sensor data.

You will lead the transition of our data infrastructure into a state-of-the-art Open Lakehouse architecture, leveraging Apache Iceberg and the Hadoop ecosystem to create a deterministic, high-performance environment for ML research and safety-critical validation.

This role requires you to work in our Serbian office for two years, with the option of moving to the US office afterward.

Core Responsibilities

  • Architectural Innovation: Lead the R&D and design of a next-generation data lakehouse that supports the unique requirements of ADAS/AV, including 4D spatial-temporal querying and multi-modal data fusion.

  • Deep Optimization: Go beyond standard implementations of Apache Iceberg to develop custom partitioning schemes, Z-ordering, and hidden indexing strategies tailored for LiDAR, radar, and video metadata.

  • Theoretical Leadership: Apply advanced research in distributed systems to solve challenges regarding data consistency, deterministic "replay" of vehicle logs, and massive-scale data lineage.

  • Strategic Storage R&D: Develop novel algorithms for data deduplication and "intelligent tiering," ensuring that rare "edge-case" driving data is preserved while optimizing the cost-to-performance ratio of the petabyte-scale lake.

  • Cross-Functional Research: Partner with ML Research and Simulation teams to ensure the data architecture supports emerging paradigms like Foundation Models and End-to-End Autonomous Driving architectures.

  • Technical Mentorship: Act as a high-level consultant and mentor to the broader Data Engineering organization, fostering an environment of analytical rigor and engineering excellence.

Required Qualifications

  • Education: PhD in Computer Science, Distributed Systems, Database Systems, or a related quantitative field.

  • Specialized Experience: 5+ years of experience in data systems, with a significant track record of designing large-scale distributed architectures.

  • Iceberg & Hadoop Internals: Deep, "under-the-hood" knowledge of Apache Iceberg (specification and implementation) and the Hadoop ecosystem (HDFS, Spark, Trino/Presto).

  • Research & Publication: Evidence of contributions to the field, such as publications in top-tier conferences (e.g., SIGMOD, VLDB, ICDE, OSDI) or a history of significant contributions to major open-source data projects.

  • Computational Foundations: Expert-level understanding of query optimization, file format internals (Parquet/Avro), and the trade-offs of distributed consensus protocols.

Preferred Skills & "Edge" Expertise

  • Automotive Safety Standards: Understanding of data integrity requirements for ISO 26262 or SOTIF (Safety of the Intended Functionality).

  • Geospatial Mastery: Experience with H3, S2, or other spatial indexing systems for high-frequency GPS and trajectory data.

  • Cloud Economics: Proven ability to manage the financial architecture of massive cloud deployments (AWS/Azure/GCP).

How this role impacts our mission

In the AV world, the company with the best data loop wins. This role is not just about moving data; it’s about creating the mathematical and structural framework that allows our engineers to find the "needle in the haystack"—the specific sensor frame that will help us solve the next great autonomous driving challenge.

Novi Sad, Serbia
Engineering
Full-Time

Our Mission: Revolutionize ADAS/AV development by automating manual workflows and distilling complex road and bench-test data into actionable insights for safer, faster programs.