Task Description
- Research, design, and implement computer vision and vision-language model (VLM) use cases tailored for MBOS.
- Train and fine-tune deep learning models, focusing on multimodal fusion and efficient deployment on embedded platforms.
- Work closely with software, hardware, and product teams to integrate developed algorithms into the overall vehicle system.
- Build and maintain toolchains for fine-tuning and deploying LLMs/VLMs, manage training clusters, and ensure efficient inference on both server-side and embedded targets.
- Run experiments, evaluate results, and transfer knowledge to other team members.
Qualifications
- Master's degree or above in Computer Science, Electrical Engineering, Robotics, or a related field.
- Proven hands-on experience in developing and deploying computer vision and/or VLM algorithms, preferably in the automotive or robotics domain.
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and classical computer vision libraries.
- Experience with fine-tuning and adapting LLMs/VLMs (e.g., LoRA, RAG, prompt engineering).
- Familiarity with multimodal fusion techniques and architectures such as Transformers, BERT, or similar models.
- Solid grounding in both Natural Language Processing and Computer Vision; able to design and implement solutions that leverage both modalities.
- Experience with model compression, quantization, and deployment in resource-constrained environments.
- Familiarity with dataset collection, labeling, and evaluation for multimodal tasks.
- Strong programming skills in Python and C++.
- Experience with cloud services (e.g., Azure, AWS, Tencent) is a plus.
- Outstanding analytical and problem-solving skills.
- Technical leadership and the ability to make decisions based on technical facts.
- Strong sense of ownership and drive.
- Good communication skills and the ability to work in a collaborative, cross-functional environment.
- English proficiency in written and spoken form.