Big data: implement Blob garbage collection with Apache Spark (Java Backend Intern)

LINAGORA Co., Ltd.

Application deadline: 31/01/2026

Job details: "Big data: implement Blob garbage collection with Apache Spark (Java Backend Intern)"

Salary

Internship allowance provided

Location

  • 11th floor, Hồng Hà Center building, 25 Lý Thường Kiệt, Cửa Nam ward, Hà Nội, Việt Nam

OVERVIEW

Have you ever wanted to contribute to real open-source infrastructure software used at scale?

Are you curious about distributed systems, big data processing, and open standards?

Twake Mail ([protected info]) is part of a new generation of mail applications that ensure superior security and performance by using the JMAP protocol ([protected info]). It integrates perfectly with Linagora's collaborative suite, The Twake.AI [protected info] (Contacts, Calendar, Chat and Drive). Its server is based on the Apache James project ([protected info]), to which the team contributes. The server is both scalable by design and easy to customize. The Twake Mail client is multi-platform and can be used on any device via mobile apps (Android, iOS) or a web interface.

WHAT WILL YOU DO?

During this internship, you will work on real production challenges, contribute to open-source repositories, and collaborate with experienced engineers through code reviews, design discussions, and documentation.

Project: Big data - implement Blob garbage collection with Apache Spark

Apache James is a heavily distributed server that relies on NoSQL data stores. For performance, it uses a highly denormalized data model.

Historically, James has allowed running "tasks" locally to correct inconsistencies in this denormalized data. However, the current implementation has limitations: the time to complete is bounded, and because the tasks run locally they do not scale and they compete with live traffic.

This project aims to provide a skeleton for Apache James tasks that run outside of the Apache James server. We would provide both a CLI toolbox and Apache Spark scripts for running those denormalization checks. The work would live in a public GitHub repository that could then, if relevant, be contributed upstream.

We propose to address the following tasks as part of this proof of concept:

- Message denormalisation checks: the content of `messageidtable` needs to be reconciled with the content of `imapuidtable`, without disrupting live traffic.

- Mailbox counter checks: iterate over `imapUidTable` and update the `mailboxcounters` table (message count and unseen count) accordingly.

- Blobs deduplication: the storage identifier of email content is a hash of the content. This means that two blobs with the same content are stored only once.

But this also means deletion is a hard topic, since a blob can only be removed once nothing references it anymore. To that end we implemented a cleanup task based on Bloom filters to delete blobs that are no longer referenced.
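To give a feel for the Bloom filter approach, here is a minimal sketch in plain Java. It is not the Apache James implementation and it does not use Spark: the class `BlobGcSketch`, the method `findUnreferenced`, the hand-rolled `BloomFilter`, and the in-memory lists standing in for the reference tables and the blob store listing are all hypothetical names chosen for illustration. The idea it shows is: add every referenced blob id to the filter, then scan the stored blob ids and flag those the filter has definitely never seen.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class BlobGcSketch {

    /** Minimal Bloom filter (illustrative only): no false negatives, some false positives. */
    static final class BloomFilter {
        private final BitSet bits;
        private final int size;
        private final int hashCount;

        BloomFilter(int size, int hashCount) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashCount = hashCount;
        }

        void add(String value) {
            for (int i = 0; i < hashCount; i++) {
                bits.set(index(value, i));
            }
        }

        boolean mightContain(String value) {
            for (int i = 0; i < hashCount; i++) {
                if (!bits.get(index(value, i))) {
                    return false; // one unset bit proves the value was never added
                }
            }
            return true;
        }

        // Double hashing: derive the i-th bit index from two base hashes.
        private int index(String value, int i) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            int h1 = fnv1a(bytes, 0x811C9DC5);
            int h2 = fnv1a(bytes, 0x01000193) | 1; // force odd so the step is never zero
            return Math.floorMod(h1 + i * h2, size);
        }

        private static int fnv1a(byte[] data, int seed) {
            int hash = seed;
            for (byte b : data) {
                hash ^= (b & 0xff);
                hash *= 16777619;
            }
            return hash;
        }
    }

    /**
     * Returns the stored blob ids that are definitely not referenced.
     * A false positive only keeps a dead blob around until a later run;
     * it never flags a blob that is still referenced.
     */
    static List<String> findUnreferenced(Iterable<String> referenced,
                                         Iterable<String> stored,
                                         int expectedReferences) {
        // Assumption: 16 bits per expected entry and 4 hashes, picked arbitrarily for the sketch.
        BloomFilter filter = new BloomFilter(Math.max(64, expectedReferences * 16), 4);
        for (String id : referenced) {
            filter.add(id);
        }
        List<String> garbage = new ArrayList<>();
        for (String id : stored) {
            if (!filter.mightContain(id)) {
                garbage.add(id);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        List<String> referenced = Arrays.asList("blob-a", "blob-b");
        List<String> stored = Arrays.asList("blob-a", "blob-b", "blob-c");
        System.out.println("Candidates for deletion: "
                + findUnreferenced(referenced, stored, referenced.size()));
    }
}
```

The filter keeps memory use small compared with holding every referenced id in a set, which is what makes the scan practical at scale. A real job would also have to cope with blobs that become referenced while the scan is running, which this sketch ignores.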

Skills: Spark (Java), Cassandra, S3, Bloom filters

OUR REQUIREMENTS:

- Curious, willing to learn, and autonomous

- Able to communicate in English

- Basic knowledge of Java

- Knowledge of Flutter / Dart is appreciated

- Experience with Docker, Spark, or distributed systems is a plus

- Experience with Flutter UI development is a plus

- Basic working knowledge of Git

- Passionate about open source and free software

WHY YOU'LL LOVE WORKING WITH US?

- Work in an international team with highly skilled people

- Gain hands-on experience with real product development and testing for a large-scale open-source project.

- Opportunity to be an official member of LINAGORA after the internship period

- Flexible environment with a healthy work-life balance.

- Participate in open-source communities and conferences in Vietnam

- Modern working space, free coffee

Company information

About

LINAGORA Co., Ltd.

LINAGORA is a pioneering French group in the field of open-source software, and our company currently leads the French market for open-source solutions. It has more than 200 experts and offices in France, Canada, Tunisia, and Vietnam. LINAGORA is both a company that produces open-source software...

Company size

10 - 25 employees

Address

11th floor, Hồng Hà Center building, 25 Lý Thường Kiệt, Cửa Nam ward, Hà Nội, Việt Nam
