Key Responsibilities:
- Develop, maintain, and optimize data pipelines using Python and PySpark for large-scale data processing.
- Design and build scalable data solutions on AWS cloud infrastructure, leveraging services such as S3, Redshift, Glue, and Lambda.
- Collaborate with cross-functional teams to design and implement robust data architectures for ETL and data transformation processes.
- Ensure high-quality code through unit testing, code reviews, and adherence to best practices in data engineering.
- Perform performance tuning and troubleshooting of PySpark jobs and data pipelines.
- Conduct data analysis and generate meaningful insights to support business decision-making.
Key Skills & Qualifications:
- 3+ years of hands-on experience with Python and PySpark.
- Strong experience with AWS services for data storage, processing, and analytics.
- Expertise in building and managing ETL pipelines and data workflows.
- Familiarity with CI/CD pipelines for data engineering projects.
- Strong problem-solving skills and ability to optimize data processes for performance and efficiency.
- Strong communication skills and the ability to collaborate effectively with cross-functional teams.
Preferred Qualifications:
- Experience with data modeling and database design.
- Familiarity with other big data tools such as Hadoop or Kafka.
- Knowledge of GCP or Azure cloud platforms.
Note: A coding test on PySpark will be part of the selection process.
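For reference, the sketch below is a minimal, illustrative example of the kind of PySpark pipeline work this role involves (reading from S3, transforming, and writing back for downstream loading). The bucket paths, column names, and aggregation logic are hypothetical placeholders, not an actual test question.

```python
# Illustrative sketch only: hypothetical S3 paths and column names.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_order_totals").getOrCreate()

# Read raw order events from S3 (hypothetical path and schema).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Aggregate order amounts per customer per day.
daily_totals = (
    orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Write partitioned Parquet back to S3 for downstream loading (e.g. into Redshift via Glue or COPY).
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_order_totals/"
)

spark.stop()
```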