Key Responsibilities:
- Develop, maintain, and optimize data pipelines using Python and PySpark for large-scale data processing.
- Design and build scalable data solutions on AWS cloud infrastructure, leveraging services such as S3, Redshift, Glue, and Lambda.
- Collaborate with cross-functional teams to design and implement robust data architectures for ETL and data transformation processes.
- Ensure high-quality code through unit testing, code reviews, and adherence to best practices in data engineering.
- Perform performance tuning and troubleshooting of PySpark jobs and data pipelines.
- Conduct data analysis and generate meaningful insights to support business decision-making.
Key Skills & Qualifications:
- 3+ years of hands-on experience with Python and PySpark.
- Strong experience with AWS services for data storage, processing, and analytics.
- Expertise in building and managing ETL pipelines and data workflows.
- Familiarity with CI/CD pipelines for data engineering projects.
- Strong problem-solving skills and ability to optimize data processes for performance and efficiency.
- Strong communication skills and the ability to collaborate effectively with cross-functional teams.
Preferred Qualifications:
- Experience with data modeling and database design.
- Familiarity with other big data tools such as Hadoop or Kafka.
- Knowledge of GCP or Azure cloud platforms.
Note: A coding test on PySpark will be part of the selection process.
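For reference, the sketch below is a minimal, illustrative example of the kind of PySpark pipeline work this role involves (reading from S3, transforming, and writing back for downstream loading). The bucket paths, column names, and aggregation logic are hypothetical placeholders, not an actual test question.

```python
# Illustrative sketch only: hypothetical S3 paths and column names.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_order_totals").getOrCreate()

# Read raw order events from S3 (hypothetical path and schema).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Aggregate order amounts per customer per day.
daily_totals = (
    orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Write partitioned Parquet back to S3 for downstream loading (e.g. into Redshift via Glue or COPY).
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_order_totals/"
)

spark.stop()
```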