Job Description
About the role
Seeking a Junior Data Engineer to support the Department of Transportation. The role focuses on modernizing legacy Informatica-based ETL pipelines into Databricks using PySpark and Spark SQL. This position supports data migration and modernization efforts within a data-heavy, potentially regulated environment. Candidates will work closely with senior engineers, data architects, and QA teams during iterative migration cycles.
Job Responsibilities
- Analyze Informatica workflows and mappings to understand source-to-target logic, transformations, dependencies, and scheduling order.
- Convert Informatica mappings into Databricks pipelines using PySpark and Spark SQL.
- Implement data ingestion from on-prem and cloud sources into Databricks using the medallion architecture (landing, Bronze, Silver).
- Adapt existing ETL logic to align with a new enterprise data model, identifying gaps and required transformation changes.
- Support unit testing, reconciliation, and data validation between legacy and modern pipelines.
- Validate row counts, aggregates, and business rules to ensure data accuracy and consistency.
- Document migration logic, assumptions, and deviations from legacy behavior.
- Collaborate with senior engineers, data architects, and QA teams throughout iterative migration cycles.
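The medallion ingestion flow described above (landing, Bronze, Silver) can be sketched as follows. This is a minimal illustration using plain Python records in place of Spark DataFrames; the field names (`trip_id`, `miles`) are hypothetical, not taken from any actual DOT schema.

```python
# Sketch of a landing -> Bronze -> Silver medallion flow, using plain
# Python records in place of Spark DataFrames. Field names are hypothetical.

def to_bronze(raw_rows):
    """Bronze: land raw records as-is, tagging each with a source marker."""
    return [{**row, "_source": "landing"} for row in raw_rows]

def to_silver(bronze_rows):
    """Silver: clean and conform - drop rows missing a key, cast types."""
    silver = []
    for row in bronze_rows:
        if row.get("trip_id") is None:
            continue  # reject records without a primary key
        silver.append({"trip_id": str(row["trip_id"]),
                       "miles": float(row.get("miles", 0))})
    return silver

raw = [{"trip_id": 1, "miles": "12.5"},
       {"trip_id": None, "miles": "3.0"},   # malformed: no key
       {"trip_id": 2, "miles": "7.25"}]

silver = to_silver(to_bronze(raw))  # malformed record is filtered out
```

In a real Databricks pipeline each layer would be a Delta table and the cleaning logic would run in PySpark, but the layering idea is the same: land everything, then progressively conform it.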
Required skills: Data engineering, ETL development, Data integration, Informatica PowerCenter, Common transformations, Databricks, SQL, ETL / ELT, Public Trust, Data Modeling
Preferred skills: AWS, Delta Lake
Education requirements
Degree: Bachelor's
Major: Computer Science
Job Requirements
- 2-3 years of experience in data engineering, ETL development, or data integration.
- Working knowledge of Informatica PowerCenter, including mappings, workflows, and sessions, and common transformations (Source Qualifier, Expression, Lookup, Joiner, Aggregator, Router, Filter).
- Basic to intermediate experience with Databricks, including PySpark, Spark SQL, and notebooks and jobs.
- Strong SQL fundamentals, including joins, aggregations, and window functions.
- Solid understanding of ETL / ELT concepts, data warehousing principles, and batch processing.
- Strong attention to detail and analytical skills.
- Ability to clearly document technical logic and communicate findings to technical team members.
- Data Modeling and Analysis Skills:
- Interpret legacy data models and mapping documentation.
- Identify how legacy fields map (or do not map) to a new target data model.
- Flag missing logic, derived fields, or transformation gaps early in the migration process.
- Perform detailed data validation, including reconciliation of row counts and aggregates.
- Prior experience in regulated or data-heavy environments (finance, government, healthcare) is preferred.
- Exposure to Informatica-to-Databricks migrations or similar data modernization efforts is preferred.
- Familiarity with Delta Lake and medallion architecture (Bronze / Silver / Gold) is preferred.
- Basic understanding of AWS, including S3 and IAM concepts.
- Experience reading or generating code from Informatica XML exports.
- Bachelor's Degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Experience supporting report rationalization and data migration initiatives.
- Clearance: US Citizen, eligible to obtain a DOT Public Trust clearance.
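The row-count and aggregate reconciliation called for above can be sketched as follows. This is a minimal plain-Python illustration, assuming both the legacy and migrated pipelines expose their results as lists of records; the column name `amount` and the helper `reconcile` are hypothetical.

```python
# Sketch of legacy-vs-migrated reconciliation: compare row counts and a
# column aggregate between two result sets. Column name "amount" is
# hypothetical.

def reconcile(legacy_rows, modern_rows, amount_col="amount", tol=1e-9):
    """Return a dict of check results comparing the two result sets."""
    legacy_sum = sum(r[amount_col] for r in legacy_rows)
    modern_sum = sum(r[amount_col] for r in modern_rows)
    return {
        "row_count_match": len(legacy_rows) == len(modern_rows),
        "aggregate_match": abs(legacy_sum - modern_sum) <= tol,
        "legacy_count": len(legacy_rows),
        "modern_count": len(modern_rows),
    }

legacy = [{"amount": 10.0}, {"amount": 5.5}]
modern = [{"amount": 10.0}, {"amount": 5.5}]
report = reconcile(legacy, modern)
```

In practice the same checks would typically run as Spark SQL queries against the legacy extract and the Delta tables, grouped by key columns and business-rule buckets, but the comparison logic is the same shape.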