AWS Data Engineer

Capgemini
4d · Hybrid

About The Position

Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Location: Portsmouth, NH (hybrid, required)

Requirements

  • Hands-on experience with AWS Lambda (Node.js or Python runtime), including cold-start optimization, memory tuning, and concurrency limits
  • API Gateway: REST and/or HTTP APIs, routing, authorization, request/response transformations, throttling, and caching
  • Event-driven services: EventBridge, SNS, and SQS for asynchronous pipelines; retry semantics, DLQs, and at-least-once vs. exactly-once considerations
  • Orchestration: AWS Step Functions (Standard and Express) for coordinating multistep workflows and long-running jobs
  • Storage: Amazon S3 for object storage, lifecycle policies, versioning, and presigned URLs; understanding of the S3 performance and consistency model
  • Databases: DynamoDB (single-table design, GSIs, transactions) and/or RDS (Postgres/MySQL), depending on relational needs; familiarity with choosing the right persistence model
  • IAM and security: least-privilege IAM policies, resource-based policies, KMS for encryption, and secure access patterns

Generative AI, ML, and document extraction

  • Experience integrating large language models (LLMs) and generative AI APIs (e.g., OpenAI, Anthropic, AWS Bedrock, or self-hosted models) and designing safe, cost-efficient prompt strategies, batching, and caching
  • Basic familiarity with document OCR and extraction tools, e.g., AWS Textract, Tesseract, and third-party OCR/IDP (Intelligent Document Processing) platforms
  • Knowledge of structured information extraction approaches: prompt engineering for LLMs, retrieval-augmented generation (RAG), text embeddings, and vector similarity search
  • Data normalization and schema mapping: designing canonical schemas, entity extraction, schema matching, and validation and transformation pipelines

API design and eventing

  • RESTful and/or GraphQL API design best practices: versioning, pagination, error handling, idempotency, and spec-driven development
  • OpenAPI and AsyncAPI
  • Idempotent operations, concurrency control, optimistic locking, and sequence guarantees where required
  • Event modeling: designing domain events, event contracts, event versioning, and event-driven architectures with durable delivery (EventBridge, SNS/SQS, Kafka)

Frontend and human-in-the-loop tools

  • Frontend frameworks: React (preferred), TypeScript, component libraries, and state management for building validation UIs and dashboards
  • UX for human-in-the-loop: designing review/approve/reject workflows, diffing extracted vs. original content, confidence indicators, and bulk validation tools
  • Micro frontends: implementing micro frontends to compose UX components as independently deployable, versioned modules
  • Analytics and visualization: integrating charts and graphs for accuracy metrics, error rates, and model drift indicators, leveraging tools like Power BI or Tableau

Observability, reliability, and security

  • Monitoring and tracing: CloudWatch metrics and logs, X-Ray or OpenTelemetry tracing, structured logging, and building dashboards and alerts for SLOs and error budgets
  • Testing and CI/CD: automated testing (unit, integration, and contract tests), deployment pipelines (GitHub Actions, CloudFormation, CDK), and canary or blue/green deployments
  • Security best practices: encryption at rest and in transit, secrets management, input validation, rate limiting, and OWASP considerations
  • Cost optimization: measuring and optimizing Lambda compute costs, S3 lifecycle, DynamoDB capacity modes, and API Gateway charges

Engineering practices and soft skills (technically oriented)

  • Infrastructure as code: AWS CloudFormation or CDK for reproducible deployments

Responsibilities

  • Design, build, and operate serverless APIs and event-driven services on AWS (API Gateway, Lambda, Step Functions, EventBridge)
  • Integrate document ingestion, OCR/ML extraction, and generative AI pipelines; normalize extracted data to a common schema
  • Build tooling and UIs for human-in-the-loop validation (React/TypeScript frontends, backend APIs)
  • Ensure security, observability, scalability, and cost efficiency across services (IAM, VPCs, monitoring, tracing, CI/CD)

Benefits

  • Paid time off based on employee grade (A-F), as defined by policy: vacation (12-25 days, depending on grade), company-paid holidays, personal days, and sick leave
  • Medical, dental, and vision coverage (or provincial healthcare coordination in Canada)
  • Retirement savings plans (e.g., 401(k) in the U.S., RRSP in Canada)
  • Life and disability insurance
  • Employee assistance programs
  • Other benefits as provided by local policy and eligibility