My name is Gregory L. and I have over 8 years of experience in the tech industry. I specialize in the following technologies: Apache Spark, Python, SQL, ETL Pipeline, Amazon S3, and more. I hold a Bachelor's degree. Some of the notable projects I've worked on include: DSPy Chat App, Python Library: pygethub, Python Library: aws-json-dataset, Serverless Streaming Reddit Pipeline & Data Lake, and bioNX - Knowledge Graph. I am based in Long Beach, United States, and have successfully completed 5 projects while developing at Softaims.
I am a dedicated innovator who constantly explores and integrates emerging technologies to give projects a competitive edge. I possess a forward-thinking mindset, always evaluating new tools and methodologies to optimize development workflows and enhance application capabilities. Staying ahead of the curve is my default setting.
At Softaims, I apply this innovative spirit to solve legacy system challenges and build greenfield solutions that define new industry standards. My commitment is to deliver cutting-edge solutions that are both reliable and groundbreaking.
My professional drive is fueled by a desire to automate, optimize, and create highly efficient processes. I thrive in dynamic environments where my ability to quickly master and deploy new skills directly impacts project delivery and client satisfaction.
Main technologies
Apache Spark, Python, SQL, ETL Pipeline, and Amazon S3, with between 2 and 8 years of experience in each.
This project demonstrates a serverless, highly scalable solution for monitoring the social news website Reddit in real time. The architecture employs a fan-out pattern with SQS and Lambda to collect data from any number of subreddits, streams the JSON data to an S3 data lake with Kinesis Firehose, catalogs and converts it to Parquet, and makes it available for querying and analytics. The basic architecture pattern can be applied to any application requiring parallel data streaming from a single platform, from web scraping to IoT. The data lake is built on top of S3 and employs AWS Glue Crawlers, Triggers, a Glue Job, and a scheduled Workflow to orchestrate cataloging and ETL conversion from JSON to partitioned Parquet files. Parquet is desirable because it saves space and is faster to query with tools like Athena than raw JSON. With minimal tuning, the pipeline could scale to monitor all of Reddit, depending on cost constraints.
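The fan-out pattern described above can be sketched in plain Python, with a standard-library queue standing in for SQS and worker threads standing in for Lambda invocations. The subreddit names and message shape here are illustrative assumptions, not the project's actual configuration.

```python
import json
import queue
import threading

# Illustrative subreddit list; the real pipeline can monitor any number.
SUBREDDITS = ["python", "aws", "dataengineering"]

def dispatch(q):
    """Fan out: enqueue one message per subreddit (SQS stand-in)."""
    for name in SUBREDDITS:
        q.put(json.dumps({"subreddit": name}))

def worker(q, results):
    """Consume messages in parallel (Lambda stand-in)."""
    while True:
        try:
            msg = q.get(timeout=0.5)
        except queue.Empty:
            return
        payload = json.loads(msg)
        # In the real pipeline, this is where new posts would be fetched
        # and streamed to Kinesis Firehose for delivery to S3.
        results.append(payload["subreddit"])
        q.task_done()

q = queue.Queue()
results = []
dispatch(q)
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # each subreddit handled exactly once
```

Because each message is independent, throughput scales with the number of consumers, which is what makes the SQS-plus-Lambda version effectively horizontally scalable.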
BioNX is an automated knowledge-graph solution for protein-protein interaction (PPI) networks built on Python and Neo4j. I was inspired to build it when I learned that one of the biggest bottlenecks in biotech and medicine is integrating the massive amounts of biological data being produced. Knowledge graphs can provide context for information better than tabular data, which helps with this integration. The application uses Python scripts to collect data from several public databases and populates a knowledge graph for a protein interaction given as input. A researcher working on a particular protein or group of proteins gains immediate access to all relevant information (as connected nodes), such as cellular and physiological context, disease conditions, and literature, for their protein interaction of interest.
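A minimal sketch of how a PPI edge might be loaded into Neo4j: build Cypher MERGE statements that a driver session could execute, so repeated loads stay idempotent. The node label, relationship type, and property names below are assumptions for illustration, not BioNX's actual schema.

```python
def interaction_to_cypher(protein_a, protein_b, source_db):
    """Return idempotent Cypher for one protein-protein interaction.

    MERGE (rather than CREATE) ensures re-running the loader on the
    same public-database dump does not duplicate nodes or edges.
    Labels/properties here are hypothetical.
    """
    return "\n".join([
        f"MERGE (a:Protein {{name: '{protein_a}'}})",
        f"MERGE (b:Protein {{name: '{protein_b}'}})",
        f"MERGE (a)-[:INTERACTS_WITH {{source: '{source_db}'}}]->(b)",
    ])

# Example: the well-known TP53-MDM2 interaction, attributed to the
# STRING database (illustrative source choice).
stmt = interaction_to_cypher("TP53", "MDM2", "STRING")
print(stmt)
```

In production one would pass the values as query parameters through the official Neo4j Python driver instead of interpolating them into the string, to avoid injection and enable query-plan caching.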
Education
2004-2005
Bachelor's degree, 2002-2008
2005