The daily load process for customer data was inefficient, taking approximately 7–8 hours to load a 9GB CSV file containing around 30 million records into a SQL Server database. The bottleneck arose from:
1. Using KingswaySoft, a third-party SSIS integration tool, for both data transfer from an Amazon S3 bucket and hashing sensitive customer information (email addresses and mobile numbers) column by column.
2. The record-by-record hashing operation, performed within KingswaySoft, added significant processing time.
This delay impacted downstream processes and hindered timely access to the latest customer data.
Proposed Solution
After analyzing the issue, the following optimized approach was designed and implemented:
1. Data Transfer Optimization:
- Continued using KingswaySoft solely for its reliable Amazon S3 bucket integration, to transfer the CSV file to the local environment.
2. Hashing Process Optimization:
- Moved the hashing operation from KingswaySoft to the SQL Server side, using SQL Server's built-in HASHBYTES function with the MD5 algorithm.
- Imported raw data into a staging table in SQL Server without hashing (a sketch of such a staging table follows this list).
- Applied a batch update query to hash the required columns (email and mobile) directly in the SQL Server database after data ingestion.
3. Parallelism and Indexing:
- Leveraged bulk insert operations and optimized SQL Server configurations to handle the 30 million records efficiently.
- Applied appropriate indexes post-load to speed up updates and queries.
4. Result:
- Reduced the total data load time from 7–8 hours to 45 minutes.
- Maintained the integrity and security of sensitive customer data through hashing.
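For illustration, the staging table referenced in step 2 might look like the following T-SQL sketch; the table name, column names, and the CHAR(32) hex slot for the MD5 output are assumptions rather than details taken from the actual implementation.

```sql
-- Hypothetical staging table for the raw CSV load.
-- Email and Mobile hold raw values on load and are overwritten
-- with 32-character MD5 hex strings during the batch hashing step.
CREATE TABLE dbo.Customer_Staging (
    CustomerId BIGINT       NOT NULL,
    Email      VARCHAR(320) NULL,
    Mobile     VARCHAR(32)  NULL,
    IsHashed   BIT          NOT NULL DEFAULT 0   -- batch-update progress flag
);
```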
Detailed Execution Steps
Maxnet tackled the Client's data challenges by employing a structured, well-defined strategy involving data centralization, integration, and advanced business intelligence tools.
Amazon S3 to Local Storage
- Used KingswaySoft SSIS connectors to pull the 9GB CSV file from the Amazon S3 bucket to local storage.
- Ensured error handling and retries to account for large file size and potential network issues.
Raw Data Import
- Loaded the CSV data into a staging table in SQL Server using bulk insert operations to minimize transfer time.
- No transformations or hashing were applied at this stage (a minimal load sketch follows).
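A minimal sketch of such a raw load is shown below, assuming the hypothetical staging table sketched earlier; the file path and CSV options are illustrative rather than the exact settings used.

```sql
-- The staging table carries an IsHashed flag that is not in the CSV, so the
-- load goes through a thin view exposing only the file's columns (a format
-- file would work equally well).
CREATE VIEW dbo.Customer_Staging_Load
AS
SELECT CustomerId, Email, Mobile
FROM   dbo.Customer_Staging;
GO

BULK INSERT dbo.Customer_Staging_Load
FROM 'D:\DataLoads\customer_extract.csv'      -- hypothetical local path
WITH (
    FIRSTROW        = 2,            -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '0x0a',
    TABLOCK,                        -- bulk-update lock for minimal logging
    BATCHSIZE       = 1000000       -- commit in chunks to keep the log manageable
);
```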
Batch Hashing
- Used SQL Server's HASHBYTES('MD5', column_name) function to hash sensitive data in batches (a sketch follows below).
- This approach reduced processing overhead by allowing SQL Server to perform hashing in a set-based operation.
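A batched, set-based update along those lines might look like the following sketch; the batch size, the IsHashed flag, and the hex-string storage format are assumptions carried over from the earlier staging-table sketch.

```sql
-- Hash email and mobile in place, one batch at a time, until no unhashed
-- rows remain. CONVERT(CHAR(32), ..., 2) renders the hash as hex without a 0x prefix.
DECLARE @BatchSize INT = 1000000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) s
    SET    Email    = CONVERT(CHAR(32), HASHBYTES('MD5', Email), 2),
           Mobile   = CONVERT(CHAR(32), HASHBYTES('MD5', Mobile), 2),
           IsHashed = 1
    FROM   dbo.Customer_Staging AS s
    WHERE  s.IsHashed = 0;

    IF @@ROWCOUNT = 0 BREAK;        -- all rows hashed
END;
```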
Index Creation and Final Merge
- After hashing, created necessary indexes to optimize further queries.
- Merged the hashed staging data into the main customer table using an efficient set-based SQL statement (see the sketch below).
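A sketch of this closing step is shown below; dbo.Customer, its CustomerId key, and the column list are illustrative and not the actual production schema.

```sql
-- Index the staging data on the join key, then upsert into the main table.
CREATE INDEX IX_Customer_Staging_CustomerId
    ON dbo.Customer_Staging (CustomerId);

MERGE dbo.Customer AS tgt
USING dbo.Customer_Staging AS src
      ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET tgt.Email  = src.Email,
               tgt.Mobile = src.Mobile
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, Email, Mobile)
    VALUES (src.CustomerId, src.Email, src.Mobile);
```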
Performance Validation
- Compared end-to-end load times before and after the optimization.
- Monitored SQL Server resource utilization to ensure no bottlenecks occurred during the batch update (a simple monitoring sketch follows).
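One simple way to capture the timing comparison and watch for bottlenecks is sketched below; the dbo.LoadRunLog helper table is hypothetical.

```sql
-- Record the elapsed time of a full run in a hypothetical log table.
DECLARE @Start DATETIME2(3) = SYSUTCDATETIME();

-- ... bulk insert, batch hashing and merge run here ...

INSERT INTO dbo.LoadRunLog (RunDate, DurationMinutes)
VALUES (SYSUTCDATETIME(), DATEDIFF(MINUTE, @Start, SYSUTCDATETIME()));

-- Spot-check active requests during the batch update for blocking or long waits.
SELECT session_id, status, command, wait_type, wait_time, total_elapsed_time
FROM   sys.dm_exec_requests
WHERE  session_id > 50;             -- user sessions only
```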

Impact and Outcome
Performance Gain:
- Reduced load time from 7–8 hours to about 45 minutes (a roughly 90% reduction).
- Freed up critical resources for downstream processes.
Cost Optimization:
- Reduced reliance on KingswaySoft for CPU-intensive hashing, leveraging SQL Server’s native capabilities instead.
Scalability:
- The solution can handle larger datasets in the future with minimal changes, as SQL Server is better equipped for parallel operations than a record-by-record SSIS-based transformation.

Key Takeaways
- Minimize Transformation During Data Transfer: Focus on moving raw data to the database as quickly as possible, deferring transformations to where resources are optimized for such tasks.
- Leverage Database Strengths: SQL Server is designed to handle large-scale operations efficiently, making it well suited for batch hashing and updates.
- Reduce Overhead in ETL Tools: Offloading non-ETL-specific tasks such as hashing from tools like KingswaySoft can significantly improve performance.
- Iterative Optimization: By analyzing the bottlenecks (hashing in this case), tailored solutions can be developed to address specific issues without overhauling the entire pipeline.
This use case highlights how thoughtful analysis and leveraging the strengths of different components in the technology stack can dramatically improve data processing performance.

Ready to reduce your technology cost?