Bulk processing of pharmaceutical data

Health & Pharmaceutical Sector


Customer

German pharmaceutical company of international importance.

Description

Severe performance problems in Spark-based data ingestion, with volumes of several TB of data per day.

Results

Complete redesign of the ingest pipelines, reducing the computation time from several days to just a few hours.

Technology

Spark with Scala for data processing. Flume and Sqoop for ingestion. Data stored in HDFS and queried through the Hive SQL engine. Big Data cluster built on MapR technology.
