Bulk processing of pharmaceutical data
Health & Pharmaceutical Sector
Customer
A German pharmaceutical company of international standing.
Description
Severe performance problems in Spark-based data ingestion, with volumes of several terabytes of data per day.
Results
Complete redesign of the ingestion pipelines, reducing processing time from several days to a few hours.
Technology
Spark with Scala for data processing. Flume and Sqoop for ingestion. HDFS storage, queried through the Hive SQL engine. Big Data cluster based on MapR.