Analytics


A self-contained architecture that enables nontechnical users to run full-spectrum analytics autonomously is the future of BI...

 

 

Traditional BI platforms require IT-produced analytics content, specialized tools and skills, and significant
upfront data modeling, coupled with a predefined metadata layer to access their analytics capabilities.

Poor enterprise data quality across disparate sources, loss of analytical context between related data sets, and the lack of a single source of truth continue to undermine operational and transformative outcomes, and keep organizations from uncovering the real value of analytics.

 

Develop a self-contained architecture that enables nontechnical users to run full-spectrum analytics autonomously: to develop use cases, measure KPIs, and derive business value. Business users should be able to access, ingest, and prepare data for interactive analysis and share insights collaboratively.

Create a unified data model for the enterprise to ensure that data is aggregated correctly and completely and that its quality is maintained. For example, in healthcare, a holistic view of the patient experience over time and across all sites and episodes of care requires connectivity to hundreds of data sources, including EHR, unstructured text, imaging, audio scribes, and billing and claims, in order to establish a longitudinal patient record with a 360-degree view of the care episodes. We develop adequate proof of concept, proof of technology, and proof of value for analytical models ahead of analytical planning and execution.
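As a rough illustration of the longitudinal patient record idea, the sketch below merges events from several source systems into one time-ordered list per patient. The source names, field names (`patient_id`, `when`, `kind`), and sample events are all hypothetical and made up for this example; a real unified data model would also need identity matching, data quality checks, and many more sources.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, hand-made events from three source systems (not real data).
ehr_events = [
    {"patient_id": "P1", "when": date(2023, 1, 5), "kind": "visit"},
    {"patient_id": "P2", "when": date(2023, 2, 1), "kind": "visit"},
]
imaging_events = [
    {"patient_id": "P1", "when": date(2023, 1, 7), "kind": "x-ray"},
]
claims_events = [
    {"patient_id": "P1", "when": date(2023, 1, 20), "kind": "claim"},
]


def build_longitudinal_records(sources):
    """Merge events from named sources into one time-ordered list per patient."""
    records = defaultdict(list)
    for source_name, events in sources.items():
        for event in events:
            # Tag each event with its origin so context is not lost in the merge.
            records[event["patient_id"]].append({**event, "source": source_name})
    for timeline in records.values():
        timeline.sort(key=lambda e: e["when"])
    return dict(records)


records = build_longitudinal_records(
    {"ehr": ehr_events, "imaging": imaging_events, "claims": claims_events}
)
for event in records["P1"]:
    print(event["when"], event["source"], event["kind"])
```

Keeping the source tag on every event is one simple way to preserve the analytical context that is otherwise lost when data from different systems is combined.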

 

We have expertise in reducing the computation time required for big data analytics from exponential to linear

Proprietary variable selection engine to identify statistically significant variables

Profiling and prediction algorithms that use ensemble techniques to combine predictions from different models

Descriptive analytics to classify data into clusters and filter out correlated variables

Machine learning techniques to refine the variable list further and arrive at the final model
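Two of the ideas above, filtering correlated variables and combining predictions from different models, can be sketched in a few lines. This is not the proprietary variable selection engine: it is a simple stand-in that assumes Pearson correlation with a hypothetical threshold of 0.9 for the filter and an equal-weight average for the ensemble. The column names and values are toy data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def filter_correlated(columns, threshold=0.9):
    """Keep a variable only if it is not strongly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def ensemble_average(predictions):
    """Average per-row predictions from several models (equal weights)."""
    return [sum(row) / len(row) for row in zip(*predictions)]


# Hypothetical toy data: "weight_lb" is a rescaled copy of "weight_kg".
columns = {
    "weight_kg": [60.0, 72.0, 81.0, 95.0],
    "weight_lb": [132.0, 158.4, 178.2, 209.0],
    "age": [30.0, 60.0, 25.0, 41.0],
}
selected = filter_correlated(columns)

# Two hypothetical models, each giving one prediction per row.
blended = ensemble_average([[1.0, 3.0], [3.0, 5.0]])
print(selected, blended)
```

The filter is greedy and order-dependent: the first variable of a correlated pair is the one that survives. A production pipeline would more likely weigh each variable's predictive value before dropping it.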

 
 
 

HADOOP PLATFORM

We recommend

  • Largely open source technologies for reliable, scalable, distributed, parallel computing, built on the Hadoop Distributed File System (HDFS) on Cloudera

  • Application programming models (Spark, R, and Java), a distributed database (HBase), and job scheduling packages within Hadoop

  • Commercial software where it offers better performance. For example, to provide better visualizations of high-dimensional data, commercial and client-provisioned analytical tools will also be used.

STATISTICAL OPEN SOURCE TOOLS

  • R (with RStudio), Apache Spark

  • Python (with Jupyter, via Anaconda)

  • PySpark

HADOOP TOOLS

  • Pig, Hive for Query-Based Data Processing

  • HBase for NoSQL Database

  • Mahout, Spark MLlib for Machine Learning

  • Apache Drill for SQL on Hadoop

  • Flume, Sqoop for Data Ingestion Services

  • YARN (Yet Another Resource Negotiator) for Cluster Resource Management

  • MapReduce for Programmatic Data Processing

  • Spark for In-memory Data Processing

  • Oozie for Job Scheduling

  • ZooKeeper for Cluster Coordination

  • Ambari for Provisioning, Monitoring, and Maintaining the Cluster

  • Solr & Lucene for Searching & Indexing
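To make the MapReduce entry in the list above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using a word count. It only imitates the programming model in plain Python and does not use the Hadoop APIs; the function names and input lines are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, 1) pairs: one per word in each input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["spark hive spark", "hive hbase"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

On a real cluster the map and reduce functions run in parallel across nodes and the shuffle moves data over the network, which is where most of the cost of a job tends to sit.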

VISUALIZATION TOOLS

  • Tableau

  • Spotfire, Dataiku