Call Us: +1 646-926-3881(USA) - Mail: contact@supstat.com

Jun 26-27, 2014 – Introduction to Data Science with R in NYC

You can register either through Eventbrite or on our school site, NYC Data Science Academy.

Date: Thursday and Friday, June 26th and 27th, 2014

Time:  9:00am to 5:00pm

Location: 500 7th Ave, 17th Floor, glass door classroom, New York, NY 10018

NYC Data Science Academy, the training sub-brand of SupStat (an official training partner of RStudio Inc.), is hosting its two-day Introduction to Data Science with R course in New York City this June. The workshop is designed to provide a comprehensive introduction to R. We’ll get you programming and analyzing data with R in no time. All participants will receive a copy of all slides, exercises, data sets, and R scripts used in the course.

We will emphasize how you can get work done easily with the RStudio IDE.

Course Instructor:

Vivian S. Zhang (CTO of SupStat, Organizer of NYC Open Data Meetup, Founder of NYC Data Science Academy)

Information about our school:

Check out our past students’ testimonials, along with the projects and blog posts they produced after finishing our classes.

Our Data School has trained 96 students in the past 8 months, covering topics including R, Python, Hadoop, D3.js, Tableau, etc.

Discount:

Discount pricing is available for academics (33% off) and students (66% off). Space is limited; please send us an email to confirm your eligibility.

Who should take this course?

This class will be a good fit for you if you are just starting with R or have dabbled in R, but wish to improve your skills. No prior experience with R or data science is required.

What will you learn?

Practical skills for visualizing, transforming, and modeling data in R. During this two-day course, you will learn how to explore and understand data as well as how to do basic programming in R. A full list of topics for each day is below.

What should you bring?

Be ready to learn. You need your laptop and the latest version of R. We also recommend downloading the RStudio IDE, as it provides a great learning environment for beginners as well as tools you will keep using as you become an advanced user.

Overview

Are you interested in better understanding your data, and not so interested in mastering a programming language? Have you tried learning R from a book or website, but have been discouraged? If so, this is the course for you. We assume that you’ve never programmed before (although some experience doesn’t hurt), and we teach you the best tools to help analyze your data.

You won’t be a master programmer by the end of this two-day course, but through immersion you will have learned the basics of R’s syntax and grammar, and you’ll have started building an effective R vocabulary for visualizing, transforming, and modeling data. You will learn how to load, save, and transform data as well as how to write functions, generate beautiful graphs, and fit basic statistical models to your data. We’ll give you a theoretical framework to help you understand the process of data analysis, but our focus is on practical tools that you can use as soon as you get back from the course.

All techniques are motivated by real problems, and you’ll be exposed to a number of real datasets throughout the course. We alternate brief lectures with hands-on practice: you’ll get plenty of experience actually using R (not just hearing about it!), and there’s plenty of help available if you get stuck. The course concludes with a 90-minute data analysis project. You can use this as an opportunity to start using R with your data, or work on answering some of our questions about a dataset.

This tried and true course has been taken by over 200 students, from biologists to humanists, many of whom had never programmed before. This course teaches the basic skills needed by anyone seriously interested in data.


Day 1 – Getting started and working with data

Thursday, June 26th, 2014 9:00am to 5:00pm

An Introduction to R and data analysis - R is more than just a programming language. R is a statistical software application in its own right, an environment for interactive data analysis, and a community of passionate users. This orientation to the R language will help you get up and running.

  • How to download and update R and RStudio
  • How to find resources and help for R
  • Stages of data analysis
  • Best practices of data analysis

Visualizing data - R is well known for its beautiful graphics. R packages, like `ggplot2`, provide an expressive and logical language for building clear and effective data visualizations.

  • Visualize the distribution of a variable
  • Explore and plot relationships between variables
  • Display very large data sets through graphs without over-plotting
  • Use best practices for Exploratory Data Analysis in R code

Working with data - R is a programming language with a purpose: to analyze data. Learning how R stores and handles data will help you apply R to any data source.

  • Loading different data formats into R
  • Working with factors in R
  • How to clean poorly formatted data
  • Saving your data

Manipulating data - R’s methods for data manipulation make it easy and fast to extract information from data sets and to prepare raw data for analysis.

  • Subset, transform, summarize, and reorder data sets
  • Perform targeted, groupwise operations on data
  • Join multiple data sets together

Day 2 – Programming and modeling in R

Friday, June 27th, 2014 9:00am to 5:00pm

Programming in R - Many people use R as an application, a sort of statistical calculator, but R is also a programming language. Once you learn to program in R, you will be a more versatile and capable data analyst. You’ll learn to write code that provides the precise solutions you are looking for.

  • Create an if else statement
  • Write and optimize for and while loops in R
  • Use best practices for programming in R

R functions - Functions allow you to save your code for later or to share it with other R users. Knowing how to write a function will also streamline your workflow. Functions give code a more efficient structure that avoids duplication and aids debugging.

  • Organize a problem into a series of functions
  • Write a function in R
  • Apply best practices for writing functions in R

Simulation in R - Simulating data provides a way to test hypotheses and discover the uncertainty in your estimates.

  • Generate random numbers in R
  • Visualize uncertainty with bootstrapping in R
  • Construct a confidence interval with bootstrapping in R
  • Test a hypothesis with a permutation test in R

Modeling in R - R excels at statistical analysis and modeling, but its methods for modeling may seem unintuitive at first.

  • Write a formula in R
  • Fit a model to data in R
  • Compare models
  • Explore data sets with models

Disclaimer:

In certain cases, we may need to cancel this workshop due to circumstances beyond our control or otherwise. If this happens, SupStat will refund all registration fees for those who signed up. SupStat is not responsible for any related expenses incurred by registered attendees (including but not limited to travel and hotel expenses).

Refund policy:

Until Jun 15th, 2014 – Full refund, less 10% of registration fees
Jun 15th, 2014 to June 21st, 2014 – 50% refund of registration fees
Jun 22nd, 2014 and after – No refund available

Money-back guarantee:

All public workshops hosted by SupStat come with a no-questions-asked money-back guarantee.

The 7th China R Conference in Beijing

The 7th China R Conference in Beijing was held on May 24th and 25th at Renmin University of China. SupStat was happy and honored to sponsor and attend the meeting.

This was the largest R conference ever held in China, with 1814 online registrations and a further 50 requests to attend, made through personal connections and friendships, after the online application system had closed! In the end, the free and limited seats had to be distributed on a first-come, first-served basis. People sat on the floor, leaned against the walls, and waited at the door for this free and open R party!

There were many interesting talks! The first day’s meeting was held in one of the largest and most luxurious halls (as a free-to-attend conference, this was possible thanks to the sponsorship of SupStat, Revolution Analytics, RStudio and others). Hadley Wickham’s talk on developing R packages and David Smith’s talk on how the growth of R helps data-driven organizations succeed opened the conference. Dr. Kai Yu, Head of the Institute of Deep Learning at Baidu; Dr. Ming Zhou, Principal Researcher in Natural Language Processing at Microsoft Research Asia; and Dr. Hansheng Wang, head of the statistics department at Peking University, also talked about their recent work and research projects.

In the afternoon, all attendees listened enthusiastically to the company lightning talks and the discussion forums on Big Data and data science education.

The second day’s meeting was divided into three parallel sessions: A on visualization, B on Big Data, and C on R integration and other topics:

  • Session A (visualization) covered ggvis, presented by Hadley Wickham, recharts, presented by its author Dr. Zhou, and more.
  • Session B (Big Data) covered RHadoop, R-Web, rARPACK for SVD of very large matrices, and the big data industry in China.
  • Session C (R integration and others) covered R with Python, R with Office, R in advertising, R in data mining, and R in biology, psychology and pharmacy research.

Although based in Beijing, the 7th China R Conference attracted people from all over mainland China, Hong Kong, Taiwan and even outside China. The registration information shows that R users in China mostly work in networks, IT, biology, finance and education. They are mostly interested in data mining and machine learning, data visualization and data solutions.

The number of R users attending the China R Conference in Beijing is growing at an amazing rate! How about next year?

Install RStudio Server on CentOS 6.5

My system is 64-bit CentOS 6.5. The 64-bit preview version of RStudio Server at the time was 0.98.766, and the following error appeared when installing it:

[root@supstat download]# rpm -ivh rstudio-server-0.98.766-x86_64.rpm
error: Failed dependencies:
    libcrypto.so.6()(64bit) is needed by rstudio-server-0.98.766-1.x86_64
    libgfortran.so.1()(64bit) is needed by rstudio-server-0.98.766-1.x86_64
    libssl.so.6()(64bit) is needed by rstudio-server-0.98.766-1.x86_64

Following an answer on Stack Overflow, install the missing dependencies:

yum install libcrypto.so.6 -y
yum install libgfortran.so.1 -y
yum install libssl.so.6 -y
yum install openssl098e-0.9.8e -y
yum install gcc41-libgfortran-4.1.2 -y
yum install pango-1.28.1 -y
 
wget ftp://rpmfind.net/linux/centos/6.5/os/x86_64/Packages/compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm
rpm -Uvh compat-libgfortran-41-4.1.2-39.el6.x86_64.rpm
rpm -Uvh --nodeps rstudio-server-0.98.766-x86_64.rpm

Completion of the above steps is still not enough. Running rstudio-server verify-installation still reports errors.

The errors point to missing library files, even though the files named in the error report do exist under /usr/lib. So let’s check the library files under /usr/lib64:

[root@supstat lib64]# ll libcrypto*
lrwxrwxrwx 1 root root      19 Apr  9 12:15 libcrypto.so -> libcrypto.so.1.0.1e
lrwxrwxrwx 1 root root      19 Apr  9 12:15 libcrypto.so.10 -> libcrypto.so.1.0.1e
-rwxr-xr-x 1 root root 1950976 Apr  8 10:42 libcrypto.so.1.0.1e
[root@supstat lib64]# ll libssl*
-rwxr-xr-x. 1 root root 250168 Feb 11 21:01 libssl3.so
lrwxrwxrwx  1 root root     16 Apr  9 12:15 libssl.so -> libssl.so.1.0.1e
lrwxrwxrwx  1 root root     16 Apr  9 12:15 libssl.so.10 -> libssl.so.1.0.1e
-rwxr-xr-x  1 root root 441112 Apr  8 10:42 libssl.so.1.0.1e

We can see that there is no libcrypto.so.6 or libssl.so.6.
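
To confirm which shared libraries a binary still cannot resolve, the output of ldd can be filtered for "not found" entries. This is only a small sketch: the helper name missing_libs is ours, and the rserver path in the usage note is taken from the process listing later in this article.

```shell
# Print the names of shared libraries that a binary needs but the
# dynamic loader cannot find. Prints nothing when everything resolves.
# (missing_libs is a hypothetical helper name, not part of RStudio Server.)
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}
```

Running `missing_libs /usr/lib/rstudio-server/bin/rserver` lists one library name per line; any name it prints still needs to be installed or linked.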

Try creating soft links for the library files:

cd /usr/lib64
ln -s libssl.so.10 libssl.so.6
ln -s libcrypto.so.10 libcrypto.so.6

Now rstudio-server verify-installation passes.
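
The symlink step can also be wrapped in a small helper that never overwrites an existing file and fails if the target library is absent. This is a sketch of the same workaround, not a different fix; the function name link_compat_lib and its argument order are our own.

```shell
# Create a compatibility symlink LINKNAME -> TARGET inside LIBDIR,
# but only when LINKNAME does not exist yet and TARGET does.
# (link_compat_lib is a hypothetical helper name.)
link_compat_lib() {
    libdir=$1
    target=$2
    linkname=$3
    if [ -e "$libdir/$linkname" ] || [ -L "$libdir/$linkname" ]; then
        echo "exists: $libdir/$linkname"
    elif [ -e "$libdir/$target" ]; then
        ln -s "$target" "$libdir/$linkname" && echo "linked: $linkname -> $target"
    else
        echo "missing target: $libdir/$target" >&2
        return 1
    fi
}
```

For the two libraries in this article, the calls would be `link_compat_lib /usr/lib64 libssl.so.10 libssl.so.6` and `link_compat_lib /usr/lib64 libcrypto.so.10 libcrypto.so.6`, run as root.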

Firewall settings

If you enter http://<server IP>:8787 in the browser, you will find that the server cannot be reached. This is because the built-in firewall policy in CentOS does not open port 8787. Next, modify the firewall configuration file:

vi /etc/sysconfig/iptables

Add the following line after the existing line -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 8787 -j ACCEPT

Once the firewall has reloaded its rules, you can access the login page!
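
The manual edit can also be scripted. The sketch below inserts the rule directly after the port 22 rule and does nothing if the rule is already present. The function name add_port_rule is ours, and it assumes the file contains the port 22 line quoted above.

```shell
# Insert an ACCEPT rule for PORT right after the SSH (port 22) rule
# in an iptables rules file. Idempotent: an existing rule is left alone.
# (add_port_rule is a hypothetical helper name.)
add_port_rule() {
    rules_file=$1
    port=$2
    rule="-A INPUT -m state --state NEW -m tcp -p tcp --dport $port -j ACCEPT"
    # Nothing to do if the exact rule line is already there.
    if grep -qxF -- "$rule" "$rules_file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Copy every line; emit the new rule once, after the port 22 rule.
    # awk exits non-zero if the port 22 rule was never seen.
    if awk -v rule="$rule" '
        { print }
        /--dport 22 -j ACCEPT/ && !added { print rule; added = 1 }
        END { exit !added }
    ' "$rules_file" > "$tmp"; then
        cat "$tmp" > "$rules_file"
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

On CentOS 6 this would be used as `add_port_rule /etc/sysconfig/iptables 8787 && service iptables restart`, run as root; the restart makes the firewall load the edited file.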

User Settings

Sometimes we want to restrict which users can access RStudio Server. To do so, modify the configuration file /etc/rstudio/rserver.conf and add the following line:

auth-required-user-group=rstudio_users
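
If the setting is applied from a script rather than an editor, a helper along these lines keeps the file from accumulating duplicate entries. It is a sketch under our own naming (set_auth_group); it rewrites the file with any earlier auth-required-user-group line dropped and the new one appended at the end.

```shell
# Set auth-required-user-group=GROUP in an rserver.conf-style file,
# replacing any previous value of the same key.
# (set_auth_group is a hypothetical helper name.)
set_auth_group() {
    conf=$1
    group=$2
    key="auth-required-user-group"
    tmp=$(mktemp) || return 1
    # Keep every line except an earlier setting of the same key.
    if [ -f "$conf" ]; then
        grep -v "^$key=" "$conf" > "$tmp" || true
    fi
    echo "$key=$group" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

A typical call would be `set_auth_group /etc/rstudio/rserver.conf rstudio_users && rstudio-server restart`, run as root, so that the server picks up the new configuration.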

Then add the user group rstudio_users and the user supstat from the command line:

groupadd rstudio_users
useradd supstat 
usermod -a -G rstudio_users supstat 
 
# user password setting
passwd supstat
 
# if you want to grant sudo permission to some users, please refer to http://www.getroad.cn/blog/?action=show&id=801
 
chown -R supstat:rstudio_users /home/supstat

Then we can use the supstat account to log in to RStudio Server.

View the process:

[root@supstat R]#ps aux|grep rstudio-server
498      19292  0.1  0.3 212788  3980 ?        Ssl  07:30   0:00 /usr/lib/rstudio-server/bin/rserver
supstat  19307  0.5  4.3 568932 43956 ?        Sl   07:31   0:03 /usr/lib/rstudio-server/bin/rsession -u supstat
root     19414  0.0  0.0 103248   864 pts/0    R+   07:43   0:00 grep rstudio-server

We can see there are two rstudio-server-related processes: one is the system rserver process, and the other is the rsession of the user supstat.

System configuration and system management

 

For installation on Ubuntu, you can refer to Zhang Dan’s blog.

 

This article is a SupStat original and shall not be reproduced without permission. For reprints, please email contact#supstat.com (replace # with @).

Node.js Express 3.0 Development Basics

Introduction
Through Node.js, JavaScript has become an ideal tool for server applications, and through jQuery, JavaScript is an excellent tool for browser development. We recently learned Express 3.0 for Node.js. The book we were using covered Express 2.x, so we ran into some difficulties with Express 3.0. We would love to share with you how we overcame them.

 

Node.js Toolkit ©2014, Conan Zhang and Vivian S. Zhang. All rights reserved. The corresponding post, written in Chinese, can be found on Conan Zhang’s blog. Please contact vivian.zhang@supstat.com if you are interested in publishing it in English.

Node.js Toolkit introduces you to using JavaScript as a server-side scripting language and using Node.js to develop websites. Node.js is built on the V8 engine, one of the fastest JavaScript engines. The Chrome browser is also based on V8 and stays smooth even when you open 20 to 30 pages simultaneously. Express, the standard Node.js web development framework, helps us build websites quickly. Developing a website with Node.js is more efficient than doing it with PHP, and the learning curve is less steep. It is ideal for building small and personalized websites. We want to introduce you to many handy tools that reduce your workload and make it easy to build elegant, beautiful sites.

 

Content

We will focus on the Express 3.0 framework, along with related tools such as Mongoose, EJS, and Bootstrap.

  1. Build a project
  2. Directory Structure
  3. Express3.0 Configuration
  4. Ejs template
  5. Bootstrap Framework
  6. Routing Function
  7. Use of Session
  8. Page Notification
  9. Page Visit Control

Series of our new book — Node.js Toolkit

Node.js Toolkit ©2014, Conan Zhang and Vivian S. Zhang. All rights reserved. The corresponding post, written in Chinese, can be found on Conan Zhang’s blog. Please contact vivian.zhang@supstat.com if you are interested in publishing it in English.

Book Content:

Chapter one: get started

SVD and Image Compression (by Shiny! Yes!)

We have a brilliant Shiny application demonstrating image compression with Singular Value Decomposition. It gives a quick sense of how SVD works: https://yihui.shinyapps.io/imgsvd/.

ImgSVD was made by Nan Xiao, a Data Analyst Intern at SupStat, and Yihui Xie, a Software Engineer at RStudio and advisory Data Scientist at SupStat Inc.

The interface looks like the following:

https://yihui.shinyapps.io/imgsvd/

Singular Value Decomposition (SVD) is a matrix factorization that is widely used in dimensionality reduction. Keeping only the k largest singular values gives an approximation of the original matrix, and the method is flexible because the parameter k determines how close the approximation is. Image compression is also a matter of approximation, so if we regard an image as a matrix, we can use this method to compress it. The following demonstrates the result of our algorithm.

Original image

When we set k = 1
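
In formula terms, the idea can be sketched as follows. This assumes a grayscale image stored as an m × n matrix A of rank r; the symbols are ours rather than taken from the ImgSVD source, and how the app treats color channels is not stated here.

```latex
\begin{align}
A   &= U \Sigma V^{T} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{T},
       \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0, \\
A_k &= \sum_{i=1}^{k} \sigma_i \, u_i v_i^{T}, \qquad 1 \le k \le r .
\end{align}
```

Storing A_k takes k(m + n + 1) numbers instead of mn, so a small k means strong compression. With k = 1 the image is a single outer product, so every row is a multiple of the same vector, which is why so little detail survives at that setting.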

Upcoming NYC R Programming Classes

It is our pleasure to offer the intensive beginner-level R course for the third time! Beginning this Sunday, the 35-hour course will walk you through the basic operations and characteristics of R, all the way to a firm understanding of data manipulation and visualization. Also launching this weekend are two brand-new courses, Data Visualization for D3 and Data Science for Python, both at the beginner level.

R users will rule the world :) Make sure to sign up today

Taught by preeminent data scientists in New York City, these beginner NYC Data Science Academy courses are the best introduction to the exciting world of R, open data, and statistical science.

If interested, please read the course descriptions below and RSVP today!

Who We Are

Written by Vivian Zhang, Edited by Jennifer Morris

SupStat was born in 2012 out of the collaboration of 60+ individuals who met through Capital of Statistics, a well-known non-profit organization in China. The SupStat team met through analytical volunteer work and through various collaborations on R packages. SupStat’s founders and team members are also a significant driving force in the New York City data science community through the NYC Open Data Meetup.

SupStat offers consulting services in the areas of R development, data visualization, and big data solutions. We are experienced with many technologies and languages, including R, Python, Hadoop, Spark, Node.js, etc. We are official partners with Revolution Analytics, Transwarp and RStudio.

SupStat was founded with the intention of building the New York, Beijing and Shanghai data science communities. As of Mar 2014, the NYC Open Data Meetup had 1400 members. We host workshops of 15 to 150 people and host one to three events per week, which typically see 100 to 150 people per event. U.S. sponsors have included McKinsey & Company, Thoughtworks, and others. Our Beijing monthly tech event sees over 500 attendees and attracts event co-hosts including Baiyu, Youku and others.

In addition to data science community building events and workshops, SupStat works to train more individuals in data science through NYC Data Science Academy and I Smart Data. Courses offered include R (beginner, intermediate, advanced), Data Science by Python (beginner, intermediate, advanced), GitHub, Node.js, D3.js, and Hadoop (beginner, intermediate). We strongly encourage our own team, as well as engineers and data scientists in the SupStat, NYC Open Data, NYC Data Science Academy, and I Smart Data communities, to offer workshops and courses. It not only scales the impact we can have with data, but also offers each presenter and teacher an opportunity to further master their domain(s) of expertise.

For more information on SupStat or to discuss your data needs, please contact us.