Statistical analysis via cloud computing
Why cloud computing?
For my research project on Antecedents of Network Shape in Online Communities, I’ve developed a simulation of the formation of online communication networks using the R open source statistics package. One challenge I’ve run into, though, is that creating and analyzing simulation data is computationally intensive: it takes a long time to run on my machine.
Traditionally, the easiest solution to this problem would be to buy a new, faster machine just for my research project. No doubt, for $1000 or so, I could get a fast workstation to crank through my simulation work. Unfortunately, I would be stuck with having that machine in only one place (my home office or my Temple office), which doesn’t fit well with my schedule of splitting research work between two locations. Also, within a few weeks that $1000 machine would turn into an expensive dust collector, since I only need this “extra horsepower” every now and again, certainly not every day. Finally, I’d much rather spend my computing budget exactly when I need it, not all up front on a machine that may become obsolete before I get my money’s worth out of it.
As it turns out, this is exactly the kind of situation where cloud computing is a perfect fit. I have a short-term need for a dramatic increase in computing power (i.e., scalability). I have no interest in buying, configuring, or owning more computers (i.e., I’d rather buy computing services). I’d really like to find a cheaper solution, too, as I don’t want to pay for a computer I’m only going to use for a short while.
How to get started: statistical analysis using R and the Amazon Cloud
I had my first hands-on experience with cloud computing this week using the R open source statistics package via Amazon Elastic Compute Cloud (EC2). There are some great (free) tools available. It took some research to find an appropriate solution, and a fair bit of trial and error to get it working, but I’m now quite happy with the Biocep-R utilities.
First off, I never did get this complete set of tools working with Mac OS X (mainly, the Virtual R Workbench failed with a local R server in Mac OS X). All of the following instructions are for Windows. Happily, once I got it working with Windows I found I could manage the processes via my Mac.
Here are the steps that worked for me to get Biocep-R working on Amazon’s cloud.
Step #0 – Follow steps to get started with Amazon EC2 (from http://biocep-distrib.r-forge.r-project.org/, steps covered later removed). Even if you plan on accessing your cloud computing resources from multiple computers, you only need to do this step once.
Getting Started with Amazon EC2
- Sign up for Amazon EC2 here
- Learn how to use Elasticfox to connect to your EC2 account, browse available AMIs (Amazon Machine Images) and run AMIs from here
- A few issues, such as converting keys so you can ssh into the virtual machine instances, are answered in the EC2 getting started documentation here
The key items to configure in Elasticfox are credentials and an Account ID (both at the tool level). Using Elasticfox you’ll also need to set up a KeyPair and a Security Group.
Step #1 – Download and install Biocep-R. For Windows, to get everything in one download use the option “R Workbench with R (2.8.0), with plugins (EC2/S3 monitors + examples) and with extensions (OpenOffice-based file converter) here” (it will actually install R 2.9.0).
Step #2 - Confirm the Virtual R Workbench and the Elasticfox plug-in both work on your machine. You can use Virtual R Workbench to connect to a local server to test it out. There’s a “Play Demo” option in the “R-Session” menu that works as a simple test of the environment. You can tell the Elasticfox plug-in is working if you are able to get a directory of available AMIs.
Step #3 – If you want to avoid learning “on the dime”, use the Biocep R Workbench with a local R server. You can install packages, upload and download files between your local machine and the “R server”, and practice executing commands. Once you’re comfortable with basic usage, you’re ready to go!
Step #4 – Start up an EC2 AMI.
Start the Biocep-R AMI ami-cd5fb9a4 : Ubuntu 9.04 Jaunty Jackalope / R version 2.9.0 / Scilab 5.1.0 / Java version 1.6.0
- find ami-cd5fb9a4 (select region “us-east-1”, search by AMI id or with the keyword “biocep”; the AMI manifest is: biocep-ubuntu904-r290-j160-sci510-cologne/biocepimage.manifest.xml)
- Create a key pair if you don’t have one already
- Create a security group with one port of your choice open {my_port}: add a permission for a TCP/IP port {my_port} open to the network 0.0.0.0/0
- Run ami-cd5fb9a4, choose your key pair and your security group, and insert the following into the user data field
start=true
port={my_port}
login={my_login}
pwd={my_pwd}
email={my_email}
workers={nbr_workers}
- when the AMI starts running, you receive an email with the URL to use to connect the Workbench to the AMI
These parameters are all worth discussing in more detail. There are other AMIs with R already configured but I don’t think they have the biocep module loaded that supports remote usage via the Virtual R Workbench. Thus, the “biocep” AMI is the one to use for this configuration. When you start it up (“Launch instance(s) of this AMI”), the default Instance Type is “m1.small”. This is the smallest and least expensive option (roughly 8 cents an hour). The only other Instance Type supported by this AMI is the c1.medium type. This is because the image is a 32-bit image and the other Instance Types are 64-bit.
When you launch this AMI it is not immediately obvious that you need to choose your security group (which will appear under “Available Groups”) and choose the arrow button to enable it in the “Launch in” box. If you forget this step the AMI will launch under the default group and you will be unable to connect to your R server.
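The 0.0.0.0/0 network in the security group rule means “any IPv4 address”, so the port is reachable from anywhere. Here is a small sketch in Python, using the standard ipaddress module, to check that reading. The function name and the sample addresses are my own placeholders and have nothing to do with the EC2 or Elasticfox tools themselves.

```python
import ipaddress


def allowed_by_rule(client_ip, rule_cidr="0.0.0.0/0"):
    """Return True if client_ip falls inside the network a rule opens.

    Hypothetical helper for illustration only; it does not talk to EC2.
    """
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(rule_cidr)


# 0.0.0.0/0 covers every IPv4 address, so any client matches.
print(allowed_by_rule("203.0.113.7"))
# A narrower rule only matches addresses inside that network.
print(allowed_by_rule("203.0.113.7", "192.0.2.0/24"))
```

The first call prints True and the second prints False. Since the rule lets any address reach the port, the login and pwd values in the user data are what actually protect the R server.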
From what I can tell, the user values are used as follows: the port is any port number you choose (80 is used in the online example). The login and pwd values are for logging into the R server from the Virtual R Workbench (the online example uses “guest” and “guest”). Whatever email you provide will receive an update when the instance is available for usage. The email will include a really handy URL for automatically logging in to the R server.
I think the number of workers parameter allows you to have multiple “workers” within the R server. I set this as 1 and it worked fine.
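If you launch instances often, the user data text can be put together with a few lines of code. Here is a minimal sketch in Python, assuming the field is plain key=value text with one setting per line, as in the listing from the Biocep-R instructions. The function name build_user_data and the example values are my own placeholders, not part of Biocep-R.

```python
def build_user_data(port, login, pwd, email, workers=1):
    """Return the key=value text to paste into the EC2 "user data" field.

    Hypothetical helper: the key names come from the Biocep-R
    instructions; the function itself is only an illustration.
    """
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    settings = [
        ("start", "true"),
        ("port", port),
        ("login", login),
        ("pwd", pwd),
        ("email", email),
        ("workers", workers),
    ]
    # One key=value pair per line, in the same order as the instructions.
    return "\n".join(f"{key}={value}" for key, value in settings)


if __name__ == "__main__":
    # Placeholder values, matching the "guest"/"guest" online example.
    print(build_user_data(80, "guest", "guest", "me@example.com"))
```

The port you pass in needs to match the port you opened in your security group, otherwise the Workbench won’t be able to reach the R server.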
Step #5 – Use the R Virtual Workbench to connect via URL to your R server. Congratulations, you are now ready to use R via Amazon’s EC2. You have entered the exciting world of statistical analysis via cloud computing.
Update: My first day of cloud computing cost me less than $2. I estimate I’ll be able to easily handle all of my immediate needs for less than $100 and quite possibly for less than $50.
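For a rough sense of what those budgets buy, here is a back-of-the-envelope sketch in Python. It assumes the roughly 8 cents an hour quoted earlier for an m1.small instance and ignores any storage or data transfer charges, so treat the numbers as estimates. The function name is just for illustration.

```python
HOURLY_RATE_USD = 0.08  # approximate m1.small rate quoted in this post (assumption)


def instance_hours(budget_usd, hourly_rate_usd=HOURLY_RATE_USD):
    """Return how many whole instance-hours a budget covers."""
    # Work in whole cents to avoid floating-point surprises.
    rate_cents = round(hourly_rate_usd * 100)
    if rate_cents <= 0:
        raise ValueError("hourly rate must be at least one cent")
    return round(budget_usd * 100) // rate_cents


for budget in (2, 50, 100):
    print(f"${budget} covers about {instance_hours(budget)} instance-hours")
```

At that rate, $2 works out to about 25 instance-hours, $50 to about 625, and $100 to about 1,250.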
5 tips for fixing your Facebook page
If you are unhappy (or just plain confused) with the latest changes to Facebook, you will find this article at CNN.com helpful: 5 tips for fixing your Facebook page.
It describes how to hide friends’ posts (or just hide their posts from a specific application… just in case you’re not so interested in Mafia Wars or Farmville updates), how to decrease or increase the information you see in news feeds, and provides helpful tips on configuring privacy settings.
A Cautionary Tale: University Sues Student Blogger
Here’s a cautionary tale, News: University Sues Student Blogger – Inside Higher Ed:
Butler University has sued an undergraduate student for making libelous and defamatory statements about administrators on a blog he kept anonymously.
Details of the case became public last week when Bill Watts, an English professor at Butler, wrote a piece in the student newspaper and sent an e-mail to the university’s Faculty Senate in which he questioned “the practice of suing our own students for their utterances.” The e-mail provoked a written response from Bobby Fong, Butler president, who defended the lawsuit Tuesday at a Faculty Senate meeting, noting that “academic freedom does not provide protection for defamation and harassment.”
This is a good reminder that when you post to a blog (anonymously or not), you are still bound by laws of libel and defamation.
Reading through the rest of the story, it appears to me there’s plenty to critique about Butler University’s handling of the matter, but that doesn’t take away from the words of warning.
What is social media?
Elias Bizannes provides a round-up of definitions of social media in “Thank you 2008, you finally gave New Media a name.”
The shortest definition in his post is: “social media are primarily Internet and mobile-based tools for sharing and discussing information among human beings.” As he explains, for this definition to be useful, you also need more specificity on the definition and nature of “media” itself. He nets it out as: media “is about communication of messages” and social media is open, many-to-many (not broadcast), and disruptive.
This elaboration does get past my recent assertion that social media predates the Internet (see “Interview with Prof. Steven L. Johnson“): “the party line telephone in my grandmother’s rural community was a social media.” Party lines are many-to-many but only partially open and, ultimately, not so disruptive.
The Social in Social Media
I ran across this interesting post today by Tom O’Brien at Movie Quest. Here’s a key highlight from 5 Things About Social Media:
- Social media is about human scale engagement – not technology
- SM is about relationships, not campaigns. Plan accordingly
- SM cuts across silos. It will involve marketing, product, customer service and legal (at least). Doing it well will require C level approval AND support.
- Success in SM requires putting the community’s motivations first. This is very hard for most companies to grasp. It is not about selling something, but about getting people to love you so they will do more business with you. A subtle, but important distinction. Put the community’s motivations first.
- Successful SM will connect to something people are already passionate about. Figure out what it is first.
In other words, social media is all about the social. Well, maybe not all about the social (there are plenty of ways to get the tools, technology and marketing right or wrong, too), but if you don’t get the people parts right first, your social media efforts are sure to fail.
Setting up Technorati
Demonstrating what I’m going to be talking about in class today… I’m using this post to set up Technorati for this blog.
I would have done this much earlier, but Technorati kept rejecting my attempts to claim a blog at this domain.
I submitted a request via the online request form for a review… nothing happened.
Later, I realized the online form it tells you to fill out is meaningless. If you want to “appeal” the categorization of your blog, you need to ignore the form and post the request to their forums.
That’s a pretty bass-ackwards way of doing things, don’t you think?
Teams For Group Project
MIS 3580 Social Media Marketing
Finally the groups have been formed and the topics for the project have been decided. The final groups and their topics are listed below.
Group 1:
Topic: Thyroid Cancer
Team Members: April Mauger, Morgan Jaffe and Kelly Cacciutti.
Blogsite: http://social.mis.temple.edu/mjaffe6
Group 2:
Topic: Temple Student Lifestyle (food, housing, study habits)
Team Members: Keith Hawkins, Vijay Naraynan and Jeff Pan
Blogsite: http://social.mis.temple.edu/collegelife/
Group 3:
Topic: Nike Skateboarding
Team Members: Lindsay Kaminsky, Adrian Gonzalez and Tavare Brown
Blogsite: http://social.mis.temple.edu/nikesb/
Group 4:
Topic: Healthy Eating on Campus
Team Members: Diajessa Hinton, Toai Nguyen, Rose Huddell and Marvin Kimbrough
Blogsite: http://social.mis.temple.edu/healthyeating/
Group 5:
Topic: Things to Do in Philadelphia for Hard-working Students (Temple)
Team Members: Jillian Howard, Joseph Deluca, Gboyega Adenaike and Jesse Merlin
Blogsite: http://social.mis.temple.edu/downtimeteam/
All the best to all teams and have fun working together.
Find me on Facebook and Twitter
You can find me on Facebook and Twitter.
A special Twitter bonus. I’m using twitterfeed to announce new blog posts for both this blog and my Secrets of Web 2.0 Marketing course blog.
Directory of Social Networking Sites
Via Friends: Social Networking Sites for Engaged Library Services, I ran across this fascinating list of web-based Social Networks. If you’re looking for a directory of social networking sites, it is the most comprehensive one I’ve seen.
Adding up the estimated membership for the 246 sites, they come to over 1 billion total members.




