A very insightful interview conducted by Darren Cunningham with Naresh Govindaraj, the architect of Informatica Cloud’s Integration Templates. Learn why templates matter in today’s world of mashups and composite applications.
A blog post I wrote on the Informatica Perspectives blog on how iPaaS impacts cloud application adoption.
I recently attended a joint panel webinar between Informatica Cloud, StrikeIron, and AddressDoctor (a subsidiary of Informatica) discussing Informatica Cloud’s latest service from its Winter ’12 offering, the Contact Validation Service.
Cloud data integration has really taken off in the past couple of years with the proliferation of SaaS applications, and the need for these applications to access and interact with data stored in on-premise systems. But data integration alone only solves part of the problem. In order to maximize your “Return on Data”, you need to ensure that you’re dealing with high quality and trustworthy data. As Ted Friedman, a Distinguished Analyst from Gartner aptly points out,
“Organizations cannot be successful in their data integration work unless they have a very strong focus on data quality built in. That’s because it’s not only about delivering stuff from here to there. You also have to make sure you’re delivering the right stuff.”
I also learned of the 1-10-100 rule, from a Bloor Research whitepaper: it takes $1 to verify a record as it is entered, $10 to cleanse and de-dupe it later, and $100 if nothing is done, as the ramifications of the mistakes are felt over and over again.
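The arithmetic behind the 1-10-100 rule can be made concrete with a small sketch (my own illustration; the function name and record counts are hypothetical, only the $1/$10/$100 cost tiers come from the whitepaper):

```python
# Cost tiers from the Bloor Research 1-10-100 rule: cost per record
# rises tenfold the later a data quality problem is addressed.
COST_VERIFY_AT_ENTRY = 1    # dollars to verify a record as it is entered
COST_CLEANSE_LATER = 10     # dollars to cleanse and de-dupe it afterwards
COST_DO_NOTHING = 100       # dollars in downstream damage if nothing is done

def dirty_data_cost(verified, cleansed, untouched):
    """Total cost for a batch, given how many records fall in each tier."""
    return (verified * COST_VERIFY_AT_ENTRY
            + cleansed * COST_CLEANSE_LATER
            + untouched * COST_DO_NOTHING)

# Verifying all 100,000 records at entry...
print(dirty_data_cost(100_000, 0, 0))           # 100000
# ...versus verifying 30%, cleansing 20% later, ignoring the rest
print(dirty_data_cost(30_000, 20_000, 50_000))  # 5230000
```

Even with half the batch left untouched, the estimated cost is more than fifty times that of verifying everything at entry, which is the rule's point.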
Contact validation delivers benefits across departments and industries. Marketing functions can benefit from higher campaign response rates by filtering out leads with incorrect contact information – in fact, a BtoB Online article claims that validating and managing data in the earliest stages of collection can lead to better lead scoring and lift conversion rates by about 25% between the customer inquiry stage and the point where marketing qualifies the leads.
In addition to marketing, warehouse managers can calculate correct shipping costs and legal departments can ensure regulatory compliance. Industries such as Insurance can use geocoding information to check whether properties are near a fault line, flood plain, or landslide zone, while banks can use contact validation in their deduping process.
Email validation matters because, with 30% of people changing their email addresses every year, invalid addresses can trigger bulk email flags, cause future emails to be filtered as spam, and get the sender blacklisted by ISPs and spam filters. Phone validation matters too: the FTC’s fine of $16,000 per violation for calling numbers on the Do Not Call registry underscores its importance.
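To make the email side concrete, here is a minimal syntactic pre-check (my own sketch, not the webinar's method; the regex is a simplification). A real validation service goes much further, checking the domain's MX records and whether the mailbox actually exists:

```python
import re

# Simplified address-shape check: local part, "@", domain with a dot-TLD.
# This catches typos and junk entries, not dead mailboxes.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def looks_like_email(address: str) -> bool:
    """Cheap first-pass filter before any deliverability check."""
    return EMAIL_RE.match(address) is not None

print(looks_like_email("jane.doe@example.com"))  # True
print(looks_like_email("not-an-email"))          # False
```

A pre-check like this is the "$1 at entry" step; deliverability checks against the mail server are what a hosted validation service adds on top.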
I also learned that contact validation is only a small part of the broader data quality spectrum. Before even embarking upon a data quality project, it’s important to first integrate all of your data. The next step is to cleanse, enrich, and augment contact data. Once this is done, parsing and standardization, deduping, and finally monitoring and reporting round out the rest of the data quality activities. Moreover, these activities must be applied to all data types, not just contact data.
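One of the steps above, deduping, is easy to sketch in miniature (a toy illustration of mine, not Informatica Cloud's algorithm): standardize each contact record first, then key on the standardized form so that messy variants of the same person collapse into one record.

```python
def standardize(record):
    """Reduce a contact record to a comparable key (trim + lowercase)."""
    return (record["name"].strip().lower(),
            record["email"].strip().lower())

def dedupe(records):
    """Keep the first record seen for each standardized key."""
    seen, unique = set(), []
    for rec in records:
        key = standardize(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

contacts = [
    {"name": "Ann Lee",  "email": "ann@example.com"},
    {"name": "ann lee ", "email": "ANN@EXAMPLE.COM"},  # same person, messier entry
    {"name": "Bob Wu",   "email": "bob@example.com"},
]
print(len(dedupe(contacts)))  # 2
```

This also shows why the ordering above matters: deduping before standardization would treat the two "Ann Lee" entries as distinct.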
With data quality processes properly implemented, a company can then finally take it to the next level and develop data governance policies, and hierarchy management for a full-fledged Master Data Management Solution. You can view the full replay of the webinar below.
- IDC predicts that 80% of new software offerings will be available as cloud services and that by 2014, over one-third of software purchases will be via the cloud
- More than 85% of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage through 2015
- 33% of executives in a survey conducted by Saugatuck Technology identified integration as their top concern regarding SaaS deployment and use
- On the SaaS vendor side, a THINKstrategies survey found that 88% of SaaS companies identify integration as important in winning new customers and a common sales hurdle.
for cloud analytics, data-as-a-service (a.k.a. DaaS), and PaaS.
I was fortunate enough to attend Intel’s Developer Forum in mid-September at the Moscone Center in San Francisco, where I sat in on most of the cloud-related sessions. It was amazing to see the various innovations Intel IT displayed with regard to cloud architectures and how they make business more agile. Amongst my findings:
- Intel IT had narrowed its service provisioning time from 90 days in 2009 to 14 days by mid-2010, with a goal of 3 hours by the end of 2010
- They are doubling their rate of virtualization and hope to have 70-80% of their servers virtualized by 2011
- A ‘Hosting Automation Framework’ allows users (consisting of three audiences – App Developers, IT Customers, and IT Operations) to access services through a self-service portal and for the cloud architects to view resource usage statistics through BI-enabled dashboards
- Intel’s Conceptual Cloud Architecture placed an emphasis on certain components within the PaaS layer (specifically the database, analytics, reporting, and the workflow) as well as VM isolation, load balancing, and High Availability at the IaaS layer.
- The physical-to-virtual sizing strategy depended very much on the type of application being run. A small workload consisting of a basic web app server could achieve a virtual machine density of 22:1 on Intel Xeon 5500-based servers. A medium workload consisting of SQL database servers and Java applications could achieve a ratio of 11:1, while a large app consisting of several stacked applications could yield a 5:1 ratio.
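Those density ratios translate directly into consolidation math. A quick sketch (the workload mix below is hypothetical; only the 22:1, 11:1, and 5:1 ratios come from the session):

```python
import math

# VM-per-host density ratios Intel quoted for each workload size.
DENSITY = {"small": 22, "medium": 11, "large": 5}

def hosts_needed(workload_counts):
    """Physical hosts required to consolidate the given workload mix.

    Each workload class needs its own ceiling, since a host is not
    shared across classes in this simple model."""
    return sum(math.ceil(n / DENSITY[kind])
               for kind, n in workload_counts.items())

# 200 small web servers, 44 medium DB/Java servers, 10 large stacked apps
print(hosts_needed({"small": 200, "medium": 44, "large": 10}))  # 10 + 4 + 2 = 16
```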
- Private clouds are recommended for those applications that require high levels of security, governance, and interoperability such as ERP and BI, while public clouds are recommended for workloads that require rapid deployment, reduced capital expenditure, and external vendor expertise such as batch jobs, and HR apps.
- Intel’s cloud vision consists of a world where the cloud is federated, automated dynamically, and client aware to secure the most optimal experience across the client continuum.
- With the increase in cloud traffic comes an increase in cloud storage. Data was divided into four categories, each with its own projected CAGR through 2014: Structured Data (Traditional Enterprise DB, 23.6%), Replicated Data (Backups & Data Warehouses, 24.2%), Unstructured Data (Archives, 54.8%) and Content Depots (Web, email, document sharing, and social network content, 75.6%).
With many pilot cloud projects gathering steam, organizations are evaluating transitioning their IT systems to a cloud-based architecture. However, such a full-scale move must take into account security risks, lock-in risks, and cost-benefit analysis over the lifetime of the investment. An InformationWeek Analytics report outlined a comprehensive survey of 393 individuals within various companies, 28% of whom had more than 10,000 employees. Amongst the many findings within the report, the most interesting ones were:
- 34% of respondents involved in the cloud used it for SaaS (applications delivered via the cloud), 21% for IaaS (storage or virtual servers delivered via the cloud), 16% for PaaS (web platform delivered via the cloud), and 16% for DaaS (Data-as-a-Service for BI and other data lookup services delivered via the cloud)
- 29% were not using the cloud at all
- More than one-third claimed to build in 31% or more excess server and storage capacity for non-cloud computing systems
- 73% cited “Integration with Enterprise applications” and 69% cited “Cost of Hardware and Software” as factors when choosing a business technology
- Almost 92% said they were at least somewhat likely to carry out an extensive ROI analysis over the expected lifespan of a cloud computing project
- 46% said that their ROI calculation would span 3-5 years
- 45% stated that “elasticity” is frequently or often required
There was also a feeling amongst respondents that cloud computing works for commodity applications but that complex integration requirements make costs skyrocket. The major sources of cost savings touted by cloud proponents involved three areas: efficiencies as a result of economies of scale, use of commodity gear and elasticity.
This last area, elasticity, is worth exploring further, especially in light of the number of respondents who claim to build excess capacity into their non-cloud systems, and the assertion that complex integration requirements drive up costs in the cloud. Elasticity refers to the ability to scale resources up or down on the fly. But the reality of “on-demand” elasticity is that most major software vendors don’t allow CPUs to be added without additional licensing costs, and coding applications that scale up appropriately is difficult. Given the data above on the necessity of elasticity and the large percentage of companies that conduct detailed cloud ROI analysis, it appears these two factors are related.
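To make "scale up or down on the fly" concrete, here is a minimal threshold-based autoscaling rule (a generic sketch of my own, not any vendor's implementation; the 75%/25% thresholds are arbitrary): add capacity when utilization runs hot, release it when utilization runs cold.

```python
def scale_decision(current_instances, cpu_utilization,
                   high=0.75, low=0.25, min_instances=1):
    """Return the new instance count given average CPU utilization (0..1)."""
    if cpu_utilization > high:
        return current_instances + 1     # scale up
    if cpu_utilization < low and current_instances > min_instances:
        return current_instances - 1     # scale down
    return current_instances             # hold steady

print(scale_decision(4, 0.90))  # 5
print(scale_decision(4, 0.10))  # 3
print(scale_decision(4, 0.50))  # 4
```

The hard part is not the rule itself but everything around it: per-CPU licensing that makes each scale-up step cost money, and applications that were never written to spread work across a changing number of instances.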
Increasing the ROI of a Cloud Deployment
In order for CIOs to see more of an ROI from deploying applications in the cloud, several things must happen:
- Data center automation software must reach a level of sophistication where it can automatically coordinate tasks between on-premise and cloud applications to optimize elasticity
- Software vendors must allow for special pricing for cloud providers so that these savings can be passed onto customers – a way to allow this is to ensure a multi-tenant architecture consisting of customers that use the same on-premise software as is used in the cloud-based edition by the respective provider
- The web platform used for PaaS purposes by the customer must be compatible with the SaaS applications they subscribe to, in order to enable any custom widgets that may need to be written
- The “converged fabric” architecture that brings together servers, storage, and networking must improve massively, so that pools of additional capacity are readily available wherever elasticity is needed.
During the recent recession, Cloud Computing was touted as a new model for IT to adopt, in order to cut operational costs and extract maximum efficiencies out of their software. 2010 was supposed to be the ‘Year of Cloud Computing’ yet adoption still remains slow.
A recent InformationWeek article referenced a study conducted by Avanade which showed that 91% of U.S. respondents understood the term Cloud Computing while only 61% of respondents from the rest of the world understood it. Even more surprising was the fact that over half of U.S. respondents claimed to be using a combination of internal IT systems and cloud services (in other words “hybrid clouds”), while those who didn’t adopt any form of cloud computing cited security and control as their primary reasons for not doing so.
The unusually large number of ‘cloud computing adopters’ leads one to believe that the respondents considered web-hosting, salesforce.com, and other SaaS-type offerings to be cloud computing as opposed to pure-play cloud providers such as Amazon EC2, Heroku, and Google AppEngine. This leads us to the first reason for slow cloud computing adoption:
Misunderstanding Cloud Computing
The definition of Cloud Computing has converged on three distinct layers, each of them mapped appropriately to the ‘old’ traditional datacenter model of hardware, OS, and application:
Infrastructure-as-a-Service (IaaS): This includes servers, storage, and networking hardware stored remotely and delivered on an as-needed basis in the form of CPU cycles or data. Amazon EC2 and GoGrid are prime examples of IaaS providers.
Platform-as-a-Service (PaaS): This consists of a complete platform upon which to build your custom applications. APIs, database development, storage, and testing are provided as well. Microsoft’s Azure and salesforce.com’s force.com platforms are examples of early PaaS providers.
Software-as-a-Service (SaaS): This consists of applications delivered over the web and accessed through an internet browser. salesforce.com’s CRM modules, Gmail, and Workday are all examples of SaaS providers. However, as you’ll see below, there is a subtle difference between a SaaS solution and a SaaS cloud-computing solution.
While the above definitions provide a basic foundation for understanding what cloud computing is, they still do not enable decision makers to understand the myriad complexities involved with pushing the ‘go’ button on migration and deployment. I found a useful, in-depth InfoWorld Cloud Computing Deep Dive report, which addresses all the ‘middleware’ components needed for a successful cloud computing migration, amongst other issues. One of InfoWorld’s main cloud computing bloggers, David Linthicum, wrote a book called Cloud Computing and SOA Convergence in Your Enterprise: A Step-by-Step Guide, which outlines the 11 categories of Cloud Computing. I’ve reproduced the image from the InfoWorld Report below:
Although the above topology adds more granularity to the various components of cloud computing, it is sometimes too all-encompassing. For instance, the Application-as-a-Service segment (a.k.a. SaaS) consists of any software delivered over the web. But to be a true cloud-computing solution, I believe that such SaaS solutions must be able to not only integrate well with on-premise software but also with other SaaS solutions that exist on some other platform.
Apart from understanding what cloud computing really means, the next biggest impediment towards adopting it is:
Security in the Cloud
The vast majority of enterprises who have taken to the cloud have done so in the area of non-critical business applications. However, to truly realize the full benefits of cloud computing, enterprises must be able to consume their mission-critical business applications in the cloud, and be able to transition seamlessly between their on-premise applications and the cloud. A Gartner report from almost two years ago summarizes the seven main security risks of cloud computing:
1) Privileged user access (what controls are in place over the administrators at the service provider who have access to your critical data)
2) Regulatory compliance (what kind of external audits and security certifications has the provider gone through)
3) Data location (what country is the data stored at and will privacy of customers’ data be guaranteed at this location)
4) Data segregation (data in a cloud datacenter is typically in a shared environment. What encryption schemes are there to ensure that private data is not delivered to another customer by mistake)
5) Recovery (what disaster recovery mechanisms are there for backup of data)
6) Investigative support (what ability does the provider offer to investigate inappropriate or illegal activity)
7) Long-term viability (what exit or continuation strategies are available in case of acquisition or bankruptcy of the provider)
The above list, though, is not comprehensive. Moreover, current security solutions in the cloud are limited to security vendors that have SaaS extensions to their existing software. Security issues around protecting the platform in the cloud have not been addressed yet. A nightmarish security scenario would involve a hacker exploiting vulnerabilities in the force.com or Azure platform, with the virus quickly spreading to any applications that run on it. Such a virus could then quickly spread to all customers using those applications. If you thought any of the MyDoom viruses of 2004 caused havoc, a virus of this scale through a cloud computing platform would bring significant business disruption. PaaS vendors such as Microsoft and salesforce.com need to assure customers of in-built anti-virus mechanisms to protect applications that run on their platforms.
Microsoft loses millions of dollars every year to software piracy. The bulk of this loss can be attributed to illegal copies of its two flagship products, MS Office and Windows XP, mainly in developing nations such as India and China.
Henry Chesbrough, the Executive Director of the Center for Open Innovation at the Haas School of Business at UC Berkeley, recently wrote about why Microsoft should welcome piracy in China. His reasoning focuses on the technology lifecycle and why it’s important to let new technology thrive in the early stages, even if it’s unlicensed.
Just as an aside, the technology lifecycle consists of four stages (Creation, Growth, Maturity, and Decline) that together trace an S-shaped curve. When a new technology is created, it rises gradually through its Creation stage until it reaches the Growth stage, where adoption takes off exponentially. The Maturity stage represents the peak of the curve and maximum market saturation, after which the technology declines.
Chesbrough claims that given the currency differential between developed and developing countries, Microsoft should only capitalize on IP protection when technology reaches the Maturity stage, otherwise, the newly created middle class in China will simply use open source alternatives. George Ou of ZDNet disagrees with this and calls Chesbrough irresponsible for condoning piracy. As an alternative, he suggests selling Office and Vista for $3 each and ensuring that OEMs localize those particular versions of Vista and Office to be restricted to running in China. As an added incentive, he claims that people in China will want to buy these cheap legal copies because the illegal ones are full of malware.
I agree with George that Chesbrough was wrong in condoning piracy but disagree with his business model. I think that a $3 per copy fee in China versus hundreds of dollars per copy in developed nations will only cause resentment in the developed world and push customers there toward open source alternatives.
What do I think the answer to Microsoft’s piracy worries in China is? Facebook.
Simply put, Facebook is the only social networking site out there today that validates the authenticity of a user by requiring him to provide a work or university email address. This minimizes the number of bogus accounts.
Here are the potential steps Microsoft could take towards diminishing piracy in China by leveraging Facebook:
- Let Microsoft Popfly virally grow within Facebook (Popfly is an application consisting of pre-programmed blocks that lets non-techies create application content for Facebook)
- Since the Facebook community is primarily full of college students, by using Popfly, a Facebook college user could leverage such content for a class project.
- As an example, imagine a class project where a business school student has a marketing research project to find out whether consumers prefer fast-food outlets (where you’re virtually expected to do a take-out) for lunch or “quick-service” outlets (such as Quiznos, where you can spend some “down-time” sitting and eating your food in a relaxed atmosphere)
- The student could conduct a poll amongst Facebook’s 30 million users
- He could then narrow this demographic information right down to the zip code (or whatever it is in China)
- He could then further overlay this information with Quiznos, McDonald’s, and Subway outlets in each of the zipcodes surveyed and apply a “classification scheme” where he sets Quiznos with a “quick-service” tag and McDonald’s and Subway with a “fast-food” tag
- He would now have hard quantitative information to report for his class project
- However, he now has to present his findings in the form of PowerPoint to his prof, and before that, maybe perform some more trending and analysis using Excel
- This is where if he wants to export this data from Facebook to the Office software installed on his laptop, Microsoft would perform a license check to verify that the copy is a legal one
- If it’s illegal, the user would be unable to export this data and is faced with two choices:
- Pay the full cost of a student version of Office or
- Pay the full cost of a regular consumer version of Office but benefit from a “reduced cost advertising model”
- In this model, the Facebook user would be subject to viewing targeted ads based on the interests filled out in his Facebook profile
- A monthly discount would be paid to the user based on the ads viewed, the number of ads clicked, and the services or goods actually bought from the advertiser. As an added incentive, such services or goods would cost a lot less through Facebook than at street price or regular e-commerce prices. Furthermore, a certain percentage of these already-discounted purchases would be applied to the monthly discount on the price he paid for Office
- This advertising model would apply for an entire year, so it’s to the user’s benefit to conduct all his transactions through Facebook to derive maximum savings on his Office purchase. Of course, if the total value of these discounts, ad clicks, and ad views equals the price of Office, then the monthly discount stops (i.e., the discount is capped once the remaining value reaches zero).
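The cap described in the last step can be sketched in arithmetic form (all dollar amounts and per-ad rates below are hypothetical; only the "discount stops at zero remaining value" rule comes from the model above). Working in cents keeps the math exact:

```python
OFFICE_PRICE_CENTS = 15_000  # hypothetical $150 paid up front for Office

def monthly_discount_cents(ads_viewed, ads_clicked, rebate_cents,
                           refunded_cents):
    """Discount earned this month, capped at the value not yet refunded.

    2 cents per ad viewed and 25 cents per ad clicked are made-up rates."""
    earned = ads_viewed * 2 + ads_clicked * 25 + rebate_cents
    remaining = max(OFFICE_PRICE_CENTS - refunded_cents, 0)
    return min(earned, remaining)

# Early in the year the full earned discount is paid out...
print(monthly_discount_cents(100, 20, 500, 0))      # 1200 (i.e. $12)
# ...but once $145 has been refunded, only $5 of value remains to pay.
print(monthly_discount_cents(100, 20, 500, 14_500))  # 500
```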
Microsoft would obviously first have to lobby the Chinese government to ensure that Facebook adoption increases in universities across China. The Chinese government, being one that takes great pride in monitoring the activities of its citizens will only be too happy to oblige.
The above-mentioned model could easily be replicated in the enterprise, small-business, and mid-market sectors as well (using work emails), which is where a lot of the piracy also takes place (assuming, of course, that multinational enterprise companies make legal purchases).