Interview: Inside the Social Data Economy with IBMer Inhi Cho Suh

In case you haven’t heard, we’ve shifted from an information economy to a data economy.

What’s the difference?

There’s simply too much information from too many sources for anyone to realistically gain insights manually.  So if you want to leverage information to inform economic, social or political decisions, you need a way to analyze data computationally.

That’s what Big Data’s all about, and it’s IBMer Inhi Cho Suh’s specialty.  In this interview (sponsored by IBM) Inhi discusses what Big Data is, why it’s important to organizations and how it impacts business decision making, government surveillance and personal privacy.

Eric:  You’re a martial artist?

Inhi:  What I did was a Korean martial art called Tang Soo Do. It is one of three Korean styles; a lot of people know Taekwondo, and Taekwondo, Tang Soo Do and Hapkido are the three primary Korean martial arts styles. Tang Soo Do means “the way of the hand” in Korean, and the name also comes from the Tang Dynasty. In Taekwondo you have a lot more foot techniques, like kicking. In Tang Soo Do, you have more of a balance of both your feet and hands.

Eric:  I know this is a conversation about big data, but in a pinch, can you kick ass?

Inhi:  I would hope so. I don’t want people to test my reflexes, though. The bigger thing is that when things are ingrained in you at a young age, they become instinctual. I definitely know how to use various objects as weapons, for example.

If I were in a parking lot and felt uncomfortable, I’d know how to hold my keys, or how to use any other objects in my pocketbook. I know where all the key pressure points are, the most sensitive spots on the body, where bones naturally break. Things like that definitely give me an advantage.

Eric:  I’m scared.  Genuinely.

Inhi:  I can give you some basics.

Eric:  Let’s save that for another interview. And let’s get down to business. Why should nontechnical people care about big data?

Inhi:  Part of big data is about operating with a level of insight and intelligence that allows you to either take out latency, or act in a way that you weren’t able to act yesterday, with a higher degree of confidence, for example. For that reason alone, everyone should be engaged.

Eric:  Talk to us about categorizing data.

Inhi:  Society as a whole is entering what I would consider a data economy. Meaning, the way you’re going to be able to differentiate yourself, your enterprise or your organization is going to be based on that knowledge.

Now, where does that data come from? It’s going to come not only from inside your organization, i.e. from your traditional repositories where you keep transactional data, records, accounting revenue and checks, but increasingly from data that exists outside your organization.

It could be public data, meaning social data. It could be publicly registered data, meaning data that’s published within your respective industry around benchmarks and performance. It could even be data from a downstream industry that you’re related to, but that has implications for you.

For example, when a natural disaster hits another part of the world, it might impact your supply chain or have downstream implications for a particular industry. Knowing that in advance can have a material impact not only on your ability to remediate the situation, but also on your ability to think through proactively how to work around it.

Eric:  What is the likely origin of data?

Inhi:  One is transactional data, typically from transactional and/or application sources: supply chain data, CRM (customer relationship management) data sources, or transactional systems where you are doing commerce. That data is typically structured and high volume; there’s not a lot of variety in it.

The second type is what I consider your traditional content management type repository. It could be documents, files. That has a little bit more variety to it, but even then it’s semi‑formatted. Then you’ve got two new sources that I talk about.

One is machine-generated data, which can be semi-structured as well as unstructured. Machine-generated data is really data coming out of assets that inherently are not smart, but because they’re instrumented, you suddenly know their condition, their location, a lot of things about them, and that data is being generated constantly.

The fourth category is what I call social data. It’s not necessarily social media, but it’s data about people, interactions that people have with other people. It’s the sentiment around the data set. It’s the identities, profiles, and entities of individuals.
That becomes another dimension, but if you think about those four types and the sources where they may be generated, it can give you an idea of how you could leverage all of that to give you better advice.
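
As a hypothetical sketch only (the enum values, source names and fields below are illustrative, not an IBM product API), tagging incoming records with one of those four categories might look like this:

```python
from enum import Enum

class DataCategory(Enum):
    TRANSACTIONAL = "transactional"    # ERP, CRM, commerce systems
    CONTENT = "content"                # documents, files, content repositories
    MACHINE_GENERATED = "machine"      # output from instrumented assets
    SOCIAL = "social"                  # people, interactions, sentiment, profiles

# Illustrative mapping from source system to category; real sources would differ.
SOURCE_CATEGORY = {
    "order_system": DataCategory.TRANSACTIONAL,
    "document_store": DataCategory.CONTENT,
    "fleet_sensors": DataCategory.MACHINE_GENERATED,
    "twitter_stream": DataCategory.SOCIAL,
}

def categorize(record: dict) -> DataCategory:
    """Tag a record with one of the four categories based on its source."""
    return SOURCE_CATEGORY.get(record.get("source"), DataCategory.CONTENT)

if __name__ == "__main__":
    print(categorize({"source": "fleet_sensors", "temp_c": 71.5}))  # MACHINE_GENERATED
```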

Eric:  I would imagine the social data is pretty messy?

Inhi:  Social data is the messiest. It’s also what I consider the most incomplete, the most uncertain, but it’s actually less important in terms of any one data point. What you care most about is the aggregate or the metadata.
I’ll give you one example that we shared at our Information on Demand conference in Las Vegas two weeks ago.
There was a data point showing that if you take the aggregate public crowd data from Twitter around the flu and map out the flu epidemic, you can see a two-week lag between what the CDC reported, based on time and date, versus what was available through crowdsourcing on Twitter.
Two weeks of lead time can have a material impact on a location and on society. Any single data point may not have been valid, but the aggregate data, and the metadata derived from it, is of value.
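
As a rough sketch of the kind of aggregation described here, assuming you already have a list of flu-related tweets with dates (the data and field layout below are hypothetical):

```python
from collections import Counter
from datetime import date

# Hypothetical, already-filtered flu-related tweets: (date, text) pairs.
tweets = [
    (date(2013, 10, 1), "home sick with the flu"),
    (date(2013, 10, 2), "flu shot lines are crazy"),
    (date(2013, 10, 9), "everyone at work has the flu"),
]

def weekly_counts(items):
    """Aggregate individual data points into per-ISO-week counts."""
    counts = Counter()
    for day, _text in items:
        counts[day.isocalendar()[1]] += 1
    return dict(counts)

# Any single tweet may be noise; the weekly aggregate is what you would compare
# against official weekly surveillance reports to look for a lead or lag.
print(weekly_counts(tweets))  # {40: 2, 41: 1}
```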

Eric:  So you’re telling me that monitoring social media alone is like looking at the Grand Canyon through a keyhole?

Inhi:  It can definitely feel like that. It is only as good as your ability to relate it back to a material effect that you want to have for a particular person, a set of users, an audience. I will give you an example of a client that I think is trying to be very aggressive about leveraging social and mobile, but then really tying it back to the operational systems: a retail cosmetics firm.

They said, “If I had a mobile device or an iPad with the sales rep at the counter selling the cosmetics, I could drive more sales,” because consumers coming in would fill out the iPad and the reps would capture more information. They saw the increase in the number of sales. However, they did not tie that data back to their customer warehouse and their traditional customer database.

As a result, their campaign management system wasn’t linked, and when they started to send that same audience direct mail campaigns, as well as electronic campaigns, the message was inconsistent.

The consumers were upset, because they said, “Well, I just spent ten minutes filling out my profile, and you just sent a different set of recommendations. Did you give me the wrong recommendations the first time I bought stuff, or are you not linking and taking the most current information I gave you?”
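
As a minimal, hypothetical illustration of the integration gap described above (not the client’s actual system), upserting an iPad-captured profile into the master customer record keeps the campaign system reading the most current answers:

```python
# Master customer database keyed by customer ID; fields are illustrative.
customer_master = {
    "c-1001": {"name": "Ana", "skin_type": "dry", "preferences": ["moisturizer"]},
}

def upsert_profile(master: dict, customer_id: str, counter_profile: dict) -> None:
    """Merge a profile captured at the sales counter into the master record,
    letting the newer, customer-supplied fields win."""
    record = master.setdefault(customer_id, {})
    record.update(counter_profile)

# Profile the customer just filled out on the in-store iPad.
upsert_profile(customer_master, "c-1001",
               {"skin_type": "combination", "preferences": ["sunscreen", "serum"]})

# A campaign system that reads from the master now sees the latest answers,
# instead of sending recommendations based on stale data.
print(customer_master["c-1001"])
```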

Eric:  It seems like, when you remove human intelligence from the equation and you let data trigger decisions, without someone looking over it, this type of risk exists, yes?

Inhi:  Yes, that type of risk definitely does exist. I look at these types of capabilities as things that augment the human ability to make decisions. We are also working on some new technology that actually allows data to find other data, with the relevance to be derived later, because every data point could have relevance; it is a matter of context and time.

Eric:  Talk to us about OODA. What is it and how does it apply to big data?

Inhi:  I use it a lot in analogies, actually. It is a military style of strategic thinking developed by a military leader named John Boyd during the Korean War. It was for fighter pilots, to develop their OODA loop: observe, orient, decide, act. If your OODA loop was faster than your opponent’s, it meant life or death.

The expectation was that you want your OODA loop to process whether someone was friend or foe in under 40 seconds. Each time you went into battle, your ability to observe, orient, decide and act on whether or not to pull the trigger, or maneuver the plane in a certain way, would improve because of prior practice, experience and so forth.

When you think about operating more and more in real time and operating with greater relevance, what you are asking every organization to do is culturally develop an OODA loop that allows them to observe everything in their environment through multiple senses, rather than just their traditional way of capturing that data.

You want them to orient with greater context, relevant context, and then you want to give them the decision‑making ability with a high degree of confidence and then the ability to automate and/or take that action immediately, based on that insight.
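
One way to picture an automated OODA loop over incoming data is the purely illustrative sketch below; the event source, context, threshold and action are invented for the example:

```python
import random
import time

def observe():
    """Pull the latest readings from whatever sources you have instrumented."""
    return {"sensor_temp_c": random.uniform(20, 100)}

def orient(observation, context):
    """Add context: compare the reading to what is normal for this asset."""
    return observation["sensor_temp_c"] - context["normal_temp_c"]

def decide(deviation, threshold=15.0):
    """Decide with a simple confidence rule; real systems would use models."""
    return "alert" if deviation > threshold else "continue"

def act(decision):
    """Automate the response, or hand it to a person for review."""
    if decision == "alert":
        print("Dispatching maintenance ticket")

context = {"normal_temp_c": 60.0}
for _ in range(3):                      # each pass is one turn of the loop
    act(decide(orient(observe(), context)))
    time.sleep(0.1)
```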

Eric:  How do you apply sound IT governance practices to big data projects?

Inhi:  The more advanced companies that are implementing and executing big data projects have greater investments in information integration and governance: both the technology capabilities and the cultural and organizational capabilities.
I will give you an example. The CEO of a major Mexican bank, who, by the way, is an industry thought leader in the finance sector as a whole, said, “In finance, the goal is to go from uncertainty to risk, because risk is manageable.” In a big data context, it is the same thing. You are going from uncertainty to enough governance.

The point is not necessarily to invest in every form of governance; it is to have enough governance to understand the risk, so that you can place an even bigger bet. The real objective is to drive confidence in the data, in the decision-making, and in the security policies you have, in such a way that the risks you take are assumed and known, and you do that intentionally.

Eric:  Is this all happening in the background? Does a front-line employee have to have some training or understanding of what to do to comply with these types of governance standards, or is it something that is done for them, by how the system is set up?

Inhi:  It can be a little of both. I would say, typically, to just make it easier, it is what you can implement into your systems and environments. Let me give you a couple of examples. A lot of folks today want to be able to take advantage of new capabilities like Hadoop, which is a distributed file system and processing framework that allows you to process broad sets of social as well as unstructured data in parallel very quickly and easily.

We have the ability to let you maintain some of those governance capabilities automatically, which allows you to be more creative in how you use technology. You can instrument the system in a way that’s seamless and governed, based on the business policy.
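
As a generic, hedged illustration of governance applied automatically by the system rather than by each employee (this is not IBM’s implementation), a business policy could drive masking of sensitive fields before data ever reaches an analyst or a Hadoop job:

```python
import hashlib

# Business policy expressed as data: which fields must be masked before analysis.
GOVERNANCE_POLICY = {"masked_fields": ["email", "phone"]}

def mask(value: str) -> str:
    """Replace a sensitive value with a stable, non-reversible token."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def apply_policy(record: dict, policy: dict) -> dict:
    """Return a governed copy of the record, masking fields the policy names."""
    return {
        key: mask(val) if key in policy["masked_fields"] else val
        for key, val in record.items()
    }

raw = {"user": "ana", "email": "ana@example.com", "phone": "555-0100", "sentiment": 0.8}
print(apply_policy(raw, GOVERNANCE_POLICY))
```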

Eric:  When you’re pulling in transactional data, machine data, social enterprise data, this vast collection of data from disparate sources, what are the risks that you might wind up infringing on the intellectual property rights of others?

Inhi:  The real infringement comes in if someone gives you data and gives you permission to use it in a particular way, but then you end up using it in a different way. That’s where some of the privacy concerns and entitlement concerns come into play.

Often, though, it goes back to an earlier point I made, which is that sometimes it’s less about any one data point, in terms of the raw data, and really about the aggregate metadata that can actually drive the insight.
Most organizations I’ve interacted with want to be able to benchmark the best data they have against a pool of metadata provided by their industry peer set, rather than against any one data point.

Eric:  Isn’t there a risk that, if I’m aggregating social data around key words or key phrases, the data lands me on photos, video, music, a diagram, a design or a trade secret that’s the intellectual property of someone else (perhaps shared inadvertently by someone who shouldn’t have done that but didn’t know), and it winds up in my database? Am I an infringer?

Inhi:  I think that does have implications. It also has to do with intent. I think we’re entering a time period where the jurisdictions don’t yet reflect, let’s say, the reality of the way people are interacting and operating, and the policy and regulations around it are still emerging. It’s not always an easy question. I think what you want to do is tie use back to specific intent and permission.

The way I tell clients to get around this, actually, the best advice, is to go ask directly if it’s a specific data point. Go back and ask them for permission.
For example, don’t just crawl the Web. You can crawl it and generate the aggregate insight, but if there’s, let’s say, an access that you want, i.e. access to a user ID or verification of someone’s identity, go ask them, and ask their permission. If they say yes, then you’re absolved of that risk. If they say no, then you know you shouldn’t use it.

Eric:  We talked a lot about harnessing big data to deliver improved business outcomes. I want to spend a little bit of time on political and social outcomes as well.

I want to read you a quote from Brazilian president Dilma Rousseff, which was after she found out that her calls were being monitored by the NSA, and the quote is, “In the absence of the right to privacy, there can be no true freedom of expression and opinion, and therefore no effective democracy.”

The question is, in your opinion, does freedom of expression depend on the right to privacy?

Inhi:  I don’t think it’s a binary answer. I think with any liberty there is an expectation of valuing what it means, an expectation of how society should operate and of how individuals should operate, in terms of what they feel they’re entitled to. I do think we have to be incredibly sensitive about people’s ability to express their thoughts freely.

That’s a core aspect of not just this particular country and nation, but it’s an important aspect of how creativity is generated, and new ideas are generated.

Eric:  The leaks from Edward Snowden, which revealed the NSA’s boundless attempts to cross-correlate personally identifiable information with other datasets in pursuit of national security, portray the US government as obsessed with surveillance.

The question is, if James Clapper, the Director of National Intelligence, were to say to you, “Inhi, I’ll give you anything you want: world peace, the end of world hunger, anything you want, if you can make the PRISM program constitutional,” what would you tell him?

Inhi:  I don’t know. That is one question I never thought of.

Eric:  If James Clapper said to you, “Hey, what can we do to make PRISM fair because my agency needs the data, and we’re going to get shut down if we don’t make it fair, so what can you do to make it fair?”

Inhi:  Here’s the mandate for the technology industry: we have enough people who are creative enough to do this primarily by design. Software can be built to protect individuals, and designed not just to protect identities, but to protect against injustices to civil liberties as well.

This is something that IBM is looking at and investing in, especially around a new technology base called G2, which is designed with privacy as a core element.

Follow Inhi at http://twitter.com/Inhicho