The Current State of Machine Intelligence

(the 2016 update to this post can be found at: https://medium.com/@shivon/the-current-state-of-machine-intelligence-2-0-a9e0bab95511#.ffmh59x3o)

I spent the last three months learning about every artificial intelligence, machine learning, or data related startup I could find — my current list has 2,529 of them to be exact. Yes, I should find better things to do with my evenings and weekends but until then…

Why do this?

A few years ago, investors and startups were chasing “big data” (I helped put together a landscape on that industry). Now we’re seeing a similar explosion of companies calling themselves artificial intelligence, machine learning, or somesuch — collectively I call these “machine intelligence” (I’ll get into the definitions in a second). Our fund, Bloomberg Beta, which is focused on the future of work, has been investing in these approaches. I created this landscape to start to put startups into context. I’m a thesis-oriented investor and it’s much easier to identify crowded areas and see white space once the landscape has some sort of taxonomy.

What is “machine intelligence,” anyway?

I mean “machine intelligence” as a unifying term for what others call machine learning and artificial intelligence. (Some others have used the term before, without quite describing it or understanding how laden this field has been with debates over descriptions.) I would have preferred to avoid a different label but when I tried either “artificial intelligence” or “machine learning” both proved too narrow: when I called it “artificial intelligence” too many people were distracted by whether certain companies were “true AI,” and when I called it “machine learning,” many thought I wasn’t doing justice to the more “AI-esque” techniques like the various flavors of deep learning. People immediately grasped “machine intelligence,” so here we are. ☺

Computers are learning to think, read, and write. They’re also picking up human sensory function, with the ability to see and hear (arguably to touch, taste, and smell, though those have been less of a focus). Machine intelligence technologies cut across a vast array of problem types (from classification and clustering to natural language processing and computer vision) and methods (from support vector machines to deep belief networks). All of these technologies are reflected on this landscape.

What this landscape doesn’t include, however important, is “big data” technologies. Some have used this term interchangeably with machine learning and artificial intelligence, but I want to focus on the intelligence methods rather than data, storage, and computation pieces of the puzzle for this landscape (though of course data technologies enable machine intelligence).

Which companies are on the landscape?

I considered thousands of companies, so while the chart is crowded it’s still a small subset of the overall ecosystem. “Admissions rates” to the chart were fairly in line with those of Yale or Harvard, and perhaps equally arbitrary. ☺

I tried to pick companies that used machine intelligence methods as a defining part of their technology. Many of these companies clearly belong in multiple areas but for the sake of simplicity I tried to keep companies in their primary area and categorized them by the language they use to describe themselves (instead of quibbling over whether a company used “NLP” accurately in its self-description).

If you want to get a sense for innovations at the heart of machine intelligence, focus on the core technologies layer. Some of these companies have APIs that power other applications, some sell their platforms directly into enterprise, some are at the stage of cryptic demos, and some are so stealthy that all we have is a few sentences to describe them.

The most exciting part for me was seeing how much is happening in the application space. These companies separated nicely into those that reinvent the enterprise, industries, and ourselves.

If I were looking to build a company right now, I’d use this landscape to help figure out what core and supporting technologies I could package into a novel industry application. Everyone likes solving the sexy problems, but there are an incredible number of ‘unsexy’ industry use cases with massive market opportunities, and powerful enabling technologies that are begging to be used for creative applications (e.g., Watson Developer Cloud, AlchemyAPI).

Reflections on the landscape:

We’ve seen a few great articles recently outlining why machine intelligence is experiencing a resurgence and documenting its enabling factors. (Kevin Kelly, for example, chalks it up to cheap parallel computing, large datasets, and better algorithms.) I focused on understanding the ecosystem on a company-by-company level and drawing implications from that.

Yes, it’s true, machine intelligence is transforming the enterprise, industries and humans alike.

On a high level it’s easy to understand why machine intelligence is important, but it wasn’t until I laid out what many of these companies are actually doing that I started to grok how much it is already transforming everything around us. As Kevin Kelly more provocatively put it, “the business plans of the next 10,000 startups are easy to forecast: Take X and add AI”. In many cases you don’t even need the X — machine intelligence will certainly transform existing industries, but will also likely create entirely new ones.

Machine intelligence is enabling applications we already expect like automated assistants (Siri), adorable robots (Jibo), and identifying people in images (like the highly effective but unfortunately named DeepFace). However, it’s also doing the unexpected: protecting children from sex trafficking, reducing the chemical content in the lettuce we eat, helping us buy shoes online that fit our feet precisely, and destroying 80’s classic video games.

Many companies will be acquired.

I was surprised to find that over 10% of the eligible (non-public) companies on the slide have been acquired. It was in stark contrast to the big data landscape we created, which had very few acquisitions at the time.

No jaw will drop when I reveal that Google is the number one acquirer, though there were more than 15 different acquirers just for the companies on this chart. My guess is that by the end of 2015 almost another 10% will be acquired. For thoughts on which specific ones will get snapped up in the next year you’ll have to twist my arm…

Big companies have a disproportionate advantage, especially those that build consumer products.

The giants in search (Google, Baidu), social networks (Facebook, LinkedIn, Pinterest), content (Netflix, Yahoo!), mobile (Apple) and e-commerce (Amazon) are in an incredible position. They have massive datasets and constant consumer interactions that enable tight feedback loops for their algorithms (and these factors combine to create powerful network effects) — and they have the most to gain from the low hanging fruit that machine intelligence bears.

Best-in-class personalization and recommendation algorithms have enabled these companies’ success (it’s both impressive and disconcerting that Facebook recommends you add the person you had a crush on in college and Netflix tees up that perfect guilty pleasure sitcom). Now they are all competing in a new battlefield: the move to mobile. Winning mobile will require lots of machine intelligence: state-of-the-art natural language interfaces (like Apple’s Siri), visual search (like Amazon’s “Firefly”), and dynamic question answering technology that tells you the answer instead of providing a menu of links (all of the search companies are wrestling with this).

Large enterprise companies (IBM and Microsoft) have also made incredible strides in the field, though they don’t have the same human-facing requirements, so they are focusing their attention more on knowledge representation tasks on large industry datasets, like IBM Watson’s application to assist doctors with diagnoses.

The talent’s in the New (AI)vy League.

In the last 20 years, most of the best minds in machine intelligence (especially the ‘hardcore AI’ types) worked in academia. They developed new machine intelligence methods, but there were few real world applications that could drive business value.

Now that real world applications of more complex machine intelligence methods like deep belief nets and hierarchical neural networks are starting to solve real world problems, we’re seeing academic talent move to corporate settings. Facebook recruited NYU professors Yann LeCun and Rob Fergus to their AI Lab, Google hired University of Toronto’s Geoffrey Hinton, Baidu wooed Andrew Ng. It’s important to note that they all still give back significantly to the academic community (one of LeCun’s lab mandates is to work on core research to give back to the community, Hinton spends half of his time teaching, Ng has made machine intelligence more accessible through Coursera) but it is clear that a lot of the intellectual horsepower is moving away from academia.

For aspiring minds in the space, these corporate labs not only offer lucrative salaries and access to the “godfathers” of the industry, but also the most important ingredient: data. These labs offer talent access to datasets they could never get otherwise (the ImageNet dataset is fantastic, but can’t compare to what Facebook, Google, and Baidu have in house). As a result, we’ll likely see corporations become the home of many of the most important innovations in machine intelligence and recruit many of the graduate students and postdocs that would have otherwise stayed in academia.

There will be a peace dividend.

Big companies have an inherent advantage and it’s likely that the ones who will win the machine intelligence race will be even more powerful than they are today. However, the good news for the rest of the world is that the core technology they develop will rapidly spill into other areas, both via departing talent and published research.

Similar to the big data revolution, which was sparked by the release of Google’s BigTable and MapReduce papers, we will see corporations release equally groundbreaking new technologies into the community. Those innovations will be adapted to new industries and use cases that the Googles of the world don’t have the DNA or desire to tackle.

Opportunities for entrepreneurs:

“My company does deep learning for X”

Few words will make you more popular in 2015. That is, if you can credibly say them.

Deep learning is a particularly popular method in the machine intelligence field that has been getting a lot of attention. Google, Facebook, and Baidu have achieved excellent results with the method for vision and language based tasks and startups like Enlitic have shown promising results as well.

Yes, it will be an overused buzzword, with excitement running ahead of results and business models, but unlike with the hundreds of companies that say they do “big data,” it’s much easier here to verify credibility if you’re paying attention.

The most exciting part about the deep learning method is that when applied with the appropriate levels of care and feeding, it can replace some of the intuition that comes from domain expertise with automatically-learned features. The hope is that, in many cases, it will allow us to fundamentally rethink what a best-in-class solution is.
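
To make the “learned features” point concrete, here is a minimal sketch (my own illustration, not any company’s method) that contrasts a single hand-crafted feature with features a small neural network learns for itself, using scikit-learn’s bundled digit images:

```python
# A minimal sketch: hand-crafted feature vs. automatically learned features.
# The dataset and model choices are illustrative only.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                       # 8x8 grayscale digit images
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Domain expertise" baseline: one hand-crafted feature (average ink per image).
hand_train = X_train.mean(axis=1, keepdims=True)
hand_test = X_test.mean(axis=1, keepdims=True)
baseline = LogisticRegression(max_iter=1000).fit(hand_train, y_train)

# Learned features: a small neural network (a shallow stand-in for the deep
# nets described above) consumes raw pixels and learns its own representation.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

print("hand-crafted feature accuracy:", round(baseline.score(hand_test, y_test), 3))
print("learned-feature accuracy:", round(net.score(X_test, y_test), 3))
```

The network isn’t told which pixel patterns matter; it discovers them during training, which is exactly the substitution for hand-built intuition described above.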

As an investor who is curious about the quirkier applications of data and machine intelligence, I can’t wait to see what creative problems deep learning practitioners try to solve. I completely agree with Jeff Hawkins when he says a lot of the killer applications of these types of technologies will sneak up on us. I fully intend to keep an open mind.

“Acquihire as a business model”

People say that data scientists are unicorns in short supply. The talent crunch in machine intelligence will make it look like we had a glut of data scientists. In the data field, many people gained industry experience over the past decade; most hardcore machine intelligence work, by contrast, has happened only in academia. We won’t be able to grow this talent overnight.

This shortage of talent is a boon for founders who actually understand machine intelligence. A lot of companies in the space will get seed funding because there are early signs that the acquihire price for a machine intelligence expert is north of 5x that of a normal technical acquihire (take, for example, DeepMind, where the price per technical head was somewhere between $5–10M, if we choose to consider it in the acquihire category). I’ve had multiple friends ask me, only semi-jokingly, “Shivon, should I just round up all of my smartest friends in the AI world and call it a company?” To be honest, I’m not sure what to tell them. (At Bloomberg Beta, we’d rather back companies building for the long term, but that doesn’t mean this won’t be a lucrative strategy for many enterprising founders.)

A good demo is disproportionately valuable in machine intelligence

I remember watching Watson play Jeopardy. When it struggled at the beginning I felt really sad for it. When it started trouncing its competitors I remember cheering it on as if it were the Toronto Maple Leafs in the Stanley Cup finals (disclaimers: (1) I was an IBMer at the time so was biased towards my team (2) the Maple Leafs have not made the finals during my lifetime — yet — so that was purely a hypothetical).

Why do these awe-inspiring demos matter? The last wave of technology companies to IPO didn’t have demos that most of us would watch, so why should machine intelligence companies? The last wave of companies were very computer-like: database companies, enterprise applications, and the like. Sure, I’d like to see a 10x more performant database, but most people wouldn’t care. Machine intelligence wins and loses on demos because 1) the technology is very human, enough to inspire shock and awe, 2) business models tend to take a while to form, so these companies need more funding for a longer period of time to get there, and 3) they are fantastic acquisition bait.

Watson beat the world’s best humans at trivia, even if it thought Toronto was a US city. DeepMind blew people away by beating video games. Vicarious took on CAPTCHA. There are a few companies still in stealth that promise to impress beyond that, and I can’t wait to see if they get there.


Demo or not, I’d love to talk to anyone using machine intelligence to change the world. There’s no industry too unsexy, no problem too geeky. I’d love to be there to help so don’t be shy.

I hope this landscape chart sparks a conversation. The goal is to make this a living document, and I want to know if there are companies or categories missing. I welcome feedback and would like to put together a dynamic visualization where I can add more companies and dimensions to the data (methods used, data types, end users, investment to date, location, etc.) so that folks can interact with it to better explore the space.


Deloitte’s top 8 business technology trends of 2015

‘Report also includes six exponential technologies, including artificial intelligence, robotics and additive manufacturing’

 

Across industries and geographies, business strategy is being transformed by the rapidly changing technology landscape, according to Deloitte’s sixth annual tech trends report, released today.

As technology and business become ever more intertwined, the report outlines the top macro technology developments that will disrupt businesses in the next 18 to 24 months.

This year’s theme is inspired by a fundamental transformation in the way business C-suite leaders and CIOs collaborate to harness disruptive change, chart business strategy and pursue opportunities.


1. Ambient computing

As the Internet of Things (IoT) matures from its awkward adolescent phase, ‘ambient computing’ is the backdrop of sensors, devices, intelligence and agents that can put the concept to work. One example: embedded sensors tracking stock levels let a vending machine trigger a replenishment order from the supply chain (sketched below). Practical applications in UK industries include healthcare and life sciences (remote monitoring of patient care in the community), transport (accident response sensors on the roadside), and energy (smart metering in homes).
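
As a rough illustration of the vending machine example (the endpoint URL, product IDs and reorder thresholds below are hypothetical, not drawn from the report), the replenishment logic might look like this:

```python
# Hypothetical sketch: an embedded stock sensor reading triggers a
# replenishment order once inventory falls below a reorder point.
import json
import urllib.request

REORDER_POINT = 5                                            # units remaining
ORDER_ENDPOINT = "https://supply-chain.example.com/orders"   # hypothetical API

def check_and_reorder(machine_id, stock_levels):
    """Place an order for every product below the reorder point."""
    orders = []
    for product_id, units_left in stock_levels.items():
        if units_left < REORDER_POINT:
            order = {"machine": machine_id, "product": product_id,
                     "quantity": 20 - units_left}            # top back up to 20
            req = urllib.request.Request(
                ORDER_ENDPOINT,
                data=json.dumps(order).encode("utf-8"),      # data= makes it a POST
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)
            orders.append(order)
    return orders

# Example sensor snapshot from one machine:
# check_and_reorder("VM-042", {"cola": 3, "water": 12, "crisps": 1})
```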

2. IT worker of the future

The scarcity of technical talent on multiple fronts is a significant concern across many industries. Having access to the right number of people, with the exact skills, who can cope with the very latest innovations, whilst maintaining legacy systems, will require business leaders to access a new type of employee. The IT worker of the future will have habits, incentives and skills that are inherently different from those in play today. Design lies at the heart of this new generation, and requires new kinds of specialists such as graphic designers, user experience engineers and behavioural psychologists. IT leaders should add an “A” for fine arts to the science, technology, engineering, and math charter, aiming for “STEAM”, not just “STEM”, skills.

3. CIO as chief integration officer

In a world that is being reconfigured by technology, CIOs should be at the centre of this change. As technology transforms existing business models and gives rise to new ones, the role of the CIO is evolving rapidly, with integration at the core of its responsibility. CIOs must view their responsibilities through an enterprise-wide lens to ensure that critical domains like digital, analytics and the cloud do not negatively affect other departments. In this shifting landscape of opportunities and challenges, CIOs can be not only the connective tissue, but the driving force for intersecting IT-heavy initiatives.

4. Dimensional marketing

Marketing has evolved significantly in the last half-decade. The evolution of digitally-connected customers lies at the core, reflecting the dramatic change in the dynamic between relationships and transactions. A new vision for marketing is being formed as Chief Marketing Officers (‘CMOs’) and CIOs invest in technology for marketing automation, next-generation omnichannel approaches, content development, customer analytics, and commerce initiatives. This modern era for marketing is likely to bring new challenges in the dimensions of customer engagement, connectivity, data and insight.


5. Amplified intelligence

Analytic techniques are growing in complexity and companies are looking to new areas such as machine learning and predictive modelling to help process increasingly large and complex data sets. Artificial intelligence is now a reality. Its more promising application, however, is not replacing workers but amplifying their abilities. Amplified intelligence is focused on deploying tools at the points where a business really needs them for effective decision-making. Examples include techniques like natural language processing (allowing conversational interaction with a complex system), visualisation tools (letting individuals explore data on their own terms and find new patterns and discoveries), and advanced analytics mobile solutions (such as those embedded inside smartphones or tablets).

6.  API economy

Application programming interfaces (APIs) have been elevated from a development technique to a business model driver and a boardroom consideration. An organisation’s core assets can be reused, shared and monetised through APIs. The focus is shifting from internal APIs between applications to public APIs, which have doubled in the past 18 months.

A number of industries have embraced APIs, including telecommunications and media, finance, travel, tourism and real estate. The Internet of Things is also contributing to this trend, as the number of IoT applications across industries expands at a rapid pace. APIs should be managed like a product – one built on top of a potentially complex technical footprint that includes legacy and third-party systems and data.
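
To make the idea of an API as a product concrete, here is a small hypothetical sketch (using Flask, my choice rather than anything the report prescribes) that exposes an existing internal capability, a toy insurance-quote calculator, as a versioned public endpoint:

```python
# Hypothetical sketch: wrapping an internal asset in a public, versioned API.
from flask import Flask, jsonify, request

app = Flask(__name__)

def quote_premium(age, postcode):
    """Pretend internal asset: a simplistic pricing rule (postcode unused here)."""
    base = 300.0
    return base * (1.5 if age < 25 else 1.0)

@app.route("/v1/quotes", methods=["GET"])
def quotes():
    age = int(request.args.get("age", 30))
    postcode = request.args.get("postcode", "")
    return jsonify({"premium": quote_premium(age, postcode), "currency": "GBP"})

if __name__ == "__main__":
    app.run(port=8080)   # e.g. curl "http://localhost:8080/v1/quotes?age=22"
```

Treating such an endpoint as a product then means versioning it, documenting it, metering it and, where appropriate, charging for it.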

7. Software-defined everything

Amid the excitement surrounding digital, analytics and cloud, it is easy to overlook advances currently being made in infrastructure and operations. The entire operating environment – server, storage, and network – can now be virtualised and automated. The data centre of the future represents the potential for not only lowering costs, but also dramatically improving speeds and reducing the complexity of deploying and maintaining technology footprints. Software-defined everything can elevate infrastructure investments, from costly plumbing to competitive differentiators.

8. Core renaissance

Organisations have significant investments in their core systems, both built and bought. Beyond running the heart of the business, these assets can form the foundation for growth and new service development – building upon standardised data and automated business processes. To this end, many organisations are modernising systems to pay down technical debt, replatforming solutions to remove barriers to scale and performance, and extending their legacy infrastructures to fuel innovative new services and offerings.


The report also includes a section dedicated to six exponential technologies – innovative disciplines evolving faster than the pace of Moore’s Law – whose impact may be profound. These include: artificial intelligence, robotics, additive manufacturing, quantum computing, industrial biology and cyber security.

See more at: http://www.information-age.com/it-management/strategy-and-innovation/123459151/deloittes-top-8-business-technology-trends-2015

Is security really stuck in the Dark Ages?

Amit Yoran’s colleagues didn’t agree with everything the RSA President said at his keynote last month. But most say he got the essentials right – things are bad and getting worse, and the industry needs a new mindset.

It had to be a bit of a jolt for more than 500 exhibitors and thousands of attendees at RSA Conference 2015 last month, all pushing, promoting and inspecting the latest and greatest in digital security technology: The theme of RSA President Amit Yoran’s opening keynote was that they are all stuck in the Dark Ages.

To make the point “visually,” Yoran even spent his first minute or so on stage speaking in pitch darkness, “stumbling around,” backed by the sound of an ominous, moaning wind.

This, he insisted, was an apt metaphor, “for anyone trying to protect and defend a digital infrastructure today. Every alert that pops up is like a bump in the night,” he said. “Often we don’t have enough context to realize which ones really matter and which ones we can ignore.”

It is easy to make the case statistically. The Identity Theft Resource Center reported in January that there were 738 data breaches in 2014, up 25% from the prior year.

Or, as Yoran put it, 2014 was, “yet another year of the breach. Or, have we agreed to call it the year of the mega breach? That might connote that things are getting worse, not better,” he said, adding sardonically that 2015 is likely to become, “the year of the super-mega breach. At this pace we are soon going to run out of adjectives.”

That, he contended, is because the defensive mindset of Internet security today is “fundamentally broken … (and) very much mimics the Dark Ages. We’re simply building taller castle walls and digging deeper moats.”

All of which may have sounded a bit insulting to hundreds of vendors and experts who have been saying for years that “the perimeter is dead.” Or, that, “it’s not a question of if you’ve been breached, but when.” Or, that intruders are quite likely inside your organization right now, and that a stronger perimeter will do nothing to eliminate them.

Indeed, many of them were there promoting solutions to detect and respond to insider threats.

But Yoran insisted that the rhetoric is not matched by actions. “We say we know the perimeter is dead, we say we know the adversary is on the inside, but we aren’t changing how we operate,” he said.

In an email interview this week, Yoran acknowledged that the industry is beginning to move in the direction of monitoring and response, but said “today’s reality” is that, “by every measure, a vast supermajority of security expenditures focus on prevention.”

Citing his military training at West Point, he said in his keynote that the security industry is trying to use “maps” that no longer apply to the current threat landscape.

The result, he said, is that attackers, “are winning by every possible measure.”

Really?

His colleagues in the industry may not agree with all of that, but most think he got the essentials right. John Pirc, chief strategy officer at Bricata, said he “totally agrees” that the perimeter mindset is still too prevalent. “Security needs to move deeper within the network. The need is for visibility in the data center rather than on premise or the cloud,” he said.

Anton Chuvakin, research director, security and risk management at Gartner for Technical Professionals, is another. “Sadly, he is mostly correct regarding many companies that are still in the ‘prevent the attack,’ or ‘don’t let them in’ mentality,” he said, even though the, “more mature and enlightened have known for years, if not decades, that the attackers will occasionally break in and that you will need to be prepared.”

Chuvakin said virtually every security pro has been “taught the prevention/detection/response mantra, but at many places the spend is mostly on prevention, and preventative technology gets the attention.”

Muddu Sudhakar, CEO of Caspida, said he agrees that adversaries are winning, noting that, “the FBI Cyber Division head commented last week that while they used to learn about a large-scale breach every two to three weeks, it is now every two to three days.”

But he said context is important. “The bad guys only have to succeed once, while defending data has to succeed 100% of the time,” he said.

Rob Kraus, director of security research and strategy at Solutionary, also said context matters. He said simply declaring that the “good guys” are losing neglects the ebb and flow of the battle.

“As advances are made by the good guys, the enemy will re-evaluate and re-deploy capabilities in a way that can circumvent their attack or defensive postures. The challenge with the cyberworld focus is that the battle moves much more quickly, and is even more multi-dimensional.”

But he agrees with Yoran that there is still too much reliance on defending perimeters. “Many organizations are still locked into the concept that the castle walls will protect the bad guys from getting in,” he said. “Most are not thinking about those who climbed over or tunneled under those walls.

“It could be much worse than Amit describes, but it could also be much better,” he said.

He said breaches, while they are an increasing fact of life, are no longer the most important challenge for the industry. “Hacking data alone isn’t getting a huge response from the public,” he said. “The next level we are moving to is real cyber warfare or cyber terrorism.”

And Gary McGraw, CTO of Cigital, said Yoran was “stating the obvious” when he said the adversaries are winning, but was missing the more important point – that too many systems don’t even have a good perimeter to defend. “Perimeter security only works if you have a perimeter,” he said, “and that starts with building things that don’t suck. He’s got the cart before the horse, and the cart is in a different state.”


In his keynote, Yoran said a major reason the security industry needs a new “map” is because, “we can neither secure nor trust the pervasive, complex, and diverse endpoint participants in any large and distributed computing environment, let alone the transports and protocols through which they interact.”

His colleagues say that while they agree endpoint protection is a problem, they think a blanket statement like that is overly broad.

“Yes, the PC endpoint is lost indeed,” Chuvakin said. “But strangely enough, a mobile endpoint is a bright area – despite all the whining about Android malware, iOS and Android are relatively unscathed.”

And Ron Gula, CEO of Tenable Network Security, said it doesn’t apply to all business sectors. “Manufacturers of ATMs who run their own network, write their own code, etc., would completely disagree,” he said. “ISPs that carry their customers’ data would disagree as well.”

There were also mixed views on Yoran’s five recommendations (see sidebar) for the industry to “reprogram itself for success.” Two of them are to, “stop believing that advanced protections work,” and to, “adopt a deep and pervasive level of visibility everywhere, from the endpoint, to the network to the cloud – what SIEM (Security Information and Event Management) isn’t, but was meant to be.”

Reprogramming security for success
Amit Yoran gave these five recommendations for the future of the security industry.

1. Stop believing that advanced protections work. While they do have value, they will fail some of the time.
2. Adopt a deep and pervasive level of visibility everywhere, from the endpoint, to the network to the cloud – what SIEM isn’t, but was meant to be.
3. In a world with no perimeters, and fewer anchor points, authentication and identity matter more, not less, since most attacks use stolen credentials, not malware.
4. Leverage external threat intelligence with machine-readable format for increased speed and agility to respond and identify those threats that might matter most to the organization.
5. Categorize and prioritize assets: Understand what is really mission critical to your organization.

Chuvakin said that just because something is not 100% effective doesn’t mean it doesn’t work.

“Try this for size,” he said. “A bulletproof vest does not work, since you can be shot in the head or burned or shot with an armor piercing bullet. Nobody thinks like that.”

But he and others agree with the need for more visibility. “What you can’t see will in fact hurt you in the long run,” Pirc said. “That’s why you need visibility throughout your entire infrastructure.”

Sudhakar noted, however, that calling for visibility and achieving it are two different things. “A big part of the problem is that while we have a handle on known threats, we do not have a good handle on unknown or hidden threats,” he said.

And McGraw said visibility, while a good thing, doesn’t matter that much if systems lack security by design. “You should do that, but build good stuff first,” he said, likening it to tracking termites in a house built of wood. “You can spend your time with a whole army tracking termites, or you can change your building material from wood to steel,” he said.

But, he said, “the good news is that RSA already has a robust software security approach. It’s being run by Eric Baize, and he’s doing a great job.”

Gula and others say the industry is moving in the right direction, through compliance with regulatory regimes like SOX (Sarbanes-Oxley Act) and PCI DSS (Payment Card Industry Data Security Standard) that “require least use of privilege, no admin accounts, etc. – these are directed against insiders. Also, there is a move by many organizations with cloud assets to have centralized authentication, such as single sign-on, which is also a large deterrent and form of detection of insiders,” he said.

But they also offered a few additional suggestions for what Yoran said should be the goal – a new “Age of Enlightenment” in security.

Chuvakin said that good visibility should be supported by, “effective security incident planning.”

According to Sudhakar, organizations should be using, “behavioral analytics and machine learning to uncover hidden threats and vulnerabilities.”

He added that since IT security people are hard to find and retain, organizations should, “automate to the maximum degree possible so that you can do more with less. Automation can also change the internal dynamic, as IT security staff can become threat hunters instead of being the hunted.”
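
As a rough sketch of what that kind of behavioural analytics can look like in practice (this is my own toy illustration with invented features, not Caspida’s product), an off-the-shelf anomaly detector can be trained on normal activity and used to flag logins that deviate from it:

```python
# Toy illustration: flag logins whose behaviour deviates from the norm.
# Feature columns and values are invented for the example.
import numpy as np
from sklearn.ensemble import IsolationForest

# Columns: login hour, megabytes downloaded, distinct hosts touched
normal_logins = np.array([
    [9, 40, 3], [10, 55, 4], [14, 30, 2], [11, 60, 5],
    [9, 35, 3], [16, 45, 4], [13, 50, 3], [10, 42, 4],
])

model = IsolationForest(contamination=0.1, random_state=0)
model.fit(normal_logins)

new_logins = np.array([
    [11, 48, 4],      # looks like business as usual
    [3, 900, 60],     # 3 a.m., bulk download, touching many hosts
])
print(model.predict(new_logins))   # 1 = normal, -1 = flagged for a human to hunt
```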

Kraus also said planning is important. In war, he said, “does the U.S. simply give soldiers guns and point them to the battlefield? Or, is it more likely that they train their soldiers and appoint leaders to drive the battle to a successful outcome?”

Overall, as tough as the message was, it was welcome. Yoran said this week that while he had been uncertain about what the response to his keynote would be, “I was actually a bit surprised by seemingly unanimous support from colleagues and even competitors. Many people have come up to me or tweeted since that I said what needed to be said, and that they hoped that the speech served as a catalyst for necessary and significant change in the industry’s mindset.”

The origins of ‘Bad Bots’: Which countries to worry about

This is a contributed piece from Rami Essaid, Co-Founder and CEO of Distil Networks

Almost 60% of today’s internet traffic is non-human, up more than 30% in the past year alone, and more than a third of that non-human bot traffic is malicious. The Distil Networks 2015 Bad Bot Landscape Report revealed that certain countries and service providers have become productive host environments for bot generators.

Bots are the key culprits behind web scraping, brute force attacks, competitive data mining, brownouts, account hijacking, unauthorized vulnerability scans, spam, man-in-the-middle attacks, and more. They place a huge tax on IT security and web infrastructure teams, and their variety, volume and sophistication wreak havoc across online operations big and small.

Mobile bots arrive in droves, beware of China

For the first time, the Android Webkit Browser appeared on the top five list of user agents leveraged by bad bots to hide their non-human identity at 4.87%. Mobile sites tend to be easier to scrape because they provide bots with more structured access to data.

China leads the world in bad bot mobile traffic at 30.64%, and the three mobile carriers with the highest percentage of bad bot traffic are all based in China. On this side of the world, for the first time we’re seeing a US mobile carrier (T-Mobile USA) appear in the top 20 list of ISPs serving bad bot traffic — 19.7% of the traffic was by bots.


Tracking “Bad Bot GDP”

While the United States is the source of more than 50% of bad bots with its thousands of low-cost hosting providers, absolute numbers can be misleading. Measuring the number of bad bots per online user provides another view into country-specific traffic risk; this is the number we’ve dubbed the Bad Bot GDP of a country.

As an example, the Maldives served up almost 16 bad bots per internet user in 2014. Of course, Bad Bot GDP numbers spike faster with smaller populations. In the summer of 2014, notorious Russian hacker Roman Seleznev was arrested in the Maldives and extradited to the US for allegedly stealing millions of dollars’ worth of credit card information.
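
The metric itself is simple arithmetic: bad bots observed divided by online users. A worked version of the Maldives example (the absolute counts below are my own rough numbers chosen to reproduce the roughly 16-bots-per-user figure, not Distil’s raw data):

```python
# "Bad Bot GDP" = bad bots observed / online users in that country.
def bad_bot_gdp(bad_bots_observed, online_users):
    return bad_bots_observed / online_users

# Illustrative figures only: ~160,000 internet users, ~2.5 million bad bots.
maldives = bad_bot_gdp(bad_bots_observed=2_500_000, online_users=160_000)
print(round(maldives, 1))   # ~15.6 bad bots per online user, i.e. "almost 16"
```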


Closer examination of the top three Bad Bot GDP countries helps justify their rankings:

  • Singapore tops the list with a Bad Bot GDP of 152.87 bad bots per online user. According to the 2014 Global Information Technology Report by the World Economic Forum, Singapore has the best network-ready environment in Asia, so it’s a popular choice as a data center hub for China and Southeast Asia. Its small population relative to its well developed infrastructure boosts its Bad Bot GDP.
  • Israel, with a Bad Bot GDP of 34.12, has a similarly small population and the most complete data center and internet infrastructure in the Middle East. A 2013 US National Intelligence Estimate on cyber threats ranked Israel the third most aggressive intelligence service against the U.S., behind only Russia and China.
  • Slovenia, at 29.69, found itself in a similar situation to the Maldives. Slovenia was the site of a high-profile hacker arrest when Matjaz Skorjanc, the developer of the Mariposa botnet, was arrested there after his malware hijacked more than 12 million computers around the world.

China and Russia most blocked countries


For 2014, China and Russia were the most blocked countries. Geo-IP Fencing is an effective website security tactic for those organizations with well-defined geographical markets.
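
A minimal sketch of what geo-IP fencing can look like at the application layer (the country lists are illustrative, and the tiny lookup table stands in for a real GeoIP database such as MaxMind’s):

```python
# Illustrative geo-IP fencing: allow defined markets, challenge or block the rest.
ALLOWED_COUNTRIES = {"GB", "US", "CA"}     # a retailer's well-defined markets
CHALLENGE_COUNTRIES = {"CN", "RU"}         # the most blocked origins in 2014

GEO_DEMO = {"198.51.100.7": "GB", "203.0.113.9": "CN", "192.0.2.44": "BR"}

def lookup_country(ip_address):
    """Stand-in for a real GeoIP database lookup."""
    return GEO_DEMO.get(ip_address, "??")

def handle_request(ip_address):
    country = lookup_country(ip_address)
    if country in CHALLENGE_COUNTRIES:
        return "challenge"                 # e.g. serve a CAPTCHA
    if country not in ALLOWED_COUNTRIES:
        return "block"
    return "allow"

for ip in GEO_DEMO:
    print(ip, handle_request(ip))          # GB -> allow, CN -> challenge, BR -> block
```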

A more widely dispersed bad bot landscape

Bad bot threats have taken on less predictable patterns. Bots are now attacking from a more broadly dispersed set of global points of origin. Fourteen countries, almost double the number in 2013, originated at least 1% of bad bot traffic volume in 2014.


In 2013, the hour-by-hour bad bot data made it look like the attackers were waiting for IT personnel to leave work before launching their attacks. Not so for 2014, as attacks were much more evenly spread throughout the day.


Human or not? The bot dilemma

Some bad bots make little or no attempt to hide their identities, which makes them easy to spot using basic IP blocking or user agent integrity checks. Identifying bad bots becomes much more challenging when they mimic human behavior – which, at 41% of all bad bots tracked in 2014, is alarmingly high.
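
One of the simpler checks mentioned above, a user-agent integrity check, can be sketched as follows (my own illustration; real bot-mitigation products layer many more signals on top). A visitor claiming to be Googlebot should reverse-resolve to a googlebot.com or google.com host, and that host should resolve back to the same IP; bots that merely spoof the user-agent string fail this test:

```python
# Sketch of a basic user-agent integrity check for claimed Googlebot traffic.
import socket

def claims_googlebot(user_agent):
    return "Googlebot" in user_agent

def is_genuine_googlebot(ip_address):
    try:
        hostname = socket.gethostbyaddr(ip_address)[0]       # reverse DNS
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward DNS
        return ip_address in forward_ips
    except (socket.herror, socket.gaierror):
        return False

def user_agent_integrity_ok(user_agent, ip_address):
    if claims_googlebot(user_agent):
        return is_genuine_googlebot(ip_address)
    return True   # no crawler claim made; other checks would apply
```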


Mitigating bad bots in 2015 and beyond

The bad bot landscape is continuing to evolve rapidly with the dramatic growth in mobile bot traffic, increasingly sophisticated obfuscation techniques, and an expanding range of geographic and ISP points of origin. This is a clear challenge to IT security and web infrastructure teams under increasing pressure to forecast infrastructure demands and protect their online data. Without insight into bad traffic, the challenge is only exacerbated.

Most hated internet innovations of all time: Pop-ups, spams, captchas, cookies and more


While they are noteworthy inventions from technology geniuses, pop-ups, viruses, spam and CAPTCHAs are widely considered headaches for the internet community.

Despite the fact that some of the innovations are used by marketers to generate more revenue, they create a lot of frustration for net surfers and have drawn customers’ wrath.

Given below is an infographic produced by NeoMam Studios, detailing the seven most hated internet innovations.

Pop-up ads, invented by Ethan Zuckerman, top the list, with 70% of internet users finding them the most annoying type of advertisement online.

Next on the list are viruses. Cybercrime involving viruses is estimated to have cost the global economy about $400bn in 2014.

Another annoying innovation is the CAPTCHA — a method by which websites ensure that their users are really human. It is estimated that internet users collectively spend 500,000 hours a day proving that they are really human.

The other innovations on the list include regional censorship, cookies, spam and cybersquatting — the practice of buying a domain name in bad faith to profit from another person’s trademark.

The 7 most hated internet innovations of all time

Podec is 1st trojan to trick Captcha into thinking it’s human


From Asia One.

SINGAPORE – Kaspersky Lab has said it has discovered the first malware that can outwit the Captcha image recognition system into thinking that it is a human, so that it can subscribe a person’s infected smartphone to premium-rate services.

The Trojan-SMS.AndroidOS.Podec reportedly forwards Captcha requests to real-time online human translation service Antigate.com, which converts Captcha images to text.

Kaspersky also said that the trojan can bypass the “Advice on Charge” system that informs users about the price of a service and requires authorisation before payment.

So, the trojan is signing up users of infected phones to costly services without their knowledge, bypassing systems designed to verify the subscription.

According to Kaspersky, Podec targets Android devices.

The trojan is being spread primarily through Russia’s popular social network VKontakte (vk.com) and some Russian websites. Most victims have been detected in Russia and surrounding countries to date, Kaspersky said. Infection generally occurs through links to supposedly cracked versions of popular computer games, such as Minecraft Pocket Edition.

It also said that the trojan is still being worked on and the code is being changed to add new capabilities.

See more at: http://digital.asiaone.com/digital/news/podec-1st-trojan-trick-captcha-thinking-its-human

Identity Fraud Rises; 61 Percent of Breaches Caused by Stolen Credentials

Last year, 13.1 million consumers suffered from identity fraud, the second highest number on record, according to Javelin Strategy & Research’s 2014 Identity Fraud Report: Card Data Breaches and Inadequate Consumer Password Habits Fuel Disturbing Fraud Trends.

One of the trends is an increase in existing card account fraud and losses. Existing card accounts refer to the account numbers and/or the actual cards for existing credit and card-linked debit accounts. Losses due to existing account fraud grew 45% to $16 billion, accounting for 88% of all U.S. fraud losses.

Online Consumer Data At Risk

According to the report, increasing online availability of consumer account information has made existing account fraud more attractive to criminals due to quicker and cheaper prospecting.

And how do criminals access consumers’ online accounts? By leveraging poor password security. One of the major factors in identity theft is the ability to use a single stolen password to access multiple accounts, according to the report.

KrebsonSecurity.com reports on a healthcare vendor exploit that included the breach of a third-party payroll and HR management provider, Ultimate Software (UltiPro Services). Criminals used stolen credentials to collect patient data from health systems and other healthcare organizations in order to submit fraudulent tax refund requests. They were able to do so by stealing employee W-2s from the HR and payroll departments that used the software. Find out more about the exploit in Lax Healthcare Vendor Security Leads to Data Breaches & Tax Fraud.

And according to a different report by Javelin Strategy & Research, The Consumer Data Insecurity Report (PDF), defrauded data breach victims overwhelmingly (61%) attribute their fraud to the breach of their credentials. These findings strengthen the need for greater security around endpoint access with stronger authentication.
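
One common form of that stronger authentication is a time-based one-time password (TOTP, RFC 6238) used as a second factor alongside the password. The sketch below is a compact standard-library illustration of the verification step, not anything the Javelin report prescribes:

```python
# Minimal TOTP (RFC 6238) verification sketch using only the standard library.
import base64, hashlib, hmac, struct, time

def totp(secret_base32, timestep=30, digits=6, at_time=None):
    key = base64.b32decode(secret_base32, casefold=True)
    counter = int((time.time() if at_time is None else at_time) // timestep)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                          # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)

def verify_second_factor(secret_base32, submitted_code):
    now = time.time()
    # Accept the current 30-second step and one step either side for clock skew.
    return any(hmac.compare_digest(totp(secret_base32, at_time=now + drift),
                                   submitted_code)
               for drift in (-30, 0, 30))
```

Even if a password is stolen in a breach, an attacker without the second factor cannot complete the login.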

The National Criminal Justice Reference Service reported that government documents/benefits fraud was the most common form of reported identity theft (34%), followed by credit card fraud (17%) and phone/utilities fraud (14%).

According to the NCJRS, the majority of identity theft incidents (85%) involved the fraudulent use of existing account information, such as credit card or bank account information.

The Consumer Effect

An interesting personal account of identity theft contributed to Forbes.com Personal Finance details one writer’s discovery of ongoing theft and its aftermath. While retail organizations, hospitals, and banks suffer, so do consumers – and they’re less likely to be loyal to companies that leak their information.

Javelin Research & Strategy reports that breaches affect consumer confidence in a big way – six in 10 victims whose information was compromised in a retailer breach said their level of trust in the retailer declined significantly. Another one in five victims avoid doing business with organizations after his or her information is breached.

And retail organizations are the source of most breached data (50%), as can be seen below, followed by credit card issuers (22%), primary financial institutions (16%) and healthcare providers (14%).

Retail Breach Graph

Additionally, 64 percent of identity fraud victims think that they should be able to take legal action against the organization that leaked their personal information, and such action often takes the form of civil and class-action lawsuits. Nearly 60 lawsuits were filed by affected customers after the Target breach, while dozens of lawsuits brought by banks and credit unions were also filed, asking Target to pay for fraud and card replacement fees.

Another surprising fact is that victims often don’t even know where their information was compromised: nearly half (49 percent) can’t say. While there should be data breach regulations for every type of industry and in every state, regulation varies greatly, and a few states don’t even require organizations to report breaches. As I wrote about in California Breaches Increase 30 Percent in 2014; 84 Percent Retail, 47 states have breach notification laws requiring both private companies and states to notify consumers if they’ve been breached, while three (Alabama, New Mexico and South Dakota) have no security breach laws at all.

With identity theft on the rise and password theft the main cause, consumers and businesses alike should focus on strengthening their authentication security to avoid becoming a statistic. Find out more about securing against modern risks in the retail industry in our new eBook, A Modern Guide to Retail Data Risks: Avoiding Catastrophic Data Breaches in the Retail Industry.

@Thu_Duo
Thu Pham
Information Security Journalist

Thu Pham covers current events in the tech industry with a focus on information security. Prior to joining Duo, Thu covered security and compliance for the infrastructure as a service (IaaS) industry at Online Tech. Based in Ann Arbor, Michigan, she earned her BS in Journalism from Central Michigan University.

Google’s new CAPTCHA security login raises ‘legitimate privacy concerns’

The “CAPTCHA” has infuriated web users for years: It’s that login test that asks you to type in a hard-to-read sequence of letters or numbers in order to prove you are not a robot. Get one letter wrong and you’ll be denied access.

In a bid to ease that irritation, Google launched what it dubbed the “No Captcha ReCAPTCHA” in December, which it claims has the ability to verify a human user by looking at things like the behavior of their mouse movements and the way they type.

But device recognition company AdTruth believes it has found evidence Google’s CAPTCHA killer is collecting far more information than mouse coordinates alone, and that it could use the security tool to inform its advertising services too. The new tool isn’t overtly labeled as a Google service, yet anyone clicking through it “consents” to be tracked by Google’s cookies, AdTruth found. And while the service is intended to do only one thing — determine whether you are a human or not — it is also able to identify a lot more information about which specific human you are.

All of this is poorly disclosed to users, AdTruth believes.

Google declined to comment when reached by Business Insider.

The original CAPTCHA was designed to protect websites from spam and bots, but Google research found that artificial intelligence technology has now become so sophisticated it can solve even the most distorted of text at 99.8% accuracy.

That is why it created the “No CAPTCHA reCAPTCHA,” which simply asks users to click a checkbox, or complete another task, such as selecting all the cats in a selection of images, to confirm they are not a robot. Google says its risk assessment software uses behavioral cues, such as where users click, how long they linger over a checkbox, and their typing cadence, to work out whether they are human or not. Google created a video to explain how the process works and offers a public demo of No CAPTCHA reCAPTCHA.
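
Google does not publish its risk-analysis model, so any reconstruction is guesswork; purely as a toy illustration of how behavioral cues like these could feed a bot score, consider:

```python
# Toy heuristic (my own, not Google's): score behavioural cues for bot-likeness.
def bot_likelihood(ms_before_click, mouse_moves, avg_ms_between_keys):
    points = 0
    if ms_before_click < 150:        # humans rarely click near-instantly
        points += 4
    if mouse_moves < 3:              # cursor "teleported" straight to the box
        points += 3
    if avg_ms_between_keys < 20:     # implausibly fast, perfectly even typing
        points += 3
    return points / 10               # 0.0 = human-like, 1.0 = very bot-like

print(bot_likelihood(ms_before_click=90, mouse_moves=0, avg_ms_between_keys=5))
# -> 1.0, so this visitor would get a harder challenge instead of the checkbox
```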

No CAPTCHA reCAPTCHA seems so easy and reliable that companies such as Snapchat, BuzzFeed, WordPress, and Humble Bundle immediately signed up to adopt it.

But according to research from AdTruth, seen by Business Insider, Google’s No CAPTCHA reCAPTCHA appears to be collecting additional, personally identifiable data about users, beyond mere behavioral cues.

Here’s all the data No CAPTCHA reCAPTCHA collects

AdTruth’s lead engineer Marcos Perona was skeptical of Google’s claim to look for “human behavior” to distinguish a real person from a bot and decided to investigate. He wanted to find out what Google actually “captures” from a machine with the No CAPTCHA to work out whether a user is a bot or not.

After taking a close look at the embedded code for the No CAPTCHA product, he found that the system used a re-purposed version of Google’s Botguard technology, which was originally intended for anti-spam and bot detection within Gmail. On top of that, No Captcha uses a level of encryption that hides what the mechanism is doing, by constantly changing the No CAPTCHA code and encryption keys, making it difficult for bot makers to crack (and it also has the by-product of making it difficult for researchers like Perona to uncover exactly how the No CAPTCHA works.)

But Perona and other anonymous programmers from information security backgrounds believe they have decoded the new CAPTCHA system and the information it pulls from a browser when a user says they are not a robot. (AdTruth points out to Business Insider that it is not releasing any information that could help botmakers circumvent the No CAPTCHA reCAPTCHA.)

According to Perona, Botguard first takes a look at whether you already have a Google cookie on the machine. The No CAPTCHA reCAPTCHA then drops its own cookie from Google into your browser. It then takes a pixel-by-pixel fingerprint of the user’s browser window at that time, pulling information such as:

  • Screen size and resolution, date, language and browser plug-ins (all Javascript objects)
  • IP address
  • CSS information from the page you are on
  • A count of mouse and touch events

In addition, Google’s new CAPTCHA will also make use of any cookies that have been set by other Google properties — like Gmail, Search, Analytics, and so on — in the last six months. The belief is that humans use Google’s services in certain “human” ways, whereas bots do not, and those patterns can be detected.

All of this personally identifiable information gets encrypted and sent back to Google.
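
To see why attributes like these are identifying, consider a rough illustration (not Google’s code): combined and hashed, they yield a fairly stable identifier for one browser on one device.

```python
# Rough illustration: hashing collected browser attributes into a fingerprint.
import hashlib
import json

def browser_fingerprint(attributes):
    canonical = json.dumps(attributes, sort_keys=True)    # stable key ordering
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

sample = {
    "screen": "1440x900", "color_depth": 24, "language": "en-US",
    "plugins": ["Shockwave Flash", "Chrome PDF Viewer"],
    "ip": "203.0.113.7", "timezone_offset": -480,
}
print(browser_fingerprint(sample))
```

The more attributes collected, the more entropy the identifier carries, and the more likely it is to single out one specific person, which is exactly the concern AdTruth raises.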

The reCAPTCHA gives Google “a very high level of entropy when it comes to distinguishing an individual”
[Image: the No CAPTCHA reCAPTCHA as it appears on WordPress]

Perona told us: “The use of Google.com’s domain for the CAPTCHA is completely intentional, as that means Google can drop long-lived cookies in any device that comes into contact with the CAPTCHA, bypassing third-party cookie restrictions [like ad blockers] as long as the device has previously used any service hosted on Google.com.”

He added: “The mix of a fingerprint and first-party cookies is pervasive as Google can give a very high level of entropy when it comes to distinguishing an individual person.”

The way the new CAPTCHA works also seems to support this theory, as there appear to be at least three main CAPTCHA types, according to AdTruth’s research:

  • If Google cookies are present, and your fingerprint is obtained, you will often see the checkbox that asks you to prove whether you are a human.
  • If you delete all your Google cookies, the CAPTCHA will likely ask you to fill in a two-word CAPTCHA.
  • If you are using a form of anti-fingerprinting plugin, Google will likely ask you to fill in a two-word CAPTCHA, regardless of your cookies.

The implication is that Google isn’t just looking to identify whether you’re a human with its No CAPTCHA, but potentially exactly which human you are. The combination of first-party cookies and a browser fingerprint can be tied back to an individual — and most individuals simply clicking “I’m not a robot” won’t know this is happening behind the scenes.

AdTruth EMEA managing director James Collier told us: “This is a way for Google to indirectly link activity outside of Google’s properties – collected under the guise of security – to Google’s knowledge of that individual, without providing the consumer an opt out for the security fingerprint. When they went to market with reCAPTCHA they spoke about humanity and transparency. But in reality, their intentions appear hidden, as was the case with the collection of location data for traffic maps. It’s a question of trust: Google have developed a digital ecosystem that relies on them without question, and as the stakes get higher, consumers and industry alike should wake up to the risk of relying on companies that don’t transparently handle clear conflicts of interests in relation to their data.”

The No CAPTCHA reCAPTCHA privacy policy: “We also use the information to offer you tailored content”

Another red flag AdTruth noticed was the privacy policy that appears underneath the No CAPTCHA reCAPTCHA. It’s the same global privacy policy Google uses universally across all its services.

And it’s also the same policy that refers to unique device identifiers and states: “We also use the information to offer you tailored content — like giving you more relevant search and ads.”

It potentially means Google could be using the data collected from what is meant to be security software (which, remember, is also placed on sites other than its own), to improve services beyond anti-spam security, like advertising.

Google combined 60 of its privacy policies into one in 2012. Indeed, in January this year, the UK’s Information Commissioner’s Office ordered Google to sign a formal undertaking to improve the information it provides to people about how it collects personal data in the UK. The ICO’s three-year investigation found Google was “too vague” when describing how it uses personal data gathered across its web services and products.

Business Insider contacted the ICO with AdTruth’s findings on Google’s No CAPTCHA product. The ICO provided us with this statement: “The Data Protection Act requires organizations to be clear and open about the way they are using people’s information. We are currently looking into the information you have provided to establish full details.”

Of course, the privacy concerns raised by No CAPTCHA are not limited to the UK or Europe; Google’s products and services are used by hundreds of millions of users across the world.

No CAPTCHA reCAPTCHA raises “some legitimate privacy concerns”
[Image: Jeremy Gillula, staff technologist at the Electronic Frontier Foundation]

You’ll have noticed lots of “cans” and “coulds” in this story. It’s extremely hard to verify how often Google is collecting fingerprinting data and how or if the company is using it. But two prominent privacy researchers told Business Insider they found AdTruth’s preliminary conclusions “concerning.”

Jeremy Gillula, staff technologist at the Electronic Frontier Foundation, told us: “It’s definitely concerning that Google is conflating the privacy policies of their security systems like reCAPTCHA with their other products. Many websites rely on reCAPTCHA to prevent spam, and just because I want to post on one of those websites doesn’t mean I want it connected to Google’s profile on me.”

He adds that if Google were to commit to not using data collected via No CAPTCHA reCAPTCHA for any purpose other than further developing No CAPTCHA reCAPTCHA, this aspect wouldn’t be so bad.

But there would still be issues: “My bigger concern is that by over-identifying whether or not someone is a human by figuring out precisely which human they are, Google is contributing to the trend of making the web harder to use for people who value their privacy. In essence, Google is assuming you’re only human if you’re part of their system. If you choose not to use Google services, or if you choose to preserve your privacy, then you’re essentially classified as a second class citizen.”

Steven Murdoch, principal research fellow in the information security research group at University College London’s department of computer sciences, agreed that AdTruth’s research into the No CAPTCHA does raise “some legitimate privacy concerns.”

But he emphasized that it’s unlikely to be a conspiracy. Murdoch told us: “In terms of the way that the No CAPTCHA detector works, I think the reason it collects so much information is likely because the detection algorithm is machine-learning based rather than written by hand. Such systems are generally designed by collecting all information which might be of use then letting the machine learning system come up with an optimal decision.”

Google did not provide Business Insider with a statement, but did point us towards the Google DoubleClick policy, which explicitly prohibits the use of browser fingerprinting for ad targeting. That said, this is a policy for Google’s partners, not Google itself, although it does suggest Google likely holds itself to the same rules.

There is no evidence Google is doing, or is planning to do, anything nefarious with the information the new No CAPTCHA reCAPTCHA scans and collects — and it’s unlikely Google ever would use the data scraped through the software for advertising purposes.

The software looks at engagement “before, during, and after” an interaction
James Collier, AdTruth EMEA managing director (photo: AdTruth)

However, as AdTruth’s Collier pointed out, the key issue is a question of trust: Google’s own marketing around the launch of the No CAPTCHA reCAPTCHA is scant on details about the user data the software assesses, although the company did acknowledge in a blog post in 2013 that the software looks at engagement “before, during, and after” a CAPTCHA interaction.

Business Insider could only find one article, from Wired, in which it was explained that Google also examines cookies and IP addresses alongside mouse movements and typing behaviour (but nothing to do with a fingerprint) to determine whether that user “is the same friendly human Google remembers from elsewhere on the web.”
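
To make those reported signals concrete, here is a toy sketch of how a risk score built from that kind of input could gate whether a visitor sees a challenge at all. The signal names, weights, and threshold below are invented for illustration; this is not Google’s actual model.

# Illustrative only: a toy risk score combining the kinds of signals the
# article says reCAPTCHA examines (cookies, IP address, mouse movement,
# typing behaviour). Names, weights, and the threshold are hypothetical.

def risk_score(has_known_cookie: bool, ip_previously_flagged: bool,
               mouse_path_entropy: float, typing_interval_stddev: float) -> float:
    score = 0.0
    if not has_known_cookie:
        score += 0.4          # unknown visitor: no prior history to vouch for them
    if ip_previously_flagged:
        score += 0.3          # IP seen in earlier abusive traffic
    if mouse_path_entropy < 0.1:
        score += 0.2          # perfectly straight cursor paths look scripted
    if typing_interval_stddev < 5.0:
        score += 0.1          # metronome-regular keystrokes look scripted
    return score

def needs_challenge(score: float, threshold: float = 0.5) -> bool:
    """Show an image challenge only when the visitor looks risky."""
    return score >= threshold

if __name__ == "__main__":
    print(needs_challenge(risk_score(False, False, 0.05, 3.0)))   # True: challenge shown
    print(needs_challenge(risk_score(True, False, 0.8, 40.0)))    # False: waved through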

Essentially, even if you are really interested in discovering more about the mechanics behind the No CAPTCHA reCAPTCHA, it’s extremely difficult to find an explanation on the web.

The No CAPTCHA reCAPTCHA is an intelligent tool which will no doubt help cut through the deluge of spam and bots attacking sites across the web. But it may be in Google’s interest to set out exactly — and more prominently — how that tool is so clever at telling the difference between bots and humans.

Read more: http://www.businessinsider.com/google-no-captcha-adtruth-privacy-research-2015-2#ixzz3SItqU1n5

Data Breach Statistics from IBM

1.5 million: the number of monitored cyber attacks in the United States in 2013 (IBM Security Services 2014 Cyber Security Intelligence Index, April 2014).

Quantifying the data breach epidemic

Data breaches are among the most common and costly security failures in organizations of any size. In fact, studies show that companies are attacked an average of 16,856 times a year, and that many of those attacks result in a quantifiable data breach. And with today’s data moving freely between corporate networks, mobile devices, and the cloud, data breach statistics show this disturbing trend is rapidly accelerating.

Building a business case for security investment begins with quantifying the threat. Data breach statistics abound on the web. But not all data breaches are reported, and some victims aren’t even aware they’ve been compromised. IBM conducts its own research – in part collected from the thousands of clients whose networks we monitor – and we share our data breach statistics in the reports you’ll find on this page. And we can assure you: the data breach threat is very real, and very costly. To learn more about today’s critical threats and how companies are responding, download one of our comprehensive data breach statistics reports.

Estimate of 88% of digital ads are clicked by bots

New study suggests that over 88% of digital ads are clicked by automated programs, not humans, costing advertisers billions of dollars. Bots evolving to circumvent current detection technology.

LUXEMBOURG, Feb. 3, 2015 /PRNewswire/ — Oxford BioChronometrics, the Luxembourg-based company specializing in Human Recognition Technology, has conducted a study on fraud in digital advertising engagement that suggests between 88% and 98% of digital ad engagement is fraudulent. The results differ greatly from recently published studies claiming much lower percentages.

The study, conducted by Oxford BioChronometrics Chairman & Chief Technology Officer Adrian Neal and his team, involved placing digital ads on the Google, Yahoo, Facebook and LinkedIn advertising platforms. The ads were then monitored for engagement by automated programs known as bots as well as interaction by human beings.

The best performance the study saw was on the LinkedIn ad network, which had 88% fraudulent activity by bots. Google’s ad network was the worst with 98% bot fraud, while Yahoo and Facebook were tied for second at 94%. The study goes on to note that the team’s ads were billed for these fraudulent clicks and impressions.

“We didn’t make this press release lightly and we’re keenly aware of the ire it will draw,” explained Neal. “However, we believe that the integrity of the online community must be served and felt we had no choice but to go public with our findings. Advertisers make many online communities possible and it’s critically important to ensure the integrity of their online experience if we expect them to remain engaged. Only transparency can accomplish this and the time has come to shine a light on an area few are motivated to look at.”

“BioChronometrics represents an entirely new way of analyzing user/device interaction and, until now, the technology simply wasn’t available. Given the increased sophistication of automated programs and the subsequent increase in online click fraud, it’s not surprising that previous means of detection were unable to catch this level of fraud,” added Sander Kouwenhoven, the company’s Managing Director of Information and Technology.

Adding to the challenges faced by advertisers and internet advertising agencies, Neal and his team also emphasized the fact that bots are getting better. “We classified 6 different categories of bots, from Basic to Highly Advanced, including one that we can only classify as Humanoid.”

Humanoid bots can only be detected through deep behavioral analysis, with particular attention to their attempts to introduce measured doses of random behavior that mimic natural human activity. “What this means,” Neal explained, “is that the people who deploy the bots that commit ad fraud are getting ahead of the standard methods of detecting them. The common tools and methods of detection are no longer enough.”
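
As a rough illustration of what this kind of behavioral analysis can look at, the sketch below flags an event stream whose inter-event timing is suspiciously metronome-regular. The cutoff is invented, and this is not Oxford BioChronometrics’ technique, which the company describes only in broad terms.

# Illustrative sketch only: flag an event stream whose inter-event timing
# looks machine-generated because the gaps are near-identical.
from statistics import mean, stdev

def looks_scripted(event_times_ms: list[float]) -> bool:
    """Return True if inter-event gaps are suspiciously regular."""
    deltas = [b - a for a, b in zip(event_times_ms, event_times_ms[1:])]
    if len(deltas) < 5:
        return False                      # too little data to judge
    cv = stdev(deltas) / mean(deltas)     # coefficient of variation of the gaps
    return cv < 0.05                      # near-metronome regularity (invented cutoff)

# A bot firing clicks every 200 ms is flagged; a human's jittery stream is not.
print(looks_scripted([0, 200, 400, 600, 800, 1000, 1200]))    # True
print(looks_scripted([0, 180, 510, 640, 1030, 1290, 1800]))   # False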

This study was in part driven by Google’s recommendation, in its Nov. 2014 report “The Importance of Being Seen,” that advertisers should aim for above 50% viewability rather than simply counting impressions rendered. “The threshold of 50% strikes us as too low,” stated David Scheckel, the company’s CEO. “Especially because we know our Digital Ad Protection Technology can achieve near 100% viewability and guarantee actual human clicks.”

Imperva 5 Cyber Security Predictions for 2015

Imperva Data Security Blog

January 14, 2015
5 Cyber Security Predictions for 2015
Imperva has been in the business of protecting the high-value applications and data assets at the heart of the enterprise since 2002. In the years since, we’ve gained tremendous knowledge about cyber security and the origins and nature of cyber attacks. This knowledge has come from analyzing the data collected by our SecureSphere products in installations around the world, as well as from working closely with over 3,500 customers from across many industries.

When security vendors are challenged at the end of each calendar year to come up with predictions for the year ahead, we like to combine the data we’ve collected from our products with the insights that we’ve gathered from our customers, to come up with some meaningful commentary and helpful guidance. What follows are our predictions for the year ahead, with more to come throughout the year as we continue to analyze what our products have to tell us about how the security landscape is evolving.

2015

1. The year of revolt

2015 could be the year when merchants in the US revolt against the credit card companies’ policy of sticking them with both the liability for fraud and the responsibility for protecting what is essentially un-protectable: credit card numbers that have to be shared in order to be used, and which can be abused simply by knowing what they are. Fallout from such a change could vary widely, but it’s possible that we will see the rise of separate infrastructure for secure payments (like ApplePay) or a more secure credit card infrastructure (chip and pin) in the United States.

2015 could also be the year when consumers revolt at the prospect of having to change their credit card numbers so often. This has been the typical response to mega-breaches, with many issuers cycling cards. While this is ultimately in the consumer’s best interest, it’s a pain for people to re-sign up for automatic payments, update records with their various business associates, and begin anew. Besides resulting in the rise of separate infrastructure for secure payments (above), could we see a credit card issuer outcompete its peers based on cardholder security?

2. The rise of Cyber Insurance

Due to the breaches in 2013 and 2014 that wreaked havoc on the businesses, brands, reputations and leadership of way too many enterprises, 2015 will be the year that Cyber Insurance gains velocity and popularity. The Board and the C-Suite will have an appetite for reducing risk by offloading it to insurance providers. Government agencies and insurance companies are already at work establishing guidelines to support the growth of the cyber insurance market. Reduced Cyber Insurance premiums could be a new business benefit touted by security vendors, as premiums are reduced when a company demonstrates proof of having critical security controls in place.

See how Imperva can help you jump start your efforts to reduce risk.

3. The “cloudification” of IT will accelerate

In 2015, the “cloudification” of IT will accelerate, and we will see some big organizations using the cloud, including more and more financial institutions offering services via SaaS platforms. New compliance mandates for the cloud (ISO 27016, SSAE 16, etc.) are contributing to this phenomenon, because they enable businesses to validate their security posture and risk levels.

This leads us directly to a longer-term prediction: by 2017, “on-premises data center” will be a term of the past for the small- and mid-size business market, which will move entirely to the SaaS model.

Access this reference architecture for protecting your AWS-based web applications. It capitalizes on Skyfence which can be used to protect all your SaaS applications.

4. The first Big Data-related breach

As practical applications for Big Data grow, and the amount of information managed by businesses of every size reaches astronomical proportions, the temptation for hackers to secure the prize of being the first to hack a Big Data installation will mount as well. In 2015, the first big Big Data-related data breach will occur. The lack of administration and security knowledge in such installations, combined with advancements in server-side attack techniques, will result in hackers attempting to infiltrate this growing platform, and succeeding.

Learn how to address the top threats facing database and Big Data resources.

5. DDoS Attackers Take a Page from the APT Playbook

In 2014, DDoS attacks became much more sophisticated. Though much of the reporting focused on the size of attacks, a more troubling trend was the advancement in attack techniques. Much like their APT brethren, DDoS attacks can now morph and adapt based on the defenses in place. Hackers also dupe sites using impersonation, looking for vulnerabilities and cataloging them for future exploitation. Though not as stealthy as APTs, DDoS attackers are learning from the successes of APT hackers and adopting some of their techniques for an equally troubling network-based attack trend. And DDoS attacks are becoming increasingly common; a majority of organizations can expect to be hit with DDoS attacks in 2015. (Sources: Incapsula DDoS Trends Report 2014, DDoS Impact Survey 2014.)

Facts about ReCAPTCHA and NoCAPTCHA

Sorry Google, No CAPTCHA reCAPTCHA doesn’t stop bots
ShieldSquare
Google recently launched a new version of reCAPTCHA which claims to be more robust against bots and easier on humans.

While Google’s YouTube video is pretty convincing, things got a little more interesting when we dug deeper. The new approach, which appears to be a sophisticated bot-identification algorithm, amounts to little more than the use of browser cookies.

So here’s what happens when you are thrown a reCAPTCHA challenge:

* You are asked to solve a reCAPTCHA image the first time.
* The result of evaluating the text string you entered is cached in your browser’s cookies.
* The next time you visit the page, or any other page that requires you to pass reCAPTCHA, the information in these cookies is used to determine whether you have passed the test before.

A simple test can be done here: https://wordpress.org/support/register.php.

After you solve the reCAPTCHA image the first time, you are not asked to solve an image when you visit again. But once you delete your cookies and try again … there! Back to square one: you are required to solve the image to complete the form submission. Google has simply used cookies to retain information about your authenticity.

What does this mean for bots? A bot can use an OCR tool to solve the image, or have a human solve it once; after that, the bot can retain the cookies and continue scraping!
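
The underlying weakness is generic: “already proved human” becomes state carried in cookies, and any HTTP client can store and replay cookies. A minimal, hedged illustration of that point (the URL is a placeholder, not a working bypass of any particular site):

# Generic illustration: cookie-backed "already verified" state persists across
# requests in any HTTP client. The URL is a placeholder, not a real target.
import pickle
import requests

session = requests.Session()

# First visit: the site would normally require solving a challenge here,
# after which it drops a cookie marking the session as verified.
session.get("https://example.com/protected-form")

# Persist the cookie jar so later runs reuse the "verified" state...
with open("cookies.pkl", "wb") as f:
    pickle.dump(session.cookies, f)

# ...and a later run can load it back and keep making requests without
# seeing the challenge again, until the cookies are cleared or expire.
with open("cookies.pkl", "rb") as f:
    session.cookies.update(pickle.load(f))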

P.S.: We haven’t even got to the main course yet!

The new version of reCAPTCHA can also be bypassed by another technique, which uses the website’s public key (called the data-sitekey). Wait, what? Yes! Let’s say a bot wanted to bypass website X’s reCAPTCHA without letting a user (on website Y) know that he is helping a bot do so. More technically, this is called clickjacking, or a UI redress attack. The bot could use the data-sitekey of website X and disable the Referer header on a web page in Y where the user would be asked to solve a reCAPTCHA.

Once the user solves the CAPTCHA, the response (called “g-recaptcha-response”) can be used by a bot running in the background to submit a form on website X. This way, the bot tricks Google into thinking that the solved reCAPTCHA response originated from website X (while it actually came from Y), and the bot is able to proceed with scraping website X. This works because Google doesn’t validate the Referer header if it has been disabled by the client or is empty. A genuine user has just helped a bot scrape website X without realizing that he was being used as an access card.
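
For site owners, the practical defence against this Referer trick is to verify the token server side and check which hostname Google reports the CAPTCHA was solved on, rather than trusting the browser’s headers. A minimal sketch, assuming the documented siteverify endpoint and a placeholder secret key:

# Sketch of server-side verification that closes the hole described above:
# ask Google which hostname the CAPTCHA was actually solved on, instead of
# trusting the (possibly stripped) Referer header. SECRET_KEY is a placeholder.
import requests

SECRET_KEY = "your-recaptcha-secret-key"        # placeholder
EXPECTED_HOSTNAME = "www.example.com"           # website X's own hostname

def captcha_is_valid(g_recaptcha_response: str, remote_ip: str) -> bool:
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": SECRET_KEY,
            "response": g_recaptcha_response,
            "remoteip": remote_ip,
        },
        timeout=5,
    ).json()
    # Reject tokens that were solved on some other site (e.g., website Y).
    return bool(resp.get("success")) and resp.get("hostname") == EXPECTED_HOSTNAME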

This post was inspired by the original blog article by @homakov. A sample of this technique has already been implemented and hosted on GitHub.

Bots Outnumber Humans on the Web

Bots Now Outnumber Humans on the Web
By Robert McMillan, 12.18.14, 9:00 AM

Diogo Mónica once wrote a short computer script that gave him a secret weapon in the war for San Francisco dinner reservations.
This was early 2013. The script would periodically scan the popular online reservation service, OpenTable, and drop him an email anytime something interesting opened up—a choice Friday night spot at the House of Prime Rib, for example. But soon, Mónica noticed that he wasn’t getting the tables that had once been available.

By the time he’d check the reservation site, his previously open reservation would be booked. And this was happening crazy fast. Like in a matter of seconds. “It’s impossible for a human to do the three forms that are required to do this in under three seconds,” he told WIRED last year.

Mónica could draw only one conclusion: He’d been drawn into a bot war.
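
For readers curious what such a watcher script amounts to, here is a deliberately generic sketch: poll a page and send an email when the text of interest appears. The URL, search string, and mail settings are placeholders; this is not Mónica’s script and is not tied to OpenTable.

# Generic "tell me when something opens up" watcher, in the spirit of the
# script described above. URL, search text, and mail settings are placeholders.
import smtplib
import time
from email.message import EmailMessage

import requests

URL = "https://example.com/reservations?date=friday"   # placeholder
WANTED = "7:30 PM"                                      # placeholder

def notify(body: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = "Table alert"
    msg["From"] = "watcher@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:             # assumes a local mail relay
        smtp.send_message(msg)

while True:
    page = requests.get(URL, timeout=10).text
    if WANTED in page:
        notify(f"Found '{WANTED}' at {URL}")
        break
    time.sleep(300)                                     # check every five minutes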

Everyone knows the story of how the world wide web made the internet accessible for everyone, but a lesser-known story of the internet’s evolution is how automated code (aka bots) came to quietly take it over. Today, bots account for 56 percent of all website visits, says Marc Gaffan, CEO of Incapsula, a company that sells online security services. Incapsula recently ran an analysis of 20,000 websites to get a snapshot of part of the web, and on smaller websites, it found that bot traffic can run as high as 80 percent.

People use scripts to buy gear on eBay and, like Mónica, to snag the best reservations. Last month, the band Foo Fighters sold tickets for their upcoming tour at box offices only, an attempt to strike back against the bots used by online scalpers. “You should expect to see it on ticket sites, travel sites, dating sites,” Gaffan says. What’s more, a company like Google uses bots to index the entire web, and companies such as IFTTT and Slack give us ways to use bots for good, personalizing our internet and managing the daily informational deluge.

But, increasingly, a slice of these online bots are malicious—used to knock websites offline, flood comment sections with spam, or scrape sites and reuse their content without authorization. Gaffan says that about 20 percent of the Web’s traffic comes from these bots. That’s up 10 percent from last year.

Often, they’re running on hacked computers. And lately they’ve become more sophisticated. They are better at impersonating Google, or at running in real browsers on hacked computers. And they’ve made big leaps in breaking human-detecting captcha puzzles, Gaffan says.

“Essentially there’s been this evolution of bots, where we’ve seen it become easier and more prevalent over the past couple of years,” says Rami Essaid, CEO of Distil Networks, a company that sells bot-blocking software.

But despite the rise of these bad bots, there is some good news for the human race. The total percentage of bot-related web traffic is actually down this year from what it was in 2013. Back then it accounted for 60 percent of the traffic, 4 percent more than today.

Ellipsis announces Human Presence technology

Bot or Not?
March 21, 2014, by Jennifer Oladipo

Ellipsis targets “human presence” on the Web
Rather than make website visitors prove they’re human, Ellipsis wants companies to use Human Presence technology to figure that out automatically.

Bill West, Ellipsis chairman and CEO, said the company studied more than 80 million actions of Web users to create software that can tell the difference between human and botnet traffic within milliseconds. This vastly reduces the need for Turing tests – those codes, math problems, images and other devices people often must solve to prove that they are human, he said.

Ellipsis – named after the “…” used to signify missing text – pitches its software as a chance to “win the arms race” in Web security, arguing that machine-learning algorithms can stay one step ahead of bot evolution.

West is also a managing partner with The Atlantic Partners, which acquires and sells underperforming companies on behalf of private equity firms. He sat down with UBJ to explain the company’s product and plan.

How was Ellipsis founded?

It was an Atlanta company called Pramana, and actually funded by UCAN [Upstate Carolina Angel Network]. It was abandoned, basically nothing more than software. We moved about a year ago. We liked what they did but didn’t like the way they did it.

Atlantic Partners is one of the owners, along with the other three founders on the management team, UCAN and other investors. We bought the intellectual property, and put all Greenville talent on it to get it working. We rewrote virtually all of their code.


How did you gather a team to revamp the product?

I worked in the various high-tech groups in town, so I knew who was capable of handling this kind of deal. We had the choice of hiring a staff, but everybody we have has their own company. I thought we could put together a part-time team that’s really some of the most talented people in town. They were also involved in the initial analysis when looking at the software.

Peter Waldschmidt, vice chairman, is brilliant working on design, data collection and algorithmic models. Andy Kurtz is CEO of ProActive Technology, a premier programming shop in the area. His crew, led by Kelly Summerlin and Rob Hall, built the data collection processes. We got financial guidance from Matt Dunbar at UCAN. The Atlantic Partners provided overall strategy and oversight.

What need does Human Presence meet?

Unwanted botnet traffic is a problem. Attackers come in with bots and scrape information from websites. There’s also click fraud [bots clicking on ads to generate revenue]. Bots are on track to waste nearly $10 billion of advertising dollars spent in 2013.

But 3 percent of Web users log off immediately when they run into Turing tests. More than 30 percent fail on the first attempt to solve the puzzle. We were trying to do something that was totally nonintrusive. Instead of annoying 100 percent of customers, you’ll only annoy maybe 5 percent.

Then we also wanted painless, simple installation for site owners. It’s a single line of JavaScript that can be installed in less than a minute.


How does it work?

We’ve studied the time it takes people to press a key, move the mouse around the form, other data points. We can detect in milliseconds whether or not we have a bot. We give businesses a free report to know if they have a bot problem or not. If there is a problem, more detailed reporting is available for a fee.
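
A toy version of that kind of millisecond-scale check might look like the sketch below. The features and thresholds are invented for illustration and are not Ellipsis’s Human Presence model.

# Invented illustration (not Ellipsis's model): classify a form interaction as
# human or bot from a handful of timing and activity signals.
from dataclasses import dataclass

@dataclass
class Interaction:
    ms_to_first_keypress: float   # time from page load to the first key event
    mouse_move_events: int        # how many mousemove events fired over the form
    ms_to_submit: float           # time from page load to form submission

def classify(i: Interaction) -> str:
    if i.ms_to_submit < 2000:
        return "bot"       # whole form completed in under two seconds
    if i.mouse_move_events == 0 and i.ms_to_first_keypress < 100:
        return "bot"       # fields filled instantly with no pointer activity
    return "human"

print(classify(Interaction(ms_to_first_keypress=20, mouse_move_events=0, ms_to_submit=800)))       # bot
print(classify(Interaction(ms_to_first_keypress=1400, mouse_move_events=37, ms_to_submit=21000)))  # human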

(Full disclosure: The Greenville Journal was one of the beta sites.)

Who’s your target market?

Real estate, periodicals and blog sites that have lots of content are vulnerable. So are online ticketing companies that deal with bots that are scalpers. Those are easy. When you get into banking it’s a little more complex, and we can do that, too.

What’s the next step?

There will be some staffing up, then we’re going to market mid-March. We’ve gotten inquiries to buy from West Coast and East Coast companies. An exit plan was there from the beginning. We’re seeking partners for distribution, investment or acquisition.

Where did the name Ellipsis come from?

I guess when we were all sending emails back and forth to each other in the early stages, I noticed that everyone was using ellipses, like there was more to think about. I thought, that’s definitely something a human would do, and not what a bot would do.


Current Methods for Website Security

If you run a website that allows visitors to comment, or where your clients have to set up user accounts, you need some kind of security in place to prevent abuse. Hackers can create robots that can enact malicious attacks on your site by posing as humans. Some of those attacks include making comments and registration requests. Because robots, or “bots,” can work much faster than humans, they could easily bog down your website with multiple attacks in a short time. For this reason, you need some kind of security that can distinguish between humans and machines, and protect your site from malicious attacks.

Types of Security

There are several ways to secure your site from robot attacks, ranging from complex to simple:

CAPTCHAs

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a complex, but effective, way to differentiate humans from machines.

CAPTCHA uses graphic representations of letters, words, and symbols, which the users are required to type in. The idea is that a robot will not be able to recognize the letters and symbols in order to replicate them.

Unfortunately, the problem with traditional CAPTCHA is that many humans can’t recognize them either. Some CAPTCHAs have an audio option for visually impaired users, but the audio can often be just as difficult to decipher.

In response to the issues with traditional CAPTCHAs, companies like Confident Technologies have spearheaded image-based CAPTCHAs, which use photo images instead of text graphics. The images are easier for most humans to recognize, but still difficult for a bot to manipulate.

The image-based CAPTCHAs could be presented as a single image or as a mini game where users have to solve a puzzle with the images.

Text, Email or Phone Verification

With text or email verification, your site will send a text or an email, or place a phone call, to anyone who tries to create an account, post a comment, or perform other actions on your site. The user then has to respond to the message, either by clicking on a verification link, pressing a button, or by returning to your site to enter a code.

The advantage to this type of verification is that it requires your users to enter specific information to proceed, which a robot might not be able to do. The disadvantage is that it requires your users to have their phones or email handy. In most cases this should not be an issue, especially considering the advances in smartphone technology, but there could be occasions where a phone or email might not be available.
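
In practice the flow reduces to generating a one-time code, delivering it out of band, and checking it against an expiry. A minimal sketch, with delivery stubbed out since it would go through an SMS or email provider:

# Minimal sketch of an out-of-band verification code flow. Code delivery is
# stubbed; a real site would send it via an email or SMS provider.
import secrets
import time

PENDING: dict[str, tuple[str, float]] = {}   # user -> (code, expiry timestamp)
CODE_TTL_SECONDS = 300

def issue_code(user: str) -> str:
    code = f"{secrets.randbelow(1_000_000):06d}"      # six-digit one-time code
    PENDING[user] = (code, time.time() + CODE_TTL_SECONDS)
    return code                                       # hand off to SMS/email delivery

def verify_code(user: str, submitted: str) -> bool:
    code, expires = PENDING.pop(user, ("", 0.0))      # single use: remove on check
    return bool(code) and time.time() < expires and secrets.compare_digest(code, submitted)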

The Honey Pot

A honey pot is a trap designed to lure the victim into doing something they shouldn’t. In terms of security for your site, it means luring a robot into filling a field that it’s not supposed to, or filling it incorrectly. The field usually carries an instruction like “leave this field blank”; a robot won’t read the instruction and will disqualify itself by entering data.

The issue with the honey pot is that sometimes users can also neglect to read the instructions and disqualify themselves.
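
Server side, the honey pot check itself is tiny: reject any submission where the decoy field arrives non-empty. A minimal sketch, with an arbitrary example field name:

# Minimal honey pot check: the form includes a decoy field (hidden from humans
# by CSS or by a "leave this blank" label); any submission that fills it is
# treated as a bot. The field name "website_url" is an arbitrary example.
def is_probably_bot(form_data: dict[str, str], honeypot_field: str = "website_url") -> bool:
    return bool(form_data.get(honeypot_field, "").strip())

print(is_probably_bot({"name": "Ada", "comment": "Nice post", "website_url": ""}))          # False
print(is_probably_bot({"name": "x", "comment": "buy now", "website_url": "http://spam"}))   # True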

Submission Timing

Submission timing is simply the amount of time it takes to complete a task. Since robots generally complete tasks faster than humans, if there is too short of a time frame between tasks, especially similar tasks, the system reads it as a robot and displays a warning message. If the actions continue then the system blocks the user.

The advantage to submission timing is that it’s fairly simple and straightforward. The Ellipsis Human Presence technology uses these timing and measurement data points, along with proprietary algorithms, a database of known human behavior, and machine learning, to accurately detect human site visitors and quarantine all traffic that does not meet its criteria.
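
A bare-bones version of the generic timing check (illustrative only, not Ellipsis’s implementation) records when the form was rendered and rejects submissions that come back implausibly fast; the three-second floor is an assumption:

# Bare-bones submission-timing check (illustrative; not any vendor's product):
# record when the form was rendered and reject submissions that return faster
# than a human could plausibly fill it in. The 3-second floor is an assumption.
import time

MIN_FILL_SECONDS = 3.0

def stamp_form() -> float:
    """Embed this value in a hidden field (or sign it) when rendering the form."""
    return time.time()

def submitted_too_fast(render_timestamp: float) -> bool:
    return (time.time() - render_timestamp) < MIN_FILL_SECONDS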

Check Boxes

Check boxes are one of the simplest ways to secure against robots. It is essentially a checkbox on the form that’s invisible to a machine, yet visible to the user. The user must check the box to proceed.

The advantage to the check box is that it is very simple and easy to implement. The disadvantage is that robots have progressed to the point where some can recognize the boxes.

Read more at http://www.business2community.com/tech-gadgets/types-website-security-01092313#MmEft7QREqJoMlip.99

Marketers Could Lose $6.3 Billion To Bots In 2015

Marketers have a billion-dollar bot problem.

Global advertisers could lose $6.3 billion to bots in 2015, if current fraud rates continue. That’s according to new research from the ANA (Association of National Advertisers) and a study conducted with White Ops, an ad fraud detection firm.

Nearly one-fourth (23%) of all video ads are served to bots, while 11% of display ads are bot-infected. Per the ANA study, bots accounted for 17% of all programmatic ad traffic and 19% of retargeted ads.

The data comes from 181 campaigns from 36 ANA member companies. The campaigns were measured for 60 days and accounted for 5.5 billion impressions across three million domains. The campaigns came from major brands including Ford, Walmart, Verizon, Prudential, MasterCard, Kellogg’s, Johnson & Johnson and more.

While the average bot activity for programmatically-purchased ads was 17%, it was significantly higher in some instances.

For example, one agency for a CPG brand ran a video campaign through a publicly traded video supply-side platform (SSP) and 62% of the ads were served to bots, per the study.

Additionally, “for 18 of the 36 study participants, three well-known programmatic ad exchanges supplied programmatic traffic with over 90% bots,” the report reads. “A bot site used the opacity of programmatic display traffic sourcing through demand-side platforms (DSPs) to systematically defraud advertisers.”

To clarify, that does not mean the three exchanges were supplying 90% fraudulent inventory. Rather, there was one site that all three exchanges were sourcing from, and 90% of the inventory from that site was fraudulent. In the end, 18 of 36 advertisers ended up purchasing inventory from that site via the three major ad exchanges.

It’s not just the open ad exchanges that have a bot problem; the report also notes that premium publishers can be infected as well. One CPG brand purchased 230,000 impressions from a premium U.S. media company, per the study, but 19% of the inventory purchased was fraudulent.

The report found that the majority of bots came from residential IPs, noting that 67% of bot traffic “comes from everyday computers that have been hacked.” In an unrelated study released earlier this year, Forensiq, another digital ad detection firm, showed an example of what bots can do behind the scenes once a computer is infected.

“By using the computer of real people … the bots do not just blend in, they get targeted,” the White Ops and ANA report reads. “Sophisticated bots moved the mouse, making sure to move the cursor over ads. Bots put items in shopping carts and visited many sites to generate histories and cookies to appear more demographically appealing to advertisers and publishers.”

Buying tickets with less aggravation

A Rolling Stones concert in London, 1969.

Buying concert tickets these days requires a lot of planning and speed.

You have to be on the dot when tickets go on sale, and even if you are, page loading, internet speed and the always-annoying captcha code can slow you down. Marketplace Tech’s Ben Johnson may have a trick up his sleeve to get tickets faster and with less hassle.

Ticketmaster now has an app for buying tickets. Ben says the things that usually make people nervous about ticket buying— like giving out personal information — can be strengths when purchasing through an app.

“It knows your location, it knows your identity and that means you might get tickets faster and score better seats if you use the app,” Johnson says.

Your smartphone can also help you find closer seats even when you’ve already purchased a ticket. iBeacon technology can pinpoint your location using your mobile phone, and ticketing apps use this feature to help you move up. Unfortunately, if this option doesn’t suit you, there isn’t really an alternative, save for standing in line.

New research on bot traffic

 

The Incapsula Blog, December 9, 2013

Report: Bot traffic is up to 61.5% of all website traffic

Igal Zeifman

Last March we published a study showing that the majority of website traffic (51%) was generated by non-human entities, 60% of which were clearly malicious. As we soon learned, these facts came as a surprise to many Internet users, for whom they offered a rare glimpse between the lines of Google Analytics.

Since then we have been approached with numerous requests for an updated report. We were excited about the idea, but had to wait: first, to allow a significant interval between data sets, and then for the implementation of new Client Classification features.

With all the pieces in place, we went on to collect the data for the 2013 report, which we’re presenting here today.


Research Methodology

For the purpose of this report we observed 1.45 billion visits over a 90-day period. The data was collected from a group of 20,000 sites on Incapsula’s network, which consists of clients from all available plans (Free to Enterprise). Geographically, the traffic covers all of the world’s 249 countries, per country codes defined by the ISO 3166-1 standard.

Report Highlights

Bot Traffic is up by 21%

Compared to the previous report from 2012, we see a 21% growth in total bot traffic, which now represents 61.5% of website visitors. The bulk of that growth is attributed to increased visits by good bots (i.e., certified agents of legitimate software, such as search engines) whose presence increased from 20% to 31% in 2013. Looking at user-agent data we can provide two plausible explanations of this growth:

  • Evolution of Web Based Services: Emergence of new online services introduces new bot types into the pool. For instance, we see newly established SEO oriented services that crawl a site at a rate of 30-50 daily visits or more.
  • Increased activity of existing bots: Visitation patterns of some good bots (e.g., search engine crawlers) consist of recurring cycles. In some cases we see these cycles getting shorter and shorter to allow higher sampling rates, which also results in additional bot traffic.

31% of Bots Are Still Malicious, but with Far Fewer Spammers

While the relative percentage of malicious bots remains unchanged, there is a noticeable reduction in Spam Bot activity, which decreased from 2% in 2012 to 0.5% in 2013. The most plausible explanation for this steep decrease is Google’s anti-spam campaign, which includes the recent Penguin 2.0 and 2.1 updates.

SEO link building was always a major motivation for automated link spamming. With its latest Penguin updates Google managed to increase the perceivable risk for comment spamming SEO techniques, while also driving down their actual effectiveness.

Based on our figures, it looks like Google was able to discourage link spamming practices, causing a 75% decrease in automated link spamming activity.

Evidence of More Sophisticated Hacker Activity

Another point of interest is the 8% increase in the activity of “Other Impersonators” – a group which consists of unclassified bots with hostile intentions.

The common denominator for this group is that all of its members are trying to assume someone else’s identity. For example, some of these bots use browser user-agents while others try to pass themselves as search engine bots or agents of other legitimate services. The goal is always the same – to infiltrate their way through the website’s security measures.
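
One widely used counter to the search-engine-impersonation tactic is to verify the claim at the network layer: reverse-resolve the visitor’s IP, check that the hostname belongs to the crawler’s published domain, then forward-resolve that hostname and confirm it maps back to the same IP. A sketch of that check for Googlebot (other legitimate crawlers publish their own verification domains):

# Verify a visitor claiming to be Googlebot via reverse + forward DNS, the
# check Google has long recommended. Only Googlebot's domains are listed here;
# other legitimate crawlers document their own.
import socket

GOOGLEBOT_DOMAINS = (".googlebot.com", ".google.com")

def is_real_googlebot(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)              # reverse DNS
        if not host.endswith(GOOGLEBOT_DOMAINS):
            return False
        forward_ips = socket.gethostbyname_ex(host)[2]     # forward DNS
        return ip in forward_ips                           # must map back to the same IP
    except (socket.herror, socket.gaierror):
        return False

# A spoofed "Googlebot" user agent coming from a residential IP fails this check.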

The generalized definition of such non-human agents also reflects on these bots’ origins. Where other malicious bots are agents of known malware with a dedicated developer, GUI, “brand” name and patch history, these “Impersonators” are custom-made bots, usually crafted for a very specific malicious activity.

One common scenario: an uncategorized DDoS bot with a spoofed IE6 user agent.

In terms of their functionality and capabilities, such “Impersonators” usually represent a higher-tier in the bot hierarchy. These can be automated spy bots, human-like DDoS agents or a Trojan-activated barebones browser. One way or another, these are also the tools of top-tier hackers who are proficient enough to create their own malware.

The 8% increase in the number of such bots highlights the increased activity of such hackers, as well as the rise in targeted cyber-attacks.

This is also reflective of the latest trends in DDoS attacks, which are evolving from volumetric Layer 3-4 attacks to much more sophisticated and dangerous Layer 7 multi-vector threats.