The 773 Million Record “Collection #1” Data Breach

Collection #1 is a set of email addresses and passwords totalling 2,692,818,238 rows. It’s made up of many different individual data breaches from literally thousands of different sources. (And yes, fellow techies, that’s a sizeable amount more than a 32-bit integer can hold.)
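To put the 32-bit remark in numbers, here is a quick check. It assumes the comparison is against a signed 32-bit integer; an unsigned one would still hold the row count.

```python
ROWS = 2_692_818_238  # row count quoted above

INT32_MAX = 2**31 - 1   # largest signed 32-bit value: 2,147,483,647
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value: 4,294,967,295

fits_signed = ROWS <= INT32_MAX
fits_unsigned = ROWS <= UINT32_MAX
overflow_by = ROWS - INT32_MAX  # how far past the signed limit the count goes

print(f"fits signed 32-bit:   {fits_signed}")
print(f"fits unsigned 32-bit: {fits_unsigned}")
print(f"over the signed limit by {overflow_by:,} rows")
```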

In total, there are 1,160,253,228 unique combinations of email addresses and passwords. This is when treating the password as case sensitive but the email address as not case sensitive. This also includes some junk because hackers being hackers, they don’t always neatly format their data dumps into an easily consumable fashion. (I found a combination of different delimiter types including colons, semicolons, spaces and indeed a combination of different file types such as delimited text files, files containing SQL statements and other compressed archives.)
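The counting rule described above (email not case sensitive, password case sensitive, mixed delimiters) can be sketched roughly as follows. This is not the actual HIBP import code; the regular expression, the function names and the sample lines are all assumptions made for illustration, and real dumps are far messier.

```python
import re

# Assumption: each record is "email<delimiter>password", where the delimiter
# is a colon, a semicolon or whitespace.
RECORD = re.compile(r"^([^\s:;]+@[^\s:;]+)[:;\s]+(.+)$")

def parse_record(line):
    """Return (email, password) or None if the line is junk."""
    match = RECORD.match(line.strip())
    if match is None:
        return None
    email, password = match.groups()
    # Email is folded to lower case; the password is kept exactly as found.
    return email.lower(), password

def unique_combinations(lines):
    """Collect the distinct (email, password) pairs from an iterable of lines."""
    seen = set()
    for line in lines:
        record = parse_record(line)
        if record is not None:
            seen.add(record)
    return seen

# Hypothetical sample data, not taken from the breach.
sample = [
    "Alice@Example.com:Passw0rd",
    "alice@example.com;Passw0rd",   # same pair, different delimiter and case
    "alice@example.com passw0rd",   # different password (case matters)
    "not a record",                 # junk, skipped
]
combos = unique_combinations(sample)
unique_emails = {email for email, _ in combos}
print(len(combos), "unique combinations,", len(unique_emails), "unique email")
```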

The unique email addresses totalled 772,904,991. This is the headline you’re seeing as this is the volume of data that has now been loaded into Have I Been Pwned (HIBP). It’s after as much clean-up as I could reasonably do and per the previous paragraph, the source data was presented in a variety of different formats and levels of “cleanliness”. This number makes it the single largest breach ever to be loaded into HIBP.

There are 21,222,975 unique passwords. As with the email addresses, this was after implementing a bunch of rules to do as much clean-up as I could including stripping out passwords that were still in hashed form, ignoring strings that contained control characters and those that were obviously fragments of SQL statements. Regardless of best efforts, the end result is not perfect nor does it need to be. It’ll be 99.x% perfect though and that x% has very little bearing on the practical use of this data. And yes, they’re all now in Pwned Passwords, more on that soon.
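As a rough illustration of the kind of clean-up rules mentioned above, here is a minimal sketch. The three checks are stand-ins chosen for this example (hex strings of common hash lengths, ASCII control characters, a few SQL keywords); the rules actually used are not spelled out in the post, and a filter like this will both miss junk and discard the odd real password.

```python
import re

# Assumption: "still in hashed form" is approximated as a pure hex string with
# the length of an MD5, SHA-1 or SHA-256 digest.
HEX_HASH = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$")

# Assumption: a few keywords are enough to spot obvious SQL fragments.
SQL_FRAGMENT = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+TABLE|VALUES\s*\()", re.IGNORECASE
)

def has_control_chars(text):
    """True if the string contains an ASCII control character."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in text)

def looks_like_password(candidate):
    """Apply the three illustrative rules; True means keep the string."""
    if not candidate:
        return False
    if HEX_HASH.match(candidate):
        return False
    if has_control_chars(candidate):
        return False
    if SQL_FRAGMENT.search(candidate):
        return False
    return True

# Hypothetical inputs.
candidates = [
    "hunter2",
    "5f4dcc3b5aa765d61d8327deb882cf99",  # 32 hex characters, treated as a hash
    "pass\x00word",                      # contains a control character
    "INSERT INTO users VALUES ('a'",     # SQL fragment
]
kept = [c for c in candidates if looks_like_password(c)]
print(kept)
```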

That’s the numbers, let’s move onto where the data has actually come from.

Data Origins

Last week, multiple people reached out and directed me to a large collection of files on the popular cloud service, MEGA (the data has since been removed from the service). The collection totalled over 12,000 separate files and more than 87GB of data. One of my contacts pointed me to a popular hacking forum where the data was being socialised, complete with the following image:

As you can see at the top left of the image, the root folder is called “Collection #1” hence the name I’ve given this breach. The expanded folders and file listing give you a bit of a sense of the nature of the data (I’ll come back to the word “combo” later), and as you can see, it’s (allegedly) from many different sources. The post on the forum referenced “a collection of 2000+ dehashed databases and Combos stored by topic” and provided a directory listing of 2,890 of the files which I’ve reproduced here. This gives you a sense of the origins of the data but again, I need to stress “allegedly”. I’ve written before about what’s involved in verifying data breaches and it’s often a non-trivial exercise. Whilst there are many legitimate breaches that I recognise in that list, that’s the extent of my verification efforts and it’s entirely possible that some of them refer to services that haven’t actually been involved in a data breach at all.

However, what I can say is that my own personal data is in there and it’s accurate; right email address and a password I used many years ago. Like many of you reading this, I’ve been in multiple data breaches before which have resulted in my email addresses and yes, my passwords, circulating in public. Fortunately, only passwords that are no longer in use, but I still feel the same sense of dismay that many people reading this will when I see them pop up again. They’re also ones that were stored as cryptographic hashes in the source data breaches (at least the ones that I’ve personally seen and verified), but per the quoted sentence above, the data contains “dehashed” passwords which have been cracked and converted back to plain text. (There’s an entirely different technical discussion about what makes a good hashing algorithm and why the likes of salted SHA1 is as good as useless.) In short, if you’re in this breach, one or more passwords you’ve previously used are floating around for others to see.

So that’s where the data has come from, let me talk about how to assess your own personal exposure.

Modlishka allows for very easy phishing / MITM

You basically just put it on a local domain, point people there and it forwards the traffic up and down to the target website – so no templates, no warnings. It will also push through two factor authentication requests and answers.

Modlishka is a flexible and powerful reverse proxy, that will take your phishing campaigns to the next level (with minimal effort required from your side).

Enjoy 🙂

Features

Some of the most important ‘Modlishka’ features :

  • Support for majority of 2FA authentication schemes (by design).
  • No website templates (just point Modlishka to the target domain – in most cases, it will be handled automatically).
  • Full control of “cross” origin TLS traffic flow from your victims’ browsers.
  • Flexible and easily configurable phishing scenarios through configuration options.
  • Pattern based JavaScript payload injection.
  • Stripping websites of all encryption and security headers (back to 90’s MITM style).
  • User credential harvesting (with context based on URL parameter passed identifiers).
  • Can be extended with your ideas through plugins.
  • Stateless design. Can be scaled up easily for an arbitrary number of users – ex. through a DNS load balancer.
  • Web panel with a summary of collected credentials and user session impersonation (beta).
  • Written in Go.

https://github.com/drk1wi/Modlishka

In an email to ZDNet, Duszyński described Modlishka as a point-and-click and easy-to-automate system that requires minimal maintenance, unlike previous phishing toolkits used by other penetration testers.

“At the time when I started this project (which was in early 2018), my main goal was to write an easy to use tool, that would eliminate the need of preparing static webpage templates for every phishing campaign that I was carrying out,” the researcher told us.

“The approach of creating a universal and easy to automate reverse proxy, as a MITM actor, appeared to be the most natural direction. Despite some technical challenges, that emerged on this path, the overall result appeared to be really rewarding,” he added.

“The tool that I wrote is sort of a game changer, since it can be used as a ‘point and click’ proxy, that allows easy phishing campaign automation with full support of the 2FA (an exception to this is U2F protocol based tokens – which are currently the only resilient second factor).”

zdnet https://www.zdnet.com/article/new-tool-automates-phishing-attacks-that-bypass-2fa/

Pornhub 2018 in review

Follow along to see the most interesting data points amassed by our team of statisticians, all presented with colorful charts and insightful commentary. Enjoy!

The Year in Numbers
Top Searches & Pornstars
Traffic & Time on Site
Gender Demographics
Age Demographics
Devices & Technology
Celebrity Searches
Movie & Game Searches
Events, Holidays & Sports
Top 20 Countries in Depth

Source: https://www.pornhub.com/insights/2018-year-in-review

Security Breaches Don’t Affect Stock Price. Or do they?

Abstract: This report assesses the impact disclosure of data breaches has on the total returns and volatility of the affected companies’ stock, with a focus on the results relative to the performance of the firms’ peer industries, as represented through selected indices rather than the market as a whole. Financial performance is considered over a range of dates from 3 days post-breach through 6 months post-breach, in order to provide a longer-term perspective on the impact of the breach announcement.

Key findings:

While the difference in stock price between the sampled breached companies and their peers was negative (1.13%) in the first 3 days following announcement of a breach, by the 14th day the return difference had rebounded to +0.05%, and on average remained positive through the period assessed.

For the differences in the breached companies’ betas and the beta of their peer sets, the differences in the means of 8 months pre-breach versus post-breach was not meaningful at 90, 180, and 360 day post-breach periods.

For the differences in the breached companies’ beta correlations against the peer indices pre- and post-breach, the difference in the means of the rolling 60 day correlation 8 months pre-breach versus post-breach was not meaningful at 90, 180, and 360 day post-breach periods.

In regression analysis, use of the number of accessed records, date, data sensitivity, and malicious versus accidental leak as variables failed to yield an R² greater than 16.15% for response variables of 3, 14, 60, and 90 day return differential, excess beta differential, and rolling beta correlation differential, indicating that the financial impact on breached companies was highly idiosyncratic.

Based on returns, the most impacted industries at the 3 day post-breach date were U.S. Financial Services, Transportation, and Global Telecom. At the 90 day post-breach date, the three most impacted industries were U.S. Financial Services, U.S. Healthcare, and Global Telecom.
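The “return difference” in the first finding is, roughly, the breached company's return over a window minus the return of its peer index over the same window. A minimal sketch follows, assuming simple price returns (the report speaks of total returns, which would also include dividends) and using made-up prices; the function names are hypothetical.

```python
def simple_return(prices):
    """Return from the first to the last price in the window."""
    return prices[-1] / prices[0] - 1.0

def return_differential(stock_prices, index_prices):
    """Stock return minus peer-index return over the same window."""
    return simple_return(stock_prices) - simple_return(index_prices)

# Hypothetical closing prices: announcement day plus three trading days.
stock = [100.0, 99.0, 98.5, 98.0]         # falls 2%
peer_index = [200.0, 200.5, 199.0, 198.0]  # falls 1%

diff = return_differential(stock, peer_index)
print(f"3 day return differential: {diff:+.2%}")
```

A negative value means the breached company underperformed its peers over the window; a positive one means it outperformed them.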

The market isn’t going to fix this. If we want better security, we need to regulate the market.

Source: Security Breaches Don’t Affect Stock Price – Schneier on Security

However, the dataset:

The analysis began with a dataset of 235 recorded data breaches dating back to 2005

is very small and misses some of the huge breaches, such as Equifax.
There is a very telling table in the results showing that if a breach is hugely public, then share prices do indeed plummet.

So it may also have something to do with how the company handles the breach and how much media attention is out there.

Hackers Can Rickroll Thousands of Sonos and Bose Speakers Over the Internet

Perhaps you’ve been hearing strange sounds in your home: ghostly creaks and moans, random Rick Astley tunes, Alexa commands issued in someone else’s voice. If so, you haven’t necessarily lost your mind. Instead, if you own one of a few models of internet-connected speaker and you’ve been careless with your network settings, you might be one of thousands of people whose Sonos or Bose devices have been left wide open to audio hijacking by hackers around the world.

Researchers at Trend Micro have found that some models of Sonos and Bose speakers, including the Sonos Play:1, the newer Sonos One, and Bose SoundTouch systems, can be pinpointed online with simple internet scans, accessed remotely, and then commandeered with straightforward tricks to play any audio file that a hacker chooses. Only a small fraction of the total number of Bose and Sonos speakers were found to be accessible in their scans. But the researchers warn that anyone with a compromised device on their home network, or who has opened up their network to provide direct access to a server they’re running to the external internet (say, to host a game server or share files), has potentially left their fancy speakers vulnerable to an epic aural prank.

Source: Hackers Can Rickroll Thousands of Sonos and Bose Speakers Over the Internet | WIRED

Vault7 – CIA loses control of its hacking arsenal, information being provided on Wikileaks

WikiLeaks begins its new series of leaks on the U.S. Central Intelligence Agency. Code-named “Vault 7” by WikiLeaks, it is the largest ever publication of confidential documents on the agency.

The first full part of the series, “Year Zero”, comprises 8,761 documents and files from an isolated, high-security network situated inside the CIA’s Center for Cyber Intelligence in Langley, Virginia. It follows an introductory disclosure last month of CIA targeting French political parties and candidates in the lead up to the 2012 presidential election.

Recently, the CIA lost control of the majority of its hacking arsenal including malware, viruses, trojans, weaponized “zero day” exploits, malware remote control systems and associated documentation. This extraordinary collection, which amounts to more than several hundred million lines of code, gives its possessor the entire hacking capacity of the CIA. The archive appears to have been circulated among former U.S. government hackers and contractors in an unauthorized manner, one of whom has provided WikiLeaks with portions of the archive.

“Year Zero” introduces the scope and direction of the CIA’s global covert hacking program, its malware arsenal and dozens of “zero day” weaponized exploits against a wide range of U.S. and European company products, including Apple’s iPhone, Google’s Android and Microsoft’s Windows and even Samsung TVs, which are turned into covert microphones.

Source: Vault7 – Home

One interview question that shows true character

http://www.inc.com/betsy-mikel/1-interview-question-that-cuts-through-the-bs-to-reveal-someones-true-character.html

Are you a giver or a taker? Ask the interviewee for the names of 4 people whose careers they have boosted. If those people hold lower positions than the interviewee, you have a giver. If they hold higher positions, the interviewee is a taker – a self-serving backstabber…

TrackMeNot – run random searches in the background

TrackMeNot runs in Firefox and Chrome as a low-priority background process that periodically issues randomized search-queries to popular search engines, e.g., AOL, Yahoo!, Google, and Bing. It hides users’ actual search trails in a cloud of ‘ghost’ queries, significantly increasing the difficulty of aggregating such data into accurate or identifying user profiles. TMN serves as a means of amplifying users’ discontent with advertising networks that not only disregard privacy, but also facilitate the bulk surveillance agendas of corporate and government agencies, as documented recently in disclosures by Edward Snowden and others. To better simulate user behavior TrackMeNot uses a dynamic query mechanism to ‘evolve’ each client (uniquely) over time, parsing the results of its searches for ‘logical’ future query terms with which to replace those already used.
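The “dynamic query mechanism” can be pictured with a small simulation. This is not TrackMeNot's code (the real thing is a browser extension); it sends no searches at all, the result text is faked, and the function name, word-length threshold and seed terms are assumptions for illustration only.

```python
import random

def evolve_queries(queries, used_query, result_text, rng, min_length=5):
    """Swap a query that has been used for a term parsed from its results."""
    words = [w.strip(".,").lower() for w in result_text.split()]
    # Keep only longer words that are not already in the query list.
    candidates = [w for w in words if len(w) >= min_length and w not in queries]
    if not candidates:
        return list(queries)
    replacement = rng.choice(candidates)
    return [replacement if q == used_query else q for q in queries]

rng = random.Random(0)                         # seeded so runs are repeatable
queries = ["weather", "recipes", "football"]   # hypothetical seed terms
used = rng.choice(queries)                     # the "ghost" query just issued

# Stand-in for the text of a results page; no network request is made.
fake_results = "Local forecast and radar maps for the coming weekend."

evolved = evolve_queries(queries, used, fake_results, rng)
print("issued:", used)
print("next query list:", evolved)
```

Repeating this step after every simulated search makes each client's query list drift in its own direction over time, which is the idea behind evolving each client uniquely.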

Source: TrackMeNot

uTorrent client comes with Litecoin mining and other stuff if you just click Next through the install

Epic’s software tries to do more than mine crypto-currencies, we’re told: like the distributed SETI@home and BOINC projects, it spreads workloads over a large number of home computers, and can use their spare processor cycles to analyze genomes, fold proteins, and so on. But it mostly mines Litecoin.

via Litecoin-mining code found in BitTorrent app, freeloaders hit the roof • The Register.

FSF certify Libreboot X200 laptop

The Free Software Foundation (FSF) has certified another laptop by the UK based supplier Gluglug. This is the second laptop by the company to get FSF certification.
[…]
They also had to replace Intel’s Management Engine (ME) system and Intel’s Active Management Technology (AMT) firmware which are proprietary.

FSF has previously described ME and AMT as back doors into a person’s machine, as the computers can be remotely accessed over a network, allowing the remotely connected user to power the computer on and off, configure and upgrade the BIOS, wipe hard drives, re-install the OS and more.
via FSF certify Libreboot X200 laptop – Linux Veda.

Scary stuff, laptops being sold with huge backdoors

Twine is an open-source tool for telling interactive, nonlinear stories.

You don’t need to write any code to create a simple story with Twine, but you can extend your stories with variables, conditional logic, images, CSS, and JavaScript when you’re ready.

Twine publishes directly to HTML, so you can post your work nearly anywhere. Anything you create with it is completely free to use any way you like, including for commercial purposes.

http://twinery.org/

new role for proteins: assembling amino acids without DNA and RNA

Results from a study published on Jan. 2 in Science defy textbook science, showing for the first time that the building blocks of a protein, called amino acids, can be assembled without blueprints – DNA and an intermediate template called messenger RNA (mRNA). A team of researchers has observed a case in which another protein specifies which amino acids are added.

via Defying textbook science, study finds new role for proteins.

Project un1c0rn – a search engine for (Heartbleed, MySQL, MongoDB) vulnerable sites

Think of Project Un1c0rn as a Google for site security. Launched on May 15th, the site’s creators say that so far it has indexed 59,000 websites and counting. The goal, according to its founders, is to document open leaks caused by the Heartbleed bug, as well as “access to users’ databases” in MongoDB and MySQL.

According to the developers, those three types of vulnerabilities are most widespread because they rely on commonly used tools.[…]

“Billions of people are leaving information and trails in billions of different databases, some just left with default configurations that can be found in a matter of seconds for whoever has the resources,” SweetCorn said. Changing and updating passwords is a crucial practice.

un1c0rn.net.

New report ranks the happiest countries

“This year’s report provides country-level happiness rankings and explains changes in national and regional happiness,” said Report editor John Helliwell. Professor Helliwell worked with other CIFAR researchers to analyze data from the Gallup World Poll. “The report reveals important trends and finds six key factors that explain much about national happiness.”

http://phys.org/news/2013-09-happiest-countries.html