Targeted Password Cracking with AI and Social Media OSINT

This deep-dive is a work in progress that I started a year or two ago and still need to finish.

What if there was a password that you had to crack to save the world? How would you do it?

Your resources:
- government support
- AI scientists and engineers
- cryptologists
- billions of dollars in computing equipment

Username & Password:
- one password "hash"
- username for a login page.

Could you do it before the time runs out?
Stay around to find out.

Loid Forger stopping a missile with one second on the clock. Spy x Family (jolly good show)

AI + OSINT // The Future of Targeted Password Cracking

Everyone is going BONKERS over OpenAI's Generative Pre-trained Transformer (GPT) models, mostly ChatGPT 💬 (try it out if you haven't - it'll cost money soon 😬). Going over GPT models is out of the scope of this post, but if you want to learn more, then Google it or ask ChatGPT some questions about it, because these models will vastly shape the future of cyber as we know it.

So, the REAL question is...‌

CAN AI CRACK LONG PASSWORDS??!!

TL;DR - Yes, but it's not easy to do, script kiddies can't utilize it, and it is only even remotely worth a hacker's time when cracking sets of compromised password hashes rather than performing "targeted" attacks against, let's say, one individual. Targeted attacks will rarely work and are not currently feasible without lots of impossible-to-obtain data, a lot of luck, or resources available only to governments. However, there are models and workflows that I hypothesize which could effectively do this with the available data and technology.

The Basics of Password Cracking

Just some basics to gear you up

Authentication - what you are, what you know, what you have

Authentication in cybersecurity refers to the process of verifying the identity of a user, device, or system before granting access to a network, system, or application. This process is used to ensure that only authorized people or devices have access to certain information or resources. Authentication typically involves a combination of one or more of the following methods:

- Something the user knows, such as a password, PIN, or passphrase. This is known as knowledge-based authentication.
- Something the user has, such as a security token, smart card, or a one-time password. This is known as token-based authentication.
- Something the user is, such as a fingerprint, face, or voice. This is known as biometric authentication.

Authentication is the first step of the access control process and is typically followed by authorization, which is the process of granting or denying access to specific resources based on the authenticated user's identity and pre-defined access rights. In this case, we will focus on what the user knows, as it pertains to passwords.

Fundamentals of Password Hashing

Password hashing is simply turning a password into a pseudo-random string called a "hash." Passwords in every application or website are usually stored as hashes and not the actual password...unless they have bad security or don't know how to, and to be fair, the libraries for doing so kind of stink, so usually people gotta pay money to easily engineer this stuff.

When you log in to a website, it hashes the password you typed on the login page and compares that hash to the password hash stored for your account in its database. That way the site can tell whether you know the password without storing the actual password. This follows the same "assume breach" thinking that people associate with "zero trust": assume there is an intruder who has access to the place where password hashes are stored.

Example of password hashing with SHA 256 - https://emn178.github.io/online-tools/sha256.html

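Above you'll see how password hashing sort of works with a modern and widely used algorithm - SHA 256 (Secure Hash Algorithm with a 256-bit/32-byte output). Keep in mind that SHA 256 is a fast general-purpose hash, and deliberately slow password-hashing functions like bcrypt, scrypt, or Argon2 are the safer choice for storing passwords.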

Notice that "password" has the same output each time. I say it SORT OF works like this because there's a technique called "salting" which adds in another string of characters on top of the "password" string. Salting complicates things for password cracking or makes it impossible altogether...I'll explain why later. The gist is that password cracking involves password hashes and websites or apps store passwords as password hashes. Let's talk about what password hashes really are and how to crack them.

Password Hashing = Blending Fruit in a Blender

Looking at password hashes for the first time.

Password hashing uses one-way mathematical functions. They have several features or characteristics that define them:‌

Deterministic (same input always gives the same output / same password always generates same hash)
Pre-image resistance / One-way (computationally infeasible to find the password with just the hash)
Collision-resistant
- Weak Collision Resistance, also called second pre-image resistance (given one input, hard to find a different input with the same hash value)
- Strong Collision Resistance (Hard to find any two inputs with the same hash value)
Performance (easy to compute)
The hash has a set length (usually shorter than the original text)
Pseudorandomness (the output must be statistically random, despite having been produced by a completely deterministic and repeatable process)

We won't go into each of these, but the three features that matter most for password cracking are that hashes are deterministic, one-way, and weakly collision resistant.

🧃Using a fruit juice analogy:

Deterministic - the same fruit always produces the same juice
One-way - you can't get the fruit back from the juice
Weak Collision Resistance - hard to know how much of each fruit makes up the juice‌

For passwords:

Deterministic - the same password always produces the same hash
One-Way - there is no way to go from the hash to the password
Weak Collision Resistance - it is hard to find a different input that makes the same hash as the real password

"Brute Force" Cracking = Guessing A LOT‌

Example of brute force password guessing given a password hash

Hackers can obtain a hash through active and direct means from the victim's device or system, but usually, they get the hash from a malicious marketplace like one of the many on the dark web. To crack the password, we just have to keep inputting guesses into the related hashing algorithm (SHA 256) till we find an output that equals the hash we are cracking. If this happens we know we have the password.
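The guessing loop described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an unsalted SHA 256 hash and a tiny lowercase alphabet; the function names and the target password "cab" are made up for the example, and real cracking tools are far faster than this.

```python
import hashlib
import itertools
import string


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, alphabet: str, max_length: int):
    """Try every string over alphabet up to max_length characters.

    Returns the first guess whose hash equals target_hash, or None.
    """
    for length in range(1, max_length + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            guess = "".join(candidate)
            if sha256_hex(guess) == target_hash:
                return guess
    return None


# Hypothetical target: the hash of a short, lowercase-only password.
target = sha256_hex("cab")
print(brute_force(target, string.ascii_lowercase, 3))
```

Every extra character multiplies the number of candidates by the alphabet size, which is why this approach falls apart quickly as passwords get longer.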

In "brute forcing" passwords, one may run into terms like Gigahashes and Terahashes per second. These equate to billions (Giga) of hashes and trillions (Tera) of hashes per second. It's a way to define how many times per second a machine can compute a candidate password hash and therefore make a guess at the password. These are often used in the realm of cryptocurrencies as they rely on hashing algorithms too.

Hashes per second won't tell you how fast a computer is at guessing without knowing additional factors such as:

The specific hashing algorithm being used
Aspects of the computer's hardware - how fast it actually computes (often quoted in FLOPS - floating point operations per second).
The application that is being used for hashing

As a rough approximation, each hash costs some fixed number of basic operations. Hashing algorithms like SHA 256 mostly use integer and bitwise operations rather than floating point ones, so FLOPS is only a loose proxy, but you can ballpark the hashing speed (hashes per second) as operations per second / the number of operations needed to complete the specified hashing algorithm.
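As a worked example of that division, here is a tiny sketch. Both input numbers are invented for illustration (they are not measurements of any real device or algorithm), and the function name is hypothetical.

```python
def estimated_hash_rate(ops_per_second: float, ops_per_hash: float) -> float:
    """Ballpark hashes per second as operations per second / operations per hash."""
    return ops_per_second / ops_per_hash


# Made-up figures: a device doing 1 trillion operations per second and
# a hash that costs about 5,000 operations to compute.
rate = estimated_hash_rate(1e12, 5_000)
print(f"{rate:,.0f} hashes per second")
```

Treat the result as an upper bound at best, since memory access, the cracking application, and the hardware's instruction mix all cut into the theoretical peak.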

💻 More on FLOPS 💻

Computer vendors and service providers typically list the theoretical peak performance (Rpeak) capabilities of their systems expressed in FLOPS. A system's Rpeak is calculated by multiplying the number of processors by the clock speed of the processors, and then multiplying that product by the number of floating-point operations the processors can perform in one second on standard benchmark programs, such as the LINPACK DP TPP and HPC Challenge (HPCC) benchmarks, and the SPEC integer and floating-point benchmarks.

GigaFLOPS
A 1 gigaFLOPS (GFLOPS) computer system is capable of performing one billion (10⁹) floating-point operations per second. To match what a 1 GFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31.69 years.

TeraFLOPS
A 1 teraFLOPS (TFLOPS) computer system is capable of performing one trillion (10¹²) floating-point operations per second. The rate 1 TFLOPS is equivalent to 1,000 GFLOPS. To match what a 1 TFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31,688.77 years.

PetaFLOPS
A 1 petaFLOPS (PFLOPS) computer system is capable of performing one quadrillion (10¹⁵) floating-point operations per second. The rate 1 PFLOPS is equivalent to 1,000 TFLOPS. To match what a 1 PFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31,688,765 years.

ExaFLOPS
A 1 exaFLOPS (EFLOPS) computer system is capable of performing one quintillion (10¹⁸) floating-point operations per second. The rate 1 EFLOPS is equivalent to 1,000 PFLOPS. To match what a 1 EFLOPS computer system can do in just one second, you'd have to perform one calculation every second for 31,688,765,000 years.

Directly from a great article by my school - Understand measures of supercomputer performance and storage system capacity

There are several ways to determine the guesses per second (or hash rate) for a particular GPU or CPU. One way is to use benchmarking software, such as Hashcat or Aircrack-ng, which can benchmark the performance of a specific device and give you the hash rate in guesses per second. For cryptocurrency mining, there are sites like "whattomine[.]com" that can estimate your hashes per second based on your device. For password cracking speeds, there doesn't seem to be a public curation or website to see the hashing speeds of devices using password-related hashing algorithms.
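If you just want a feel for what a hash rate measurement looks like, a quick sketch is to time Python's built-in hashlib. This only measures one CPU thread running interpreted code, so it will come out far lower than a GPU benchmark from a dedicated tool; the function name and the guess count are arbitrary choices for the example.

```python
import hashlib
import time


def measure_sha256_rate(num_guesses: int = 200_000) -> float:
    """Hash num_guesses short strings and return guesses per second."""
    start = time.perf_counter()
    for i in range(num_guesses):
        hashlib.sha256(str(i).encode("utf-8")).digest()
    elapsed = time.perf_counter() - start
    return num_guesses / elapsed


rate = measure_sha256_rate()
print(f"{rate:,.0f} SHA-256 guesses per second (single CPU thread)")
```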

Salting makes it way harder to crack passwords in bulk, because the same password no longer produces the same hash for every user. The below image shows an example of a hacker guessing the user's password when salting is involved. The problem for the attacker is that a plain guess like "Apple" won't generate the stored hash, because the site hashed the password together with a "salt", a random string stored alongside the user on some database. Salts are generally not treated as secrets, and they are usually stored right next to the hash. Their main job is to defend against precomputation such as rainbow table attacks and to force the attacker to crack each hash separately. Salting does not stop a brute force attack against a single account if the attacker obtains both the password hash and the salt during an attack. An example of this could be a "man in the browser" attack where they manage to steal the hashed password, and they also somehow obtain the salt from the database where the password is verified. An extra secret that is kept away from the database is usually called a "pepper" rather than a salt.

Hacking salted cases of passwords is not the focus of this post, but it is important to know that it complicates password cracking. If the attacker has the hash but not the salt, a brute-force attack has to cover the password and the salt together, which makes the number of guesses much higher. A plain dictionary attack also won't work in that case because the salt is not going to be a common phrase, especially when appended to something like "Apple." However, if the password and salt are both really short, then the attacker might find "AppleyrtZd", which will tell the attacker that "Apple" must be the password. If the attacker does have the salt, they can simply append it to every guess. The real goal of salting is to block rainbow table attacks.
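Here is a minimal sketch of what salting does to the stored hash, following the "Apple" + salt idea above. It assumes the salt is simply appended to the password and hashed once with SHA 256, which is a simplification for illustration; the function name is made up, and production systems typically rely on a dedicated password-hashing function instead.

```python
import hashlib
import os


def hash_with_salt(password: str, salt: bytes) -> str:
    """Hash the password with the salt appended (simplified scheme)."""
    return hashlib.sha256(password.encode("utf-8") + salt).hexdigest()


# Two users pick the same password but get different random salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = hash_with_salt("Apple", salt_a)
hash_b = hash_with_salt("Apple", salt_b)

# The stored hashes differ, so one precomputed table can't cover both users.
print(hash_a != hash_b)

# With the salt in hand, verifying (or guessing) works exactly as before.
print(hash_with_salt("Apple", salt_a) == hash_a)
```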

Rainbow Table Attacks

Rainbow tables are a precomputation method with the intent of speeding up the guessing part of the process. To do it, you just start generating hashes of inputs and store them in some type of data construct. This could be something as simple as a lookup table or, in a true rainbow table, chains of hashes linked together by reduction functions. The important piece to understand is that there will always be a double-edged sword or tradeoff between computation (password lookup in the rainbow table) and efficient storage. For instance, a trie (tree-like data structure used for searching in programming) would have quick lookup times, but terrible storage efficiency.

Technically how Rainbow Tables are stored - from Wikipedia

The idea behind rainbow tables is that if an attacker can precompute all the possible hash values and their matching plaintext and store them in a table, then they can simply look up the hash value of a captured password hash in the table and find the corresponding plaintext password without having to perform a brute-force search. A simple rainbow table (technically not a rainbow table I guess) would look like something below.

Very simple rainbow table idea using MD5 for hashing
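The simple version of the idea can be sketched as a plain dictionary from MD5 hash to plaintext. As noted, this is a precomputed lookup table and not a true rainbow table (there are no chains or reduction functions); the function names and the four sample passwords are just for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup_table(passwords):
    """Precompute hash -> plaintext for every candidate password."""
    return {md5_hex(password): password for password in passwords}


# Precompute once...
table = build_lookup_table(["password", "123456", "qwerty", "letmein"])

# ...then "cracking" a captured hash is a single dictionary lookup.
captured_hash = md5_hex("qwerty")
print(table.get(captured_hash))

# A password that was never precomputed simply isn't found.
print(table.get(md5_hex("Tr0ub4dor&3")))
```

The lookup is instant, but the table has to store every entry, which is the storage side of the tradeoff described above.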

The size of rainbow tables can vary depending on the specific implementation and the number of plaintext passwords that are precomputed. In general, rainbow tables can be quite large, often taking up multiple gigabytes or even terabytes of storage space. For example, a table covering all possible 8-character lowercase alphabetic passwords has to account for 26^8 (about 209 billion) passwords, which lands in the terabyte range if every hash is stored (around 3.5 TB by one estimate). However, the size can be reduced with techniques like time/memory trade-off, which are implemented in most of the rainbow table generators.

“With rainbow tables, you have to calculate the hash of all passwords, but you only store a very small fraction of them, say one in ten thousands. The trick is to organize the passwords and hashes in such an order that with the data you choose to store, you can recreate all the passwords and hashes quickly with the order of ten thousand hash operations. You thus reduce the amount of memory and increase the time of cracking, hence, time-memory trade-off.” - creator of Rainbow Tables

If the password is hashed with the salt, then it is longer and less likely to have been precomputed and put into a rainbow table. The conclusion...make sure your developers are salting the passwords. This defeats precomputation attacks like rainbow tables.

The Limits of Traditional Password Cracking

"Guessing a bunch and guessing stuff off a wordlist isn't scary anymore"

Is password cracking a big problem now?

As mentioned before, password cracking is most commonly utilized against large collections of password hashes that are illegally sold or shared on the dark web. Password reuse is a notable issue in cyberspace at the moment. On average, 60% of credentials are reused across multiple accounts - some organizations have seen that number at 70%. This can be mitigated by using an email aliasing solution along with a password manager, but it does not address the root problem which I will be discussing.

Personal Password Defenses 🔒

Email aliasing - makes it so that a hacker can't retry your passwords on other websites because they aren't attached to the same email. It also mitigates data privacy issues like identity fingerprinting and other unethical user data handling methods and analytics.

Password managers - allow you to only have to memorize one password (your master password into the password manager), which lets you access the storage of all of your other accounts' emails and passwords. What makes them different from a secure notepad is other features like password/passphrase generators which automatically create good entropy/randomization with new passwords, auto-fill, and cross-device compatibility so all of your login stuff is in one place.

MFA - (multi-factor authentication) is, by far, one of the best ways to defend against all sorts of attack vectors. This is equivalent to having to type in a code to verify your email account. It's an extra step so that you're not putting all of your security eggs into the password basket.

For more defenses - Personal Cyber Resilience Arsenal

It doesn't matter if you use different passwords either if they're shorter than 10-12 characters long. Anything shorter makes it low-hanging fruit for any future script kiddies and it becomes lucrative for amateur hackers. Cracked passwords don't tend to make up the entirety of most attacks, but they make or break a large percentage of them or are leveraged in parts of common attack chains. Websites still allow short passwords too, and the length of passwords will need to change as computing gets faster and faster. It's no wonder we have laws like the Quantum Computing Cybersecurity Preparedness Act being put into play. A hacker can do a lot with a password. Quantum computing will one day become more common and we have to assume some nation-states will have this technology in their arsenal. However, quantum computing supposedly doesn't apply to hashing and might not even affect password cracking - haven't researched this much.

Nonetheless, we have to think about how to crack passwords using some of the BLEEDING EDGE TECH, AI, and smart technology we now have at our disposal.

Passwords are definitely a problem and will be for a long time until we make super convenient and innovative methods of user authentication.

Here are stats to support this assertion [ as of late 2022 ]:

Some passwords CANNOT be brute-forced

https://www.security.org/how-secure-is-my-password/

Cracking with Supercomputers?

Brute forcing long passwords takes a long time...unless you are a nation-state. Some of these nation-states can probably do hundreds of trillions of guesses per second. In fact, I would guess that there may be distributed systems used by the government that can do password cracking on the order of Petahashes/sec, or a quadrillion guesses per second. There are now single computers that can do 1 exaFLOPS (which doesn't mean that many guesses per second), and this doesn't even include distributed infrastructures for password cracking either.

1,000,000,000,000,000 <- one quadrillion😱

- Number of guesses calculator for passwords
- Brute Force Calculator - used below

🤯

Normal computers can do about 100 million - 1 trillion (tops) guesses per second. It's been shown that the government has spent at least millions or hundreds of millions just to crack certain algorithms - (https://weakdh.org/imperfect-forward-secrecy-ccs15.pdf). Though, I'm going to be using INSANE guessing speeds here to make a point.

12-character password w/ 1 quadrillion guesses per second = **22 minutes**

13-character password w/ 1 quadrillion guesses per second = **11 hours, 37 minutes**

I'm guessing a nation state can do 100 trillion guesses/sec. This is more realistic...

14-character password w/ nation-state level 100 trillion guesses per second = **4 months, 4 weeks**

My master passphrase into my Bitwarden password manager is 26 characters. Let's see how well that works with 1 QUINTILLION GUESSES PER SECOND.

26-character password w/ 1 quintillion guesses per second = **43521741413930 years, 11 months**

We will see later why even a 26-character password isn't future proof.

Cracking with Distributed/Parallel Computing Setups?

A distributed setup would involve breaking up all of the possible brute force combinations into sections and then sending those jobs to distributed computers for faster cracking. However, this can only improve cracking time in a linear fashion - meaning that 2 computers instead of one only cuts the time in half. When cracking time is on the order of thousands of years, and you only lower that down to a hundred years, then you can start to see why a distributed approach to password cracking has rare use cases. Distributed cracking can be useful when time is extremely critical, and in the case of our save-the-world scenario, maybe linear improvements for cracking are a risk that is worth taking.

This again shows that breaking up the cracking process only has "linear" improvements.

https://asecuritysite.com/public/passwords.pdf

Here's a graph I made to show the relationship with parallel computing and brute-forcing passwords:

So no - we can't brute-force "long" passwords even with a distributed approach, because this is still easily outdone by adding a few more characters to the password. Therefore, saving the world shouldn't completely rely on a parallel brute-forcing approach.
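To put rough numbers on the linear-speedup point, here is a small sketch that compares adding machines against adding characters. It assumes a 95-character printable ASCII alphabet, 1 trillion guesses per second per machine, and the worst case of exhausting the whole keyspace; all three are assumptions for illustration, and the function name is made up.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_seconds(alphabet_size: int, length: int,
                        guesses_per_second: float, machines: int = 1) -> float:
    """Worst-case seconds to try every password of the given length.

    Work is split evenly across machines, so the speedup is linear.
    """
    keyspace = alphabet_size ** length
    return keyspace / (guesses_per_second * machines)


for length in (12, 13, 14):
    for machines in (1, 10, 100):
        seconds = brute_force_seconds(95, length, 1e12, machines)
        years = seconds / SECONDS_PER_YEAR
        print(f"{length} chars, {machines:>3} machines: {years:,.0f} years")
```

With a 95-character alphabet, one extra character multiplies the work by 95, so a single added character cancels out nearly all of the gain from a 100-machine cluster.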

What about with a bigger rainbow table?

We still run into the same issues. It's possible in theory to make some huge rainbow tables, but this is just as difficult of a problem to solve. We would still have to crack every known password before putting them into it and design a way of organizing the answers in a very large storage space...VERY LARGE.

Good luck with that, but maybe we should try out some AI or dictionary attacks if we want to save the world.

What about Dictionary Attacks?

Dictionary attacks are going to be your next step to save the world if brute-forcing short passwords doesn't work. If you're in cyber, then you know how powerful these can be. These attacks aren't as complicated as AI-based cracking. They follow the assumption that lots of people will use the same passwords as each other. Some devices will also have default passwords on them, and finding those out isn't hard.

Gathering Wordlists

When it comes to dictionary attacks, it's all about gathering wordlists. In fact, many security researchers say that if a password has been in a well-known breach before, then don't use it. The hardest part about dictionary attacks is curating, cleaning, and preparing sets of wordlists that the hacker can use against the target's password hash. Since we can guess at ~ 100 million - 1 trillion guesses per second, going through every possible wordlist one can find isn't that hard, considering even a big list like rockyou.txt only holds millions of passwords to try. That's why it's probably best to use longer passphrases that are easier to remember, especially if it's your master password into a password manager. Though, anything easy to remember could one day be cracked with AI 😅.

More "current" wordlists are obtained from illegal marketplaces like ones on the dark web. Hackers sometimes sell plaintext sets of passwords (no need to crack), or they will attempt to uncover new ones from sets of hashes via all of the methods mentioned in this article. Fortunately, there are password checking tools to combat these offline dictionary attack problems. Not to mention, full-scale businesses devoted to it like with https://haveibeenpwned.com/ .

If you want to learn more about this area of password security and the generous work that goes into it, check out Troy Hunt's blog 😁.

Where do I get wordlists?

There are ethical and unethical places to find wordlists that are well-known or verified. Someone like Troy Hunt probably has a comprehensive list of these sources. However, it's pretty easy to find some of the popular lists.

From the top of the rockyou.txt list...most popular, but overrated dictionary / wordlist

If you're in cyber, then you've heard of rockyou.txt. It was part of the Rockyou breach back in 2009, and that's as far as I'm gonna go. This list is mentioned a lot, because it had a huge impact back then and got a lot of attention.

Some wordlists I used in my project (be careful going to some of these)...and don't be evil 😡:

https://figshare.com/articles/dataset/Yahoo_Password_Frequency_Corpus/2057937
- Sanitized passwords frequency list from Yahoo in May 2011.
https://www.kaggle.com/datasets/wjburns/common-password-list-rockyoutxt
- https://www.tensorflow.org/datasets/catalog/rock_you
- rockyou.txt is one of the most used wordlists for password cracking, which was generated from a 2009 breach of the company “RockYou”.
Wifi Passwords
- https://github.com/berzerk0/Probable-Wordlists/tree/master/Real-Passwords/WPA-Length
- kennyn510/wpa2-wordlists (Wordlists directory)
danielmiessler/SecLists:
- SecLists is the security tester's companion. It's a collection of multiple types of lists used during security assessments, collected in one place. List types include usernames, passwords, URLs, sensitive data patterns, fuzzing payloads, web shells, and many more.
Passwords - SkullSecurity
n0kovo/awesome-password-cracking: A curated list of awesome tools, research, papers and other projects related to password cracking and password security.

For hashes:

https://haveibeenpwned.com/Passwords
- Hundreds of millions of real passwords aggregated from data breaches by

Using wordlists on a hash

The two most popular tools:

John the Ripper is "an Open Source password security auditing and password recovery tool available for many operating systems."
Hashcat is the "World's fastest and most advanced password recovery utility."

Picking the right tool is important here. They are essential to the process and streamlining it.
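Under the hood, a dictionary attack is just the brute-force loop with a wordlist in place of every possible string. Here is a minimal sketch, assuming an unsalted SHA 256 hash and a tiny in-memory wordlist; the function name and sample words are made up, and this is not how John the Ripper or Hashcat are implemented.

```python
import hashlib


def dictionary_attack(target_hash: str, wordlist):
    """Hash each word in the wordlist and return the one matching target_hash."""
    for word in wordlist:
        candidate = word.strip()
        digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if digest == target_hash:
            return candidate
    return None


# Hypothetical four-entry wordlist and target hash.
wordlist = ["123456", "password", "iloveyou", "princess"]
target = hashlib.sha256(b"iloveyou").hexdigest()
print(dictionary_attack(target, wordlist))
```

Because the loop accepts any iterable of strings, the same function would also work on an open text file read line by line.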

Important components during dictionary attacks

Creating wordlists that can be utilized with hash cracking applications is more work than one would expect:

Find passwords to curate and aggregate into a final wordlist
- Use various sources
  - Leaked hashes from data breaches
  - Public de-identified (no username or email) datasets
  - Malicious sources from the dark web (do not condone)
Combine the wordlists
- This can take a long time depending on the method
Deduplicate guesses in the wordlist
- This can take a really long time depending on the size of the wordlist
Clean and prepare the text format for guesses in the wordlist
- This will depend on what the password cracking application allows for
- The password hash may have weird characters
Convert to formats to be used with password cracking applications
- Password cracking applications may have certain character encoding that they support (Example: ASCII)
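The combine, deduplicate, and clean steps above can be sketched roughly like this. It assumes each source is an iterable of text lines and that the cracking tool wants ASCII-only guesses; the function name and the ascii_only switch are hypothetical, and the in-memory set used here would not scale to a ~50GB wordlist without a different approach such as sorting on disk.

```python
def merge_wordlists(sources, ascii_only: bool = True):
    """Combine several wordlists into one cleaned, deduplicated list.

    Keeps the first occurrence of each guess and preserves order.
    """
    seen = set()
    merged = []
    for source in sources:
        for line in source:
            word = line.rstrip("\r\n")  # drop line endings, keep inner spaces
            if not word:
                continue  # skip blank lines
            if ascii_only and not word.isascii():
                continue  # skip guesses the tool's encoding may not support
            if word in seen:
                continue  # skip duplicates
            seen.add(word)
            merged.append(word)
    return merged


list_a = ["password\n", "123456\n"]
list_b = ["123456\r\n", "", "pässword\n", "letmein"]
print(merge_wordlists([list_a, list_b]))
```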

I ran into all of these issues when attempting to crack hashes myself, so it is important to understand that building public wordlists that can be used for research takes a lot of work. This is especially true if your wordlist starts to reach ~50GB. It takes a lot of work and sometimes programming to prepare and clean large files.

Dictionary attacks are not guaranteed

Dictionary attacks are very powerful, especially when hackers want to crack sets of compromised hashes that they acquire during attacks. Though, realistically, this will only crack a percentage of the hashes that they are tested on. If we want to save the world, then we have to be sure that we find the real password with only one hash. Using publicly available wordlists won't account for the potential of passwords with words that only a specific user would use.

Wordlist generation is everything

Frank has a dog named Shadow - password is -> Iloveshadow2003

In order to find passwords like this, we have to generate our own wordlists. People use all sorts of patterns to structure or create a password or passphrase. Therefore, we need modular systems that allow us to define how to generate based on patterns that we input.

Permutative & Explicit Wordlist Generation

Traditional methods of generating wordlists

Explicit/Manual Wordlist Generation

This isn't a technical or official term. I've coined this for the sake of defining a category that I've seen with wordlist tools. By explicit or manual I'm referring to the fact that the user has to define instructions for how to build the wordlist somewhat explicitly. This includes defining substitution or transformation patterns or telling a program how to combine certain words. This differs from the AI-based methods (discussed later) because the AI determines the patterns or instructions itself based on training.

Word Mangling & Mangling Rulesets

Simply put, mangling is used to modify or transform a guess from a wordlist. This is often done by applying various rules and operations to the password, such as adding or removing characters, changing the case, or converting characters to numbers. Hashcat and John the Ripper both support mangling as a part of their cracking applications. Both allow users to create custom mangling rules as well.

"password" may be mangled to generate:

Password
PASSWORD
p@ssword
P@ssword
p@ssWord

Mangling is not really a "permutative" method because it merely mangles a guess and doesn't combine various keywords or inputs.
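A couple of mangling rules can be sketched in plain Python to show the idea. This only applies two rules (case changes and an a -> @ substitution) and is not the rule syntax that Hashcat or John the Ripper use; the function name is made up for the example.

```python
def mangle(word: str) -> list:
    """Return case and a -> @ variants of a single guess, without duplicates."""
    base_forms = [word, word.capitalize(), word.upper()]
    variants = []
    for form in base_forms:
        variants.append(form)
        # Substitution rule: swap the letter a (either case) for @.
        variants.append(form.replace("a", "@").replace("A", "@"))
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(variants))


print(mangle("password"))
```

Each rule multiplies the size of the wordlist, so real rulesets are chosen carefully to cover the patterns people actually use.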

Permutative Wordlist Generation

Permutative wordlist generation would be a separate aspect of explicit wordlist generation from mangling. Whereas mangling is directly modifying or transforming input keywords or guesses, permutative generation focuses on combining potential keywords together to form guesses for the wordlist.

The Idea of Wordlist Expansion: permutation + mangling

Mangling and permutation are great ways to expand a wordlist. Permutation may be used to create guesses, and then mangling is used to generate different versions of those guesses.
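Here is a rough sketch of expansion using the Frank and Shadow example from earlier. It assumes we already guessed three keywords about the target ("ilove", "shadow", "2003"), combines them in every order, and then applies a single mangling rule (capitalizing the first letter); the function names and keywords are illustrative only.

```python
import itertools


def combine(keywords, max_parts: int = 3):
    """Permutation step: join 1..max_parts keywords in every possible order."""
    for count in range(1, max_parts + 1):
        for parts in itertools.permutations(keywords, count):
            yield "".join(parts)


def expand(keywords, max_parts: int = 3):
    """Permutation followed by one simple mangling rule (capitalize)."""
    for guess in combine(keywords, max_parts):
        yield guess
        yield guess.capitalize()


# Hypothetical keywords gathered about the target.
keywords = ["ilove", "shadow", "2003"]
guesses = set(expand(keywords))
print("Iloveshadow2003" in guesses)
```

Three keywords and one rule already find the example password, but the guess count grows very quickly as keywords and rules are added, which is where smarter ways of choosing them become valuable.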

😁

How this is realistically used in a targeted attack is explained later.

AI Models for Targeted Password Cracking

They exist, but they aren't that good yet

They figured out my password. Time to rename him again.

Solution ideas for people to develop

Here are some ideas for people to go to market with or make.

Better password cracking time Guessers or Calculators
- Design an application that can estimate cracking speeds based on hardware, application, the hashing algorithm, and the actual password
- Alternatively, design a web app that converts flops or factors of flops (petaflops for example) into hashes per second based on the algorithm. This can be an estimate too
Rainbow Table Storage Calculator
- Design an app that can guess the amount of space needed for a rainbow table based on various parameters including reduction algorithm, time/memory tradeoff, etc.

Resources & Links

😊 Thank You 😊

https://arstechnica.com/information-technology/2022/05/1-1-quintillion-operations-per-second-us-has-worlds-fastest-supercomputer/

Crack This Password to Save the World!?🤯// Targeted Password Cracking in 2023 // Crack Passwords with AI