Malware Data Science
Attack Detection and Attribution
Saxe and Sanders apply machine-learning techniques (classification, clustering, deep learning) to malware detection and attribution, with working Python code and real corpora.
- Authors: Joshua Saxe, Hillary Sanders
- Published: 2018
- Publisher: No Starch Press
- Pages: 272
- Language: English
Read this if
You are a malware analyst or detection engineer who wants to scale beyond manual triage. Saxe and Sanders apply classification, clustering, similarity analysis, and deep learning to malware corpora, with working Python code throughout.
Skip this if
Your work is one sample at a time, or you lack basic comfort with Python and statistics. The book is aimed at telemetry-rich environments where ML at scale matters.
Key takeaways
- Static-feature classifiers can route a triage queue effectively even at scale; the book's feature-engineering chapters repay the time they take to work through.
- Similarity analysis (locality-sensitive hashing, ssdeep, imphash, function-level fuzzy hashing) is the analyst's lever for clustering campaigns and tracking actor evolution.
- Deep learning is overhyped for malware in many contexts and exactly the right tool in others; the book is honest about the trade-offs in a way most ML/security books aren't.
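To make the similarity-analysis takeaway concrete, here is a minimal sketch of one locality-sensitive technique, MinHash, used to estimate the Jaccard similarity between two samples' printable-string sets. It is not the book's code: the function names (`extract_strings`, `jaccard`, `minhash_signature`, `estimate_similarity`) are hypothetical, the choice of printable strings as features is only one illustrative option, and the byte blobs in the demo are made-up stand-ins, not real malware.

```python
import hashlib
import re


def extract_strings(data: bytes, min_len: int = 5) -> set:
    """Return the set of printable-ASCII runs of at least min_len bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index: |A & B| / |A | B| (0.0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def minhash_signature(features: set, num_hashes: int = 128) -> list:
    """Build a MinHash signature: for each salted hash, keep the minimum value."""
    if not features:
        raise ValueError("cannot build a signature from an empty feature set")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "little")  # a distinct salt acts as a distinct hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(f.encode("utf-8"), digest_size=8, salt=salt).digest(),
                    "big",
                )
                for f in features
            )
        )
    return signature


def estimate_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots, an estimate of the Jaccard index."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Toy byte blobs standing in for two related samples (assumption, not real data).
    blob_a = b"\x00".join(
        [b"CreateRemoteThread", b"http://example.invalid/gate.php", b"cmd.exe /c del", b"Mozilla/5.0"]
    )
    blob_b = b"\x00".join(
        [b"CreateRemoteThread", b"http://example.invalid/panel.php", b"cmd.exe /c del", b"Mozilla/5.0"]
    )
    feats_a, feats_b = extract_strings(blob_a), extract_strings(blob_b)
    print("exact Jaccard:   ", jaccard(feats_a, feats_b))
    print("MinHash estimate:", estimate_similarity(minhash_signature(feats_a), minhash_signature(feats_b)))
```

The point of the signature is that comparing two fixed-length lists is cheap, so a large corpus can be compared pairwise (or bucketed by signature bands) without keeping every sample's full feature set in memory. The estimate gets tighter as `num_hashes` grows.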
Notes
Pair with Practical Malware Analysis (Sikorski/Honig) for the manual-analysis foundation and with Joshua Saxe's later work at Sophos for the production-deployment view. The code samples are on GitHub and still run years later, which is rare among security books. Most useful for anyone designing detection at scale; less useful for boutique malware analysts.
What to read before
Beginner · 2014
Countdown to Zero Day
Kim Zetter's investigative reconstruction of Stuxnet, the joint US/Israeli operation that physically damaged Iranian uranium-enrichment centrifuges via a worm, and what its discovery revealed about state-level cyber capability.
Intermediate · 2012
Practical Malware Analysis
Still the gold standard textbook for static and dynamic malware analysis on Windows.
Intermediate · 2013
The Practice of Network Security Monitoring
Richard Bejtlich's NSM playbook: how to deploy collection sensors, validate that you actually see what you think you see, and build detection workflows around open-source tools.
What to read next
Advanced · 2009
Les virus informatiques : théorie, pratique et applications
Éric Filiol's reference French-language treatment of computer virology. Formal theory, infection mechanisms, offensive and defensive applications, with academic rigor rare on the topic.
Advanced · 2014
The Art of Memory Forensics
Ligh, Case, Levy, and Walters' canonical reference on memory analysis with Volatility: the technique, the tooling, and the operating-system internals it depends on, across Windows, Linux, and macOS.
Advanced · 2024
Evasive Malware
Kyle Cucci on the anti-analysis arms race: sandbox detection, anti-debug, anti-VM, packing, and the analyst-side tooling and tradecraft that get past those layers.
Explore similar books
Intermediate · 2013
The Practice of Network Security Monitoring
Richard Bejtlich's NSM playbook: how to deploy collection sensors, validate that you actually see what you think you see, and build detection workflows around open-source tools.
Intermediate · 2012
Practical Malware Analysis
Still the gold standard textbook for static and dynamic malware analysis on Windows.
Intermediate · 2017
Network Security Through Data Analysis
Michael Collins on building situational awareness from network telemetry: collection architecture, statistical baseline-setting, and the analytic patterns that turn raw flows into detection.