Data Science

Minimalist Data Wrangling with Python - Marek Gagolewski

2024
english

 Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results.


Foundations of Data Science with Python - John M. Shea

2024
english

 Foundations of Data Science with Python introduces readers to the fundamentals of data science, including data manipulation and visualization, probability, statistics, and dimensionality reduction. This book is targeted toward engineers and scientists, but it should be readily understandable to anyone who knows basic calculus and the essentials of computer programming. It uses a computational-first approach to data science: the reader will learn how to use Python and the associated data-science libraries to visualize, transform, and model data, as well as how to conduct statistical tests using real data sets. Rather than relying on obscure formulas that only apply to very specific statistical tests, this book teaches readers how to perform statistical tests via resampling; this is a simple and general approach to conducting statistical tests using simulations that draw samples from the data being analyzed. The statistical techniques and tools are explained and demonstrated using a diverse collection of data sets to conduct statistical tests related to contemporary topics, from the effects of socioeconomic factors on the spread of the COVID-19 virus to the impact of state laws on firearms mortality.

 This book can be used as an undergraduate textbook for an Introduction to Data Science course or to provide a more contemporary approach in courses like Engineering Statistics. However, it is also intended to be accessible to practicing engineers and scientists who need to gain foundational knowledge of data science.

Key Features:

  • Applies a modern, computational approach to working with data
  • Uses real data sets to conduct statistical tests that address a diverse set of contemporary issues
  • Teaches the fundamentals of some of the most important tools in the Python data-science stack
  • Provides a basic, but rigorous, introduction to Probability and its application to Statistics
  • Offers an accompanying website that provides a unique set of online, interactive tools to help the reader learn the material

Mastering Python for Artificial Intelligence - David Ward

2023
english

 Look no further! "Mastering Python for Artificial Intelligence" is your gateway to learning the essential coding skills that will empower you to build cutting-edge AI applications.

 Whether you're a beginner or an experienced programmer, this book will guide you through Python's intricacies and equip you with the knowledge to unleash the true potential of AI.

 "Mastering Python for Artificial Intelligence" offers an innovative approach built on three well-defined principles, ensuring an empowering learning journey for readers.
 1. Practicality: The book strongly believes in the value of learning by doing. Unlike many other resources, "Mastering Python for Artificial Intelligence" immediately provides the outputs of ALL the examples. Readers won't have to wait to test the code on their computers or wonder if they are on the right track. This practical approach ensures hands-on experience, reinforcing knowledge and boosting confidence.
 2. Simplicity: Learning complex subjects should be approached step by step, and "Mastering Python for Artificial Intelligence" embraces this principle. Each concept is broken down into simple and easily digestible steps. The book aims to make learning efficient and enjoyable, allowing readers to grasp a multitude of topics in the shortest possible time. Clear explanations and examples accompany the content, ensuring rapid progress and understanding.
 3. Synthesis: Recognizing that starting with Python can be overwhelming, this book takes a thoughtful approach. Carefully selected topics provide a comprehensive introduction to Python, offering a solid foundation without overwhelming the reader. By presenting essential concepts in a structured manner, the book ensures broad exposure to Python and its applications in Artificial Intelligence.

 Here's a sneak peek into what you'll discover:

  • Gain a solid understanding of Python's notable features and why it is the preferred language for AI development.
  • Learn the step-by-step process of Python IDE installation, ensuring you have the optimal environment for AI programming.
  • Explore Python programming fundamentals, including variables, statements, operators, and flow control, laying the groundwork for AI development.
  • Dive into the world of data types, such as numeric, sequence, string, list, tuple, set, and dictionary, and understand how they play a crucial role in AI applications.
  • Unleash the potential of Python classes and objects and understand how they form the building blocks of AI models and algorithms.
  • Discover the wealth of Python libraries and frameworks available for AI development, such as TensorFlow, Keras, scikit-learn, and more.
  • Learn how to preprocess data, train AI models, and evaluate their performance using Python's powerful AI libraries.
  • Get hands-on experience with practical coding examples and exercises, allowing you to apply your newfound knowledge and solidify your skills.
  • The SOLUTIONS to the exercises (but be sure to look at them only after first trying to solve the exercises on your own)
  • BONUS: EMPOWERING YOUR LIFE: Harnessing the Power of Chat GPT and Python to Create Your Personal Assistant (scan the QR code inside the book)
  • …and much, much more!

Practical Discrete Mathematics - Archana Tikayat Ray, Ryan T. White

2021
english

 Discrete mathematics deals with studying countable, distinct elements, and its principles are widely used in building algorithms for computer science and data science. The knowledge of discrete math concepts will help you understand the algorithms, binary, and general mathematics that sit at the core of data-driven tasks.

 Practical Discrete Mathematics is a comprehensive introduction for those who are new to the mathematics of countable objects. This book will help you get up to speed with using discrete math principles to take your computer science skills to a more advanced level.

 As you learn the language of discrete mathematics, you'll also cover methods crucial to studying and describing computer science and machine learning objects and algorithms. The chapters that follow will guide you through how memory and CPUs work. In addition to this, you'll understand how to analyze data for useful patterns, before finally exploring how to apply math concepts in network routing, web searching, and data science.

 By the end of this book, you'll have a deeper understanding of discrete math and its applications in computer science, and be ready to work on real-world algorithm development and machine learning.

What you will learn

  • Understand the terminology and methods in discrete math and their usage in algorithms and data problems
  • Use Boolean algebra in formal logic and elementary control structures
  • Implement combinatorics to measure computational complexity and manage memory allocation
  • Use random variables, calculate descriptive statistics, and find average-case computational complexity
  • Solve graph problems involved in routing, pathfinding, and graph searches, such as depth-first search
  • Perform ML tasks such as data visualization, regression, and dimensionality reduction

Who this book is for

 This book is for computer scientists looking to expand their knowledge of discrete math, the core topic of their field. University students looking to get hands-on with computer science, mathematics, statistics, engineering, or related disciplines will also find this book useful. Basic Python programming skills and knowledge of elementary real-number algebra are required to get started with this book.


Алгоритмический тренинг. Решения практических задач на Python и С++ [Algorithmic Training: Solutions to Practical Problems in Python and C++] - M. K. Ivanov

2023
C++
russian

 Drawing on extensive competitive-programming and heuristic experience, the author offers original implementations of classic computer science algorithms in Python and C++. Particular attention is paid to mathematical and geometric algorithms, graph algorithms, data structures (especially various kinds of trees), combinatorics, and string processing. The book helps readers build and extend their algorithmic grounding, introduces efficient solutions to computational problems, and will serve learners as a desk reference. It also helps with preparation for exams, certification, and programming olympiads.


Pandas в действии [Pandas in Action] - Boris Paskhaver

2023
russian

 Python helps simplify data analysis. If you have learned to use spreadsheets, you can master pandas too! Despite its resemblance to Excel's tabular layout, pandas offers more flexibility and broader capabilities. This Python library quickly performs operations on millions of rows and can interoperate with other tools. It is an ideal way to take your data analysis to the next level.


Алгоритмы и структуры для массивных наборов данных [Algorithms and Data Structures for Massive Datasets] - Dzejla Medjedovic, Emin Tahirovic

2024
russian

 Standard algorithms and data structures can become slow, or stop working altogether, when applied to large distributed datasets. Choosing the right algorithms designed for big data saves time, improves accuracy, and lowers processing costs. The book introduces methods for processing and analyzing large distributed data. Packed with industry stories and entertaining illustrations, this handy guide makes even complex concepts easy to understand. You will learn to apply powerful algorithms such as Bloom filters, the count-min sketch, HyperLogLog, and LSM-trees to real-world examples in your own projects.

Examples are given in Python, R, and pseudocode.

Key topics:

  • probabilistic data structures in the form of sketches;
  • choosing the right database engine;
  • designing efficient on-disk data structures and algorithms;
  • understanding algorithmic trade-offs in large-scale systems;
  • sampling correctly from streaming data;
  • computing percentiles with limited space.

Graph Algorithms for Data Science - Tomaž Bratanic

2024
english

 Graphs are the natural way to represent and understand connected data. This book explores the most important algorithms and techniques for graphs in data science, with concrete advice on implementation and deployment. You don’t need any graph experience to start benefiting from this insightful guide. These powerful graph algorithms are explained in clear, jargon-free text and illustrations that make them easy to apply to your own projects.

In Graph Algorithms for Data Science you will learn:

  • Labeled-property graph modeling
  • Constructing a graph from structured data such as CSV or SQL
  • NLP techniques to construct a graph from unstructured data
  • Cypher query language syntax to manipulate data and extract insights
  • Social network analysis algorithms like PageRank and community detection
  • How to translate graph structure to a ML model input with node embedding models
  • Using graph features in node classification and link prediction workflows


 Graph Algorithms for Data Science is a hands-on guide to working with graph-based data in applications like machine learning, fraud detection, and business data analysis. It’s filled with fascinating and fun projects, demonstrating the ins-and-outs of graphs. You’ll gain practical skills by analyzing Twitter, building graphs with NLP techniques, and much more.

 Foreword by Michael Hunger.

About the technology

 A graph, put simply, is a network of connected data. Graphs are an efficient way to identify and explore the significant relationships naturally occurring within a dataset. This book presents the most important algorithms for graph data science with examples from machine learning, business applications, natural language processing, and more.

About the book

 Graph Algorithms for Data Science shows you how to construct and analyze graphs from structured and unstructured data. In it, you’ll learn to apply graph algorithms like PageRank, community detection/clustering, and knowledge graph models by putting each new algorithm to work in a hands-on data project. This cutting-edge book also demonstrates how you can create graphs that optimize input for AI models using node embedding.

What's inside

  • Creating knowledge graphs
  • Node classification and link prediction workflows
  • NLP techniques for graph construction

About the reader

 For data scientists who know machine learning basics. Examples use the Cypher query language, which is explained in the book.


The Pragmatic Programmer for Machine Learning - Marco Scutari, Mauro Malvestio

2023
english

 Machine learning has redefined the way we work with data and is increasingly becoming an indispensable part of everyday life. The Pragmatic Programmer for Machine Learning: Engineering Analytics and Data Science Solutions discusses how modern software engineering practices are part of this revolution both conceptually and in practical applications.

 Comprising a broad overview of how to design machine learning pipelines as well as the state-of-the-art tools we use to make them, this book provides a multi-disciplinary view of how traditional software engineering can be adapted to and integrated with the workflows of domain experts and probabilistic models.

 From choosing the right hardware to designing effective pipeline architectures and adopting software development best practices, this guide will appeal to machine learning and data science specialists, whilst also laying out key high-level principles in a way that is approachable for students of computer science and aspiring programmers.


Big Data Management and Analytics - Brij B Gupta, Mamta

2024
english

 With the proliferation of information, big data management and analysis have become an indispensable part of any system to handle such amounts of data. The amount of data generated by the multitude of interconnected devices increases exponentially, making the storage and processing of these data a real challenge.

 Big data management and analytics have gained momentum in almost every industry, from finance to healthcare. Big data can reveal key insights if handled and analyzed properly, and it has great potential to improve the workings of any industry. This book covers the full spectrum of big data, from the preliminary level to specific case studies. It will help readers gain knowledge of the big data landscape.

 Highlights of the topics covered include a description of the Big Data ecosystem; real-world instances of big data issues; how the Vs of Big Data (volume, velocity, variety, veracity, valence, and value) affect data collection, monitoring, storage, analysis, and reporting; a structured process for getting value out of Big Data; and the differences between a standard database management system and a big data management system.

 Readers will gain insights into the choice of data models; data extraction and data integration for solving large data problems; data modelling using machine learning techniques; Spark's scalable machine learning techniques; modeling a big data problem as a graph database and performing scalable analytical operations over the graph; and different tools and techniques for processing big data and its applications, including in healthcare and finance.


Python Data Science: Learn Python in a Week and Master It - Computer Programming Academy

2020
english

 Would you like to learn to use Python to extract meaningful insights from data and grow your business, but you reckon it will be too complex? Or perhaps you want to analyze data to solve simple everyday problems, but you don't know how?

 Here's the deal... As a beginner, you are probably afraid that programming is difficult... Learning data analysis and data mining can take months, and the chance of giving up before mastering them can be high. So, if you have a project to develop, you might think of hiring a professional analyst to save time. This may seem like a good solution, but it is certainly very expensive, and if the analyst you choose doesn't do a proper job, you still have to pay for it.

 The best solution is a complete programming manual with hands-on projects and practical exercises. Computer Programming Academy structured this guide as a course with seven chapters for seven days and designed special exercises for each section so that you can apply what you have learned step by step. This protocol, tested on both total beginners and people already familiar with coding, takes advantage of the principle of immersion, concentrating learning into one week. The result was the same for both categories of students: the course content was learned faster and remembered longer than average.

 Inside this book, you will start with a first section that discusses the fundamental notions of data science, then move on to chapters crafted specifically to help you learn all the advanced data analysis concepts required to produce valuable outcomes from a large volume of data.

In detail, you will learn:

  • The importance of data science technologies in our daily lives
  • The most common mistakes to avoid when you start using Python for data science
  • The 5 stages of the data science lifecycle that underlie the most widely used applications
  • 3 important actions required to gain insights from big data
  • The advantages of the data mining process in solving real-world problems
  • The data analysis tools that will make your life easier
  • 3 key frameworks that you have to know to transform unstructured and unorganized data into significant insights
  • How to download and use the main Python-based data analysis libraries
  • A simple method to implement predictive analytics to resolve a business issue in less than 7 days
  • A proven strategy to develop predictive models to analyze customers' behavior
  • Exercises and quizzes at the end of every chapter to review immediately what you've learned
  • Extra content that you will appreciate as a curious technology enthusiast

Why is this book different?

 Most books on the market only take a brief look at data science, touching on some of the topics but never going into real depth. The best way to learn data analysis and data mining is by doing, and with this manual you will work through practical projects to solidify your knowledge and gain a real sense of achievement.

 This is what this guide offers you, whether you're completely new to programming in 2020 or looking to widen your skills as a programmer.


SQL для анализа данных [SQL for Data Analysis] - Cathy Tanimura

2024
russian

 Covers the capabilities of SQL as applied to data analysis. Different types of databases are compared, and methods of preparing data for analysis are described. The book discusses data types, the structure of SQL queries, and data profiling, structuring, and cleaning. It describes methods for analyzing time series and trends, with examples of data analysis that accounts for seasonality. Separate chapters are devoted to cohort analysis, text analysis, anomaly detection and handling, and the analysis of experiment and A/B testing results. The book also covers building complex datasets and combining analysis methods. Practical examples of sales funnel and market basket analysis are provided.


Data Science для карьериста [Data Science for the Career-Minded] - Jacqueline Nolis, Emily Robinson

2021
russian

 We all want to build a successful career. How do you find the key to long-term success in data science? It takes not only technical know-how but also the right "soft skills". Only by combining the two can you become a sought-after specialist.

 Learn how to land your first job in data science and grow into a valuable senior-level employee! Clear and simple instructions will teach you to write impressive résumés and get through even the toughest interviews with ease.

 Data science is changing rapidly, so keeping projects running stably, adapting them to the company's needs, and working with difficult stakeholders is not easy. Experienced data scientists share ideas that will help you meet your expectations, cope with setbacks, and plan your career path.


Practical Data Science with R. 2 Ed - John Mount, Nina Zumel

2020
english

 Practical Data Science with R, Second Edition takes a practice-oriented approach to explaining basic principles in the ever expanding field of data science. You’ll jump right to real-world use cases as you apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support.

 Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology

 Evidence-based decisions are crucial to success. Applying the right data analysis techniques to your carefully curated business data helps you make accurate predictions, identify trends, and spot trouble in advance. The R data analysis platform provides the tools you need to tackle day-to-day data analysis and machine learning tasks efficiently and effectively.

About the book

 Practical Data Science with R, Second Edition is a task-based tutorial that leads readers through dozens of useful data analysis practices using the R language. By concentrating on the most important tasks you’ll face on the job, this friendly guide is comfortable both for business analysts and data scientists. Because data is only useful if it can be understood, you’ll also find fantastic tips for organizing and presenting data in tables, as well as snappy visualizations.

What's inside

  • Statistical analysis for business pros
  • Effective data presentation
  • The most useful R tools
  • Interpreting complicated predictive models

About the reader

 You’ll need to be comfortable with basic statistics and have an introductory knowledge of R or another high-level programming language.


Data Science: Инсайдерская информация для новичков [Data Science: Insider Information for Beginners] - Cathy O’Neil, Rachel Schutt

2019
russian

 "Data science is one of the most in-demand specializations of our time. Studying data makes it possible to transform any traditional or innovative business model. This book is based on the introductory data science course at Columbia University and is essential reading for the beginning analyst. It gives an engaging and accessible account of the Bayesian method, statistical algorithms, financial modeling, recommendation engines, data visualization, and MapReduce, with examples in Python and R."


Теоретический минимум по Big Data [The Theoretical Minimum for Big Data] - Annalyn Ng, Kenneth Soo

2019
russian

 Today, Big Data is big business.

 Information runs our lives, and extracting value from it is becoming central to the work of modern organizations. Whether you are a business person working with analytics, a beginning programmer, or a developer, "The Theoretical Minimum for Big Data" will help you understand the fundamentals of the new and rapidly developing field of big data processing.

 Want to learn about big data and the mechanisms for working with it? Each algorithm gets its own chapter, which not only explains the basic principles of operation but also gives examples of use in real-world problems. Numerous illustrations and simple commentary make it easy to understand even the most complex aspects of Big Data.

 "Excellent visualization of machine learning concepts lets non-technical readers grasp complex abstract notions intuitively. This concise and accurate digest contains the theoretical minimum of information needed for a first acquaintance with Big Data."


Python for Data Science For Dummies. 3 Ed. - John Paul Mueller, Luca Massaron

2024
english

Python for Data Science For Dummies lets you get your hands dirty with data using one of the top programming languages. This beginner’s guide takes you step by step through getting started, performing data analysis, understanding datasets and example code, working with Google Colab, sampling data, and beyond. Coding your data analysis tasks will make your life easier, make you more in-demand as an employee, and open the door to valuable knowledge and insights. This new edition is updated for the latest version of Python and includes current, relevant data examples.

  • Get a firm background in the basics of Python coding for data analysis
  • Learn about data science careers you can pursue with Python coding skills
  • Integrate data analysis with multimedia and graphics
  • Manage and organize data with cloud-based relational databases

Python careers are on the rise. Grab this user-friendly Dummies guide and gain the programming skills you need to become a data pro.


Advanced Data Analytics Using Python. 2 Ed - Sayan Mukhopadhyay, Pratip Samanta

2023
english

 Understand advanced data analytics concepts such as time series and principal component analysis with ETL, supervised learning, and PySpark using Python. This book covers architectural patterns in data analytics, text and image classification, optimization techniques, natural language processing, and computer vision in the cloud environment.


 Generic design patterns in Python programming are clearly explained, emphasizing architectural practices such as hot potato anti-patterns. You'll review recent advances in databases such as Neo4j, Elasticsearch, and MongoDB. You'll then study feature engineering in images and texts while implementing business logic and see how to build machine learning and deep learning models using transfer learning.

 Advanced Data Analytics Using Python, 2nd edition features a chapter on clustering with a neural network, regularization techniques, and algorithmic design patterns in data analytics with reinforcement learning. Finally, the recommender system in PySpark explains how to optimize models for a specific application.

What You'll Learn

  • Build intelligent systems for enterprise
  • Review time series analysis, classifications, regression, and clustering
  • Explore supervised learning, unsupervised learning, reinforcement learning, and transfer learning
  • Use cloud platforms like GCP and AWS in data analytics
  • Understand design patterns in Python

Who This Book Is For

 Data scientists and software developers interested in the field of data analytics.


Data Science в действии [Data Science in Action] - Leonard Apeltsin

2023
russian

 Data science projects have many moving parts, and it takes practice and knowledge to create a harmonious combination of code, algorithms, datasets, formats, and visualizations. This unique book describes five practical projects, including tracking disease outbreaks from news headlines, analyzing social networks, and finding patterns in ad click data.

 The author goes beyond a superficial discussion of theory and artificial examples. Working through the projects, you will learn how to deal with common problems such as missing and corrupted data and algorithms that do not fit the model being built. You will appreciate the detailed setup instructions and the in-depth discussions of solutions, which describe typical points of failure, and you will gain confidence in your skills.


Practical Linear Algebra for Data Science - Mike X. Cohen

2022
english

 If you want to work in any computational or technical field, you need to understand linear algebra. As the study of matrices and operations acting upon them, linear algebra is the mathematical basis of nearly all algorithms and analyses implemented in computers. But the way it's presented in decades-old textbooks is much different from how professionals use linear algebra today to solve real-world modern applications.

 This practical guide from Mike X Cohen teaches the core concepts of linear algebra as implemented in Python, including how they're used in data science, machine learning, deep learning, computational simulations, and biomedical data processing applications. Armed with knowledge from this book, you'll be able to understand, implement, and adapt myriad modern analysis methods and algorithms.

 Ideal for practitioners and students using computer technology and algorithms, this book introduces you to:

  • The interpretations and applications of vectors and matrices
  • Matrix arithmetic (various multiplications and transformations)
  • Independence, rank, and inverses
  • Important decompositions used in applied linear algebra (including LU and QR)
  • Eigendecomposition and singular value decomposition
  • Applications including least-squares model fitting and principal components analysis

Прикладная линейная алгебра для исследователей данных [Applied Linear Algebra for Data Scientists] - Mike X. Cohen

2023
russian

 Linear algebra, the study of matrices and the operations on them, is the mathematical foundation of nearly all algorithms and analysis methods implemented in computers. But decade-old textbooks present it without regard to how professionals use linear algebra today to solve real problems.

 The book covers the key concepts of linear algebra as implemented in Python and shows how to use them in data science, machine learning, deep learning, and computational modeling. Armed with this knowledge, you will be able to understand how to implement a wide range of modern analysis methods and algorithms and adapt them to your own tasks.

 The book is ideal for data practitioners and will also be useful to students and a broad range of software developers.


Essential Math for Data Science - Thomas Nield

2022
english

Master the math needed to excel in data science, machine learning, and statistics. In this book author Thomas Nield guides you through areas like calculus, probability, linear algebra, and statistics and how they apply to techniques like linear regression, logistic regression, and neural networks. Along the way you'll also gain practical insights into the state of data science and how to use those insights to maximize your career.

Learn how to:

  • Use Python code and libraries like SymPy, NumPy, and scikit-learn to explore essential mathematical concepts like calculus, linear algebra, statistics, and machine learning
  • Understand techniques like linear regression, logistic regression, and neural networks in plain English, with minimal mathematical notation and jargon
  • Perform descriptive statistics and hypothesis testing on a dataset to interpret p-values and statistical significance
  • Manipulate vectors and matrices and perform matrix decomposition
  • Integrate and build upon incremental knowledge of calculus, probability, statistics, and linear algebra, and apply it to regression models including neural networks
  • Navigate practically through a data science career and avoid common pitfalls, assumptions, and biases while tuning your skill set to stand out in the job market

SQL for Data Science - Antonio Badia

2020
english

 This textbook explains SQL within the context of data science and introduces the different parts of SQL as they are needed for the tasks usually carried out during data analysis. Using the framework of the data life cycle, it focuses on the steps that are very often given short shrift in traditional textbooks, like data loading, cleaning and pre-processing.

 The book is organized as follows. Chapter 1 describes the data life cycle, i.e. the sequence of stages from data acquisition to archiving, that data goes through as it is prepared and then actually analyzed, together with the different activities that take place at each stage. Chapter 2 gets into databases proper, explaining how relational databases organize data. Non-traditional data, like XML and text, are also covered. Chapter 3 introduces SQL queries, but unlike traditional textbooks, queries and their parts are described around typical data analysis tasks like data exploration, cleaning and transformation. Chapter 4 introduces some basic techniques for data analysis and shows how SQL can be used for some simple analyses without too much complication. Chapter 5 introduces additional SQL constructs that are important in a variety of situations and thus completes the coverage of SQL queries. Lastly, chapter 6 briefly explains how to use SQL from within R and from within Python programs. It focuses on how these languages can interact with a database, and how what has been learned about SQL can be leveraged to make life easier when using R or Python. All chapters contain a lot of examples and exercises on the way, and readers are encouraged to install the two open-source database systems (MySQL and Postgres) that are used throughout the book in order to practice and work on the exercises, because simply reading the book is much less useful than actually using it.

 This book is for anyone interested in data science and/or databases. It just demands a bit of computer fluency, but no specific background on databases or data analysis. All concepts are introduced intuitively and with a minimum of specialized jargon. After going through this book, readers should be able to profitably learn more about data mining, machine learning, and database management from more advanced textbooks and courses.


Data Science from Scratch. 2 Ed - Joel Grus

2019
english

 To really learn data science, you should not only master the tools—data science libraries, frameworks, modules, and toolkits—but also understand the ideas and principles underlying them. Updated for Python 3.6, this second edition of Data Science from Scratch shows you how these tools and algorithms work by implementing them from scratch.

 If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Packed with new material on deep learning, statistics, and natural language processing, this updated book shows you how to find the gems in today’s messy glut of data.

  • Get a crash course in Python
  • Learn the basics of linear algebra, statistics, and probability—and how and when they’re used in data science
  • Collect, explore, clean, munge, and manipulate data
  • Dive into the fundamentals of machine learning
  • Implement models such as k-nearest neighbors, Naïve Bayes, linear and logistic regression, decision trees, neural networks, and clustering
  • Explore recommender systems, natural language processing, network analysis, MapReduce, and databases.