data collection aspect of our project.

Now, as you know, when you are doing

supervised learning you need a lot of

truth data. Truth data basically means you have an

image, which you can call X or an

independent variable, and then you have a

label, a class label, which tells you

whether the image is of Virat Kohli or Maria

Sharapova. That label is called Y or the

target variable. You need a lot of these

images to train our classifier. Now, how

can you collect all these images? There

are four different ways of collecting

them. Number one is very obvious: you go

to Google Images and start downloading the

images one by one. Now, this approach is

very manual and labor intensive. For our

project it works: we need probably fifty

or more images, so it's not that bad. But

if you are training a large model

on a lot more data, then manually

downloading these images might be very

tough. So the second approach is: you can

use Python and automated testing software

along with ChromeDriver. I found this

nice article on Towards Data Science

which talks about how you can scrape

images from Google and how you can

download them in an automated way. Now, I'm

going to provide a link to the code. I

don't want to discuss this code in

detail for one specific reason. The

reason is that the code is working today, but

I'm not sure if it's going to work after

one week. This is because Google is

improving their algorithms and they are

trying their best so that people do not

scrape their website. Web scraping in

general is a very gray area in terms of

legal implications. Google Images is

public information, so on one hand you

would think, why can't I scrape it?

But if you are Google, then you don't

want people to be writing these automated

bots which can scrape the content.

Google in fact had this issue when Bing

came out: Bing started scraping a lot of

Google articles

to improve their own search performance,

and Google of course did not like it. Any

website would not like the fact that you

are web scraping their content, and

they will try their best to stop you

from writing this web crawler. So you

have to be very, very mindful when you

are using any automated way of scraping

the Internet. Just keep in mind that your

code might just stop working one fine

day, and you have to continuously improve

or continuously change it. Now, as you

may have already noticed, when

we are running this program to scrape

Google, it opens a Chrome window

on its own and tries to download

the images one by one, and at the top you

notice the message that Chrome is being controlled by

automated test software. Now, if you

know about Selenium, Selenium is an

automated testing tool which will

simulate manual actions, so it is like a

computer going and clicking different

things, right-clicking, and

downloading them. So it is that paradigm.

There is also RPA, which stands for

robotic process automation, which can be

used for automating this kind of manual

task. The third way, which I suggest

is probably a better way, is to use

a Chrome extension called Fatkun.

Fatkun is something you can add to

Chrome easily, and after that you

can open a tab with Google Images and you

can just say download and it will

download all the images. You can also

filter based on width and height. I want

to say thank you to my dear friend Ken Jee,

who is a head of data science. He also

runs a YouTube channel for data science,

so I will provide a link to his YouTube

channel. You can go check it out; he's

doing some awesome work. So thank you, Ken,

for suggesting this Fatkun tool to me. Also,

I want to thank my dear friend Debjyoti

Paul, who is a data scientist at

Amazon. He has helped me throughout the project.
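
If you would rather apply a width and height filter yourself after the images are downloaded, here is a minimal sketch of the idea. It is not how the extension works internally; it only reads PNG headers using the standard library, and the function names and thresholds are made up for illustration. For JPEG and other formats you would likely reach for an imaging library instead.

```python
import struct
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) of a PNG file, or None if it is not a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 12-16 hold the chunk type of the first chunk, which must be IHDR.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    # Width and height are two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def filter_by_size(folder, min_width, min_height):
    """Return the PNG paths in `folder` that meet the minimum size (hypothetical helper)."""
    kept = []
    for path in sorted(Path(folder).glob("*.png")):
        size = png_size(path)
        if size is not None and size[0] >= min_width and size[1] >= min_height:
            kept.append(path)
    return kept
```

Calling `filter_by_size("downloads", 200, 200)` on a hypothetical `downloads` folder would keep only the PNG images that are at least 200 by 200 pixels.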


The fourth way of collecting images is

to buy them. If you are working

for a big company, they will have a

budget, and

using that money they can even buy

images from some news website. You know,

let's say CNN, NDTV, or

Times of India: these are news portals,

and these companies will have a lot of images.

Okay, and there could be other

third-party vendors who might be selling

those images. So the fourth option

is that, by paying a price, you can buy these

images. Now, if you are, let's say, the Times of

India or CNN yourself, then your company

will have an engineering team who

will have access to these images, so you

can contact that team, get

the images from them, and store

them in your data warehouse for your

convenience. I have given a link below

where you can just download all these

images, so if you don't want to bother

with all of this, you don't have to.
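
Whichever way you get the images, here is a minimal sketch of how they could be paired up as X (the images) and Y (the labels) from the start of this section. It assumes a layout with one subfolder per person, where the folder name is the class label. That layout, the function name, and the list of extensions are assumptions for illustration, not something the download link guarantees.

```python
from pathlib import Path

# File extensions treated as images in this sketch (an assumption).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def load_labeled_paths(dataset_dir):
    """Return (X, y): image paths and the matching class labels.

    Assumes a hypothetical layout of dataset_dir/<person_name>/<image files>,
    so the subfolder name is used as the label.
    """
    X, y = [], []
    for person_dir in sorted(Path(dataset_dir).iterdir()):
        if not person_dir.is_dir():
            continue  # skip stray files at the top level
        for image_path in sorted(person_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                X.append(image_path)        # independent variable: the image
                y.append(person_dir.name)   # target variable: who is in it
    return X, y
```

With a subfolder named `maria_sharapova`, for example, every image inside it would get the label `maria_sharapova`.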

I have this

relevant dataset that you can use. But

if you want to play with this Google

image scraping code, then I have a

GitHub link in the article description

below. Also try the Fatkun

tool. In the next article we are going to

talk about data cleaning and feature