Image data collection services - what are they?
Do you work in an environment where image data collection is important? Good sources of image data can be difficult to find. In this article we are going to talk about the data collection aspect of our project.
Now, as you know, when you are doing supervised learning you need a lot
of truth data, that is, labeled examples. Truth data is basically an
image, which you can call X or the independent variable, together with a
class label that tells you whether the image is of Virat Kohli or Maria
Sharapova. That label is called y, or the target variable. You need a
lot of these images to train our classifier.

Now, how can you collect all these images? There
are four different ways of collecting them.

Number one is very obvious: you go to Google Images and start
downloading the images one by one. This approach is very manual and
labor intensive. For our project it works, because we probably need
fifty or more images, so it's not that bad. But if you are training a
huge algorithm on a lot more data, then manually downloading these
images might be very tough.

The second approach is that you can
use Python and automated testing software along with ChromeDriver. I
found a nice article on Towards Data Science that talks about how you
can scrape images from Google and download them in an automated way.
I'm going to provide a link to the code, but I don't want to discuss
this code in detail, for one specific reason: the code is working today,
and I'm not sure it is going to work a week from now. This is because
Google keeps improving their algorithms and is trying its best to keep
people from scraping their website.

Web scraping in
general is a very gray area in terms of legal implications. Google
Images is public information, so on one hand you would think, why can't
I scrape it? But if you are Google, you don't want people writing
automated bots that scrape your content. Google in fact had this issue
when Bing came out: Bing started scraping a lot of Google's content to
improve their own search performance, and Google of course did not like
it. No website likes the fact that you are scraping their content, and
they will try their best to stop you from running such a web crawler.
So you have to be very mindful when you use any automated way of
scraping the Internet. Just keep in mind that your code might stop
working one fine day, and you will have to continuously improve or
change it.

In this
approach you will notice that when the program runs to scrape Google, it
opens a Chrome window on its own and tries to download the images one
by one, and at the top you will see the message that Chrome is being
controlled by automated test software. If you know about Selenium,
Selenium is automated testing software that simulates manual actions, so
it is like a computer going and clicking different things,
right-clicking, and downloading. There is also RPA, which stands for
robotic process automation, which can be used for automating this kind
of manual task.

The third way, which I would suggest
is probably a better way, is to use a Chrome extension called Fatkun.
Fatkun is something you can add to Chrome easily. After that you can
open a tab of Google Images, just say download, and it will download
all the images. You can also filter based on width and height.

I want
to say thank you to my dear friend Ken Jee, who is a head of data
science. He also runs a YouTube channel for data science, so I will
provide a link to his channel. Go check it out, he is doing some
awesome work. So thank you, Ken, for suggesting this Fatkun tool. Also,
I want to thank my dear friend Debjyoti Paul, who is a data scientist at
Amazon. He has helped me throughout the project.
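
The width and height filter mentioned for the extension can also be
applied to images that are already on disk. What follows is only a
sketch under a few assumptions: the helper names png_size and
filter_by_size are made up for illustration, and to stay free of
third-party dependencies it reads dimensions from PNG headers only.
Real downloads are often JPEG, for which an imaging library such as
Pillow would be the more practical way to read the size.

```python
import struct
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) of a PNG file, or None if it is not a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    # ("IHDR"), then width and height as big-endian 32-bit integers.
    if (
        len(header) < 24
        or header[:8] != PNG_SIGNATURE
        or header[12:16] != b"IHDR"
    ):
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def filter_by_size(paths, min_width, min_height, read_size=png_size):
    """Keep only images that are at least min_width x min_height pixels.

    read_size is any callable returning (width, height) or None, so a
    reader for other formats can be swapped in.
    """
    kept = []
    for path in paths:
        size = read_size(path)
        if size is None:
            continue  # unreadable or unsupported file: skip it
        width, height = size
        if width >= min_width and height >= min_height:
            kept.append(Path(path))
    return kept
```

For example, filter_by_size(Path("downloads").glob("*.png"), 200, 200)
would keep only the PNG files in a hypothetical downloads folder that
are at least 200 pixels on each side.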
The fourth way of collecting images is to buy them. If you are working
for a big company, they will have a budget, and using that money they
can even buy the images from a news website. Let's say CNN, NDTV, or the
Times of India: these are news portals, and such companies will have a
lot of images. There could also be third-party vendors who sell those
images. So the fourth option is to pay a price and buy the images. Now
if you are, let's say, at the Times of India or CNN yourself, then your
company will have an engineering team with access to these images, so
you can contact that team, get the images from them, and store them in
your data warehouse.

For your
convenience, I have given a link below where you can download all these
images, so if you don't want to bother with any of this, you can simply
use that ready-made dataset. If you want to play with the Google image
scraping code, there is a GitHub link in the article description below.
Also try the Fatkun tool.

So what are image data collection services? I hope you now know more
about the options for gathering image data, from manual downloads and
automated tools to services that hand you organized data and make your
life easier.

In the next article we are going to talk about data cleaning and feature
engineering.
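
Before moving on to cleaning, it may help to see how collected images
turn into the X and y pairs described at the start. This is a minimal
sketch, not the project's actual code: it assumes a hypothetical layout
with one folder per person (for example dataset/maria_sharapova/), and
the function name build_labeled_pairs is made up for illustration.

```python
from pathlib import Path

# File extensions treated as images (an assumption; extend as needed).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_labeled_pairs(dataset_dir):
    """Return a list of (image_path, label) pairs.

    Each sub-folder of dataset_dir is treated as one class, and the
    folder name is used as the label for every image inside it.
    """
    pairs = []
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, class_dir.name))
    return pairs
```

From the result, X is the list of paths (to be loaded as images later)
and y is the list of labels: X = [p for p, _ in pairs] and
y = [label for _, label in pairs].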