Tamil 1k Tweets For Binary Sentiment Analysis

Finding labeled data for a Tamil NLP task is difficult. Some papers discuss Tamil neural translation, but they don’t release code. If you’re working part-time on Tamil NLP or simply have an interest in it, you’ll have a tough time finding data. When I was looking for labeled data for simple sentiment analysis, I couldn’t find any. That’s understandable, because no one is working on it. So I decided to build my own dataset. [Read More]

Parameterize Python Tests

Introduction

A single test case follows a pattern: set up the data, invoke the function or method with the arguments, and assert on the return value or the state changes. A function will have at least one test case, and usually several, covering its success and failure cases. Here is an example implementation of the wc command for a single file that returns the number of words, lines, and characters in an ASCII text file. [Read More]
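To make the pattern above concrete, here is a minimal sketch, not the post's own code. It assumes a hypothetical `wc` helper that takes a string instead of a file path, and it drives one test from a table of cases using the standard library's `subTest`. The same table could be passed to `pytest.mark.parametrize` if pytest is available.

```python
import unittest


def wc(text):
    """Return (lines, words, characters) for an ASCII string.

    Hypothetical helper for illustration; it counts newline characters
    as lines, the way the wc command does.
    """
    lines = text.count("\n")
    words = len(text.split())
    chars = len(text)
    return lines, words, chars


# Each tuple is one case: (input text, expected (lines, words, chars)).
CASES = [
    ("", (0, 0, 0)),
    ("hello world\n", (1, 2, 12)),
    ("one\ntwo three\n", (2, 3, 14)),
    ("no trailing newline", (0, 3, 19)),
]


class WcTest(unittest.TestCase):
    def test_wc(self):
        # subTest reports every failing case separately instead of
        # stopping at the first failed assertion.
        for text, expected in CASES:
            with self.subTest(text=text):
                self.assertEqual(wc(text), expected)
```

Adding a new scenario then means adding one tuple to `CASES` rather than writing another test function.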

Incomplete data is useless - COVID-19 India data

Data is a representation of reality. When a value is missing from a piece of data, the data becomes less useful and less reliable. Every day, articles and news reports about COVID-19 discuss the new cases, recovered cases, and deceased cases. This information gives you a sense of hope, reality, or confusion. Regarding COVID-19, everyone believes or accepts specific details as fact, such as that the mortality rate is 2 to 3 percent and that over the age of fifty the chance of death is 30 to 50 percent. [Read More]

“Don’t touch your face” - Neural Network will warn you

A few days back, Keras creator Francois Chollet tweeted: “A Keras/TF.js challenge: make a model that processes a webcam feed and detects when someone touches their face (triggering a loud beep).” The very next day, I tried the Keras yolov3 model available on GitHub. It was trained on the coco80 dataset and could detect a person but not a face touch.

V1 Training

The original Keras implementation lacked documentation on training from scratch or with transfer learning. [Read More]

1000 more whitelist sites in Kashmir, yet no Internet

Kashmir has been under lockdown for more than 200 days. Last Friday (14th Feb, 2020), the Government released a document with a list of whitelisted sites allowed at 2G speed. A text file with the URLs extracted from the PDF is available (https://gitlab.com/snippets/1943725). In case you’re not interested in tech details or analysis, or are short of time, look at the summary section at the bottom. [Read More]

Capture all browser HTTP[s] calls to load a web page

How does one find out which network calls a browser makes to load a web page? The simple method: download the HTML page, parse it, and find all the network calls using a web parser like beautifulsoup. The shortcoming of this method: what about the network calls your browser makes before requesting the web page? For example, Firefox makes a call to ocsp.digicert.com to obtain the revocation status of digital certificates. The protocol is the Online Certificate Status Protocol. [Read More]
python  proxy  HTTP 
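The "simple method" described above can be sketched roughly as follows. This is an illustration, not the post's code: it uses the standard library's `html.parser` in place of beautifulsoup, and the `ResourceCollector` class, the `resource_urls` function, and the sample page are all hypothetical names made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute that usually points at a resource the browser fetches.
# This mapping is an assumption and far from exhaustive.
RESOURCE_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "link": "href",
}


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of resources referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def resource_urls(html, base_url):
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.urls


sample = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="https://cdn.example.com/app.js"></script>
</head><body><img src="logo.png"></body></html>
"""
print(resource_urls(sample, "https://example.com/blog/"))
```

A static parse like this only sees URLs written in the markup. It misses requests triggered by JavaScript and calls such as the OCSP lookup, which is the shortcoming the excerpt points out.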

153 sites allowed in Kashmir but no internet

As of 19th Jan 2020, Kashmir has been locked down without the internet for more than 167 days, since 5th Aug 2019. The Wire recently published an article reporting that the Government of India whitelisted 153 websites for access in Kashmir. Below is the list extracted from the document. Internet shutdowns have become common during protests in recent days. Anyone with a little knowledge of building web applications can tell you that every web application makes network calls to other sites to load JavaScript, style sheets, maps, videos, images, etc. [Read More]

How long do Python Postgres tools take to load data?

Data is crucial for all applications. When an application fetches a significant amount of data from the database multiple times, faster load times improve performance. This post measures the latency of SQLAlchemy statements, the SQLAlchemy ORM, Psycopg2, and psql. Jupyter notebook’s timeit is used to time the Python tools, and psql serves as the reference for the lowest time taken.

Table Structure

    annotation=> \d data;
                      Table "public.data"
     Column |   Type    |                     Modifiers
    --------+-----------+---------------------------------------------------
     id     | integer   | not null default nextval('data_id_seq'::regclass)
     value  | integer   |
     label  | integer   |
     x      | integer[] |
     y      | integer[] |
    Indexes:
        "data_pkey" PRIMARY KEY, btree (id)
        "ix_data_label" btree (label)

    annotation=> select count(*) from data;
      count
    ---------
     1050475
    (1 row)

SQLAlchemy ORM Declaration

    class Data(Base):
        __tablename__ = 'data'
        id = Column(Integer, primary_key=True)
        value = Column(Integer)
        # 0 => Training, 1 => test
        label = Column(Integer, default=0, index=True)
        x = Column(postgresql.

[Read More]
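The measurement approach can be sketched without a Postgres server. The example below is only an illustration under stated assumptions: it swaps in the standard library's `sqlite3` with an in-memory table as a stand-in for Postgres, uses a much smaller made-up dataset, and times a hypothetical `load_rows` function with the `timeit` module rather than the notebook magic. The numbers it prints say nothing about the tools compared in the post.

```python
import sqlite3
import timeit

# In-memory SQLite table standing in for the Postgres "data" table
# (array columns x and y are left out; SQLite has no array type).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, value INTEGER, label INTEGER)"
)
conn.executemany(
    "INSERT INTO data (value, label) VALUES (?, ?)",
    ((i, i % 2) for i in range(10_000)),
)
conn.commit()


def load_rows():
    """Fetch every row with label 0, the operation being timed."""
    query = "SELECT id, value, label FROM data WHERE label = 0"
    return conn.execute(query).fetchall()


# Run the load five times, once per measurement, and keep the best time,
# which is less affected by noise from other processes.
runs = timeit.repeat(load_rows, number=1, repeat=5)
print(f"rows={len(load_rows())} best={min(runs):.4f}s")
```

To compare tools the same way, each one would get its own `load_rows`-style function over the same query, timed with the same `repeat` and `number` settings.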

Debugging Python multiprocessing program with strace

Debugging is a time-consuming and brain-draining process. It’s an essential part of learning and of writing maintainable code. Every person has their own way of debugging, with their own approaches and tools. Sometimes you can view the traceback, pull the code from memory, and find a quick fix. Other times, you opt for different tricks like the print statement, a debugger, or the rubber duck method. Debugging a multiprocessing bug in Python is hard for various reasons. [Read More]

Notes from Root Conf Day 2 - 2017

On day 2, I spent a considerable amount of time networking and attended only four sessions.

Spotswap: running production APIs on Spot instance

Amazon EC2 spot instances are cheaper than on-demand servers. Spot instances run when the bid price is greater than the market/spot instance price. The Mapbox API server uses spot instances, which are part of the auto-scaling server. The auto scaling group is configured with min, desired, and max parameters. Latency should be low, and the setup should be cost effective. EC2 has three types of instances: on demand, reserved, and spot. [Read More]