Large Tech Firms Prefer Building Their Own ML Components, Despite A Solution Available In Market

Machine learning has become an important cog in the wheel for the functioning of all major companies. Many companies are now building their own machine learning platform. These platforms leverage open source technologies; however, a few functions need customised solutions. For that, companies are investing in building in-house components for their machine learning platform. In this article, we take a look at a few of them.
Introduced in 2017, Uber’s Michelangelo has been in the works for two years. The goal behind building a proprietary ML-as-a-service platform is to make AI…

Amazon brings RStudio to Amazon SageMaker for machine learning • DEVCLASS

Amazon has added the RStudio development environment for the R language to its SageMaker machine learning service, claiming it as the first integrated development environment (IDE) in the cloud for data scientists working on machine learning with R.

The open source R language has long been a tool of choice for statisticians, quantitative analysts, data scientists, and machine learning engineers, according to Amazon. RStudio is one of the most popular development environments among R developers for machine learning and data science projects.

Now, in partnership…

Databricks unifies data science and engineering with a federated data mesh


During its online Data + AI Summit conference, Databricks today unveiled Databricks Machine Learning, a platform that lets data science teams build AI models based on the AutoML framework.
The offering follows yesterday’s launch of an open source Delta Sharing project that lets organizations employ a protocol to securely share data across disparate data warehouses and data lakes in real time.
The Delta Sharing project has been…

How to succeed around data science projects

Denise Gosnell, chief data officer at DataStax, discussed how preparation, process and open source can help to ensure success from data science projects. It’s important to set out how your projects will support overall business goals.
For businesses, investment in machine learning, artificial intelligence (AI) and data science is growing. There is huge potential around data science to create new insights and services for internal and external customers. However, this investment can be wasted if data science projects don’t…

Excel, Python, and the future of data science

The world of data science is awash in open source: PyTorch, TensorFlow, Python, R, and much more. But the most widely used tool in data science isn’t open source, and it’s usually not even considered a data science tool at all. It’s Excel, and it’s running on your laptop. Excel is “the most successful programming system in the history of homo sapiens,” says Anaconda CEO Peter Wang in an interview, “because regular ‘muggles’ can take this tool…put their data in it…ask their questions…[and] model things.” In short, it’s easy to be…

2022 State of Data Engineering: Emerging Challenges with Data Security & Quality

The 2022 Data Engineering Survey, from our friends over at Immuta, examined the changing landscape of data engineering and operations challenges, tools, and opportunities. The modern data engineering technology market is dynamic, driven by the tectonic shift from on-premises databases and BI tools to modern, cloud-based data platforms built on lakehouse architectures.

More than the on-premises market that preceded it, the cloud data technology market is evolving rapidly, and spans a vast set of open source and commercial data technologies, tools, and products. At the same time,…

Microsoft open-sources SynapseML for developing AI pipelines

Microsoft today announced the release of SynapseML (previously MMLSpark), an open source library designed to simplify the creation of machine learning pipelines. With SynapseML, developers can build “scalable and intelligent” systems for solving challenges across domains, including text analytics, translation, and speech processing, Microsoft says.
“Over the past five years, we have worked to improve and stabilize the SynapseML library for production workloads. Developers who use Azure Synapse Analytics will be pleased to learn that SynapseML is now generally available on this…

Building interpretable forecasting and nowcasting models: An overview to DeepXF

Hello, friends. In this blog post, we will take a quick look at the package “Deep-XF”, which is useful for forecasting, nowcasting, uni/multivariate time-series data analysis, filtering noise from time-series signals, comparing two input time-series signals, etc. The USP of this package is its set of add-on utility helper functions and its model explainability module, which can be used to explain model results for both forecasting and nowcasting problems.


DeepXF is an open source, low-code Python library for forecasting and nowcasting problems. DeepXF helps in designing complex…

CloudQuery gets $3.5M seed to build open source cloud infrastructure visibility tool

As developers push code across multiple clouds, figuring out what infrastructure you own becomes a real challenge, usually involving writing custom scripts. CloudQuery, an early-stage startup, wanted to make it easier, and built an open source tool to do the work for you. Today, the company announced a $3.5 million seed round led by Boldstart Ventures with help from Work-Bench, Mango Capital and Haystack.
CloudQuery CEO and co-founder Yevgeny Pats said that on a high level, “We are an open source, cloud assets inventory [tool] powered by SQL.” He says that when he sold his previous startup,…

Forecasting with Machine Learning Models

mlforecast makes forecasting with machine learning fast & easy
By Nixtla Team.
TL;DR: We introduce mlforecast, an open source framework from Nixtla that makes the use of machine learning models in time series forecasting tasks fast and easy. It allows you to focus on the model and features instead of implementation details. With mlforecast you can run experiments more easily, and it has built-in backtesting functionality to help you find the best performing model.
You can use…
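
To make the idea of lag features and backtesting concrete, here is a minimal sketch in plain Python. It does not use mlforecast's API; the function names (`make_lag_features`, `mean_of_lags_model`, `backtest`) and the toy "model" are assumptions made purely for illustration.

```python
from statistics import mean


def make_lag_features(series, lags):
    """Turn a series into (features, target) rows using the given lags."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows


def mean_of_lags_model(features):
    """Toy stand-in for a trained model: predict the average of the lagged values."""
    return mean(features)


def backtest(series, lags, n_windows, horizon):
    """Rolling-origin evaluation over the last n_windows * horizon points.

    Each window is forecast using only the data before its cutoff.
    Returns the mean absolute error of each window, oldest window first.
    """
    errors = []
    for w in range(n_windows, 0, -1):
        cutoff = len(series) - w * horizon
        history = list(series[:cutoff])
        actual = series[cutoff:cutoff + horizon]
        abs_errors = []
        for y_true in actual:
            features = [history[-lag] for lag in lags]
            y_pred = mean_of_lags_model(features)
            abs_errors.append(abs(y_true - y_pred))
            # Recursive forecasting: feed the prediction back in as history.
            history.append(y_pred)
        errors.append(mean(abs_errors))
    return errors


if __name__ == "__main__":
    demand = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
    print(make_lag_features(demand, lags=[1, 2])[:2])
    print(backtest(demand, lags=[1, 2], n_windows=2, horizon=2))
```

In a real experiment the toy model would be replaced by a fitted regressor, and the per-window errors would be compared across candidate models to pick the best performer.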

How to Migrate Your Data from Redshift to Snowflake

Get your data out of Redshift and into Snowflake easily with open source data integration
For decades, data warehousing solutions have been the backbone of enterprise reporting and business intelligence. But, in recent years, cloud-based data warehouses like Amazon Redshift and Snowflake have become extremely popular. So, why would someone want to migrate from one cloud-based data warehouse to another?
The answer is simple: More scale and flexibility. With Snowflake, users can…

Open Source Is Finally Coming to Financial Services

Open source will catalyze the financial services industry’s biggest evolution to date. This evolution will shift the power in this $25 trillion industry from business executives to developers, not just in fintech companies, but in centuries-old incumbents, as well.   
Until very recently, financial services were notoriously hard and expensive to build. Incumbents and startups alike grappled with extensive regulation, inflexible core systems, complex payment architectures, compliance…

Quick, Easy, and Flexible Data Model Diagrams

By Thomas Frisendal

Many of us have a lot to do. And we have short delivery cycles, sprints, and a lot of peers to share data models with. In search of something lightweight, which is quick and easy, and may be produced (or consumed) by other programs?

Stay with us on a short, but inspiring, sprint through just such a tool.

It is open source, available under the GPL (GNU) license as well as LGPL, Apache, Eclipse, and MIT licenses. There is a free online server, and/or you can download and install it on whatever operating system you happen to use. (If you need to do large diagrams, do it offline.)

The name is PlantUML. But do not let that mislead you. You can use it without thinking in UML Class diagrams. Phew! It supports a lot of UML: Sequence, Usecase, Class, Object, Activity, Component, Deployment, State, and Timing.…

3D Printed VTOL Craft Can Land And Recharge Itself, And Team Up With Other Drones

For a long time fixed wing VTOL drones were tricky to work with, but with the availability of open source flight control and autopilot software this has changed. To make experimentation even easier, [Stephen Carlson] and other researchers from the RoboWork Lab at the University of Nevada created the MiniHawk, a 3D printed VTOL aircraft for use as a test bed for various research projects.

Some of these projects include creating a longer wingspan aircraft by combining multiple MiniHawks in mid-flight with magnetic wing-tip mounts, or “migratory behaviors”. The latter is a rather interesting idea, which involves letting the craft land in any suitable location, and recharging using wing mounted solar panels before continuing with the next leg of the mission.…

5 Things I’ve Learned as an Open Source Machine Learning Framework Creator

If you’re an aspiring creator or maintainer of open source machine learning frameworks, you might find these tips helpful.

Creating a successful open source project is difficult, especially in the data science/machine learning/deep learning space. A large number of open source projects never get used and are quickly abandoned. As the creator of Flow Forecast, an open source deep learning framework for time series forecasting, I’ve had my fair share of both successes and pitfalls. Here is a compilation of the tips I have for aspiring creators/maintainers of open source machine learning frameworks.

Documentation, documentation, and documentation

Having good documentation, tutorials, and getting started guides is probably one of the most important aspects for any open source framework.…

The Ultimate BRRRT Simulator: Fully Featured A-10 Warthog Cockpit

The Fairchild Republic A-10 “Warthog” with its 30 mm rotary cannon has captured the imagination of friendly soldiers and military aviation enthusiasts on the ground for as long as it’s been flying. One such enthusiast created the Warthog Project, a fully functional A-10 cockpit for Digital Combat Simulator, that’s almost an exact copy of the real thing.

It started as a four monitor gaming cockpit, with a Thrustmaster Warthog H.O.T.A.S. The first physical instrument panels were fuel and electrical panels bought through eBay, and over time more and more panels were added and eventually moved to dedicated left and right side units. All the panels communicate with the main PC over USB, either using Arduinos or purpose-made gaming interface boards.…

GitHub Desktop for Data Scientists

By Drew Seewald, Data Scientist

Version control is important for collaborating on code, sharing it with others, being able to view old versions of the code, and even deploying the code automatically. It can be a bit confusing at first, but is well worth your time, especially if you work in the open source space or on a team where you will frequently be using version control for projects. Here are some of the biggest features that make it worth using:

  • Storing file change history with comments
  • Organizing multiple users editing the same project simultaneously
  • Facilitating code review procedures
  • Automating workflows to report issues, request improvements, and deploy code

Relax, you don’t have to use the command line

Version Control Features

One of the main features of version control is the file change history for every file in the repository.


Transform speech into knowledge with Huggingface/Facebook AI and expert.ai

Speech2Data is a blend of open source and free-to-use AI models and technologies powered by Huggingface, Facebook AI and expert.ai.

Sponsored Post.

Verba volant, scripta manent

(Words are fleeting, text persists)

Hearing isn’t understanding. Studies show that if we listen to a 10-minute speech, on average, we retain 50% of it initially, 25% after 48 hours, and only 10% after a week. In short, we hold on to a very limited amount of what we hear. It’s not difficult to see how this cognitive limit can cause inefficiencies in scenarios where listening is key. Additionally, personal biases have an impact on what we grasp from verbal communication. Not to mention all the distractions that constantly interfere with our ability to focus on a conversation.


TICO Robot Plays Tic-Tac-Toe By Drawing On A Tiny Whiteboard

Tic-tac-toe (or “Noughts and Crosses”) is a game simple enough to implement in any computer system: indeed it’s often used in beginner’s programming courses. A more challenging project, and arguably more interesting and useful, is to make some kind of hardware that can play it in real life. [mircemk] built a simple yet elegant machine that can play tic-tac-toe against a human player in a way that looks quite similar to the way humans play against one another: by drawing.

The robot’s design and programming were developed at PlayRobotics, who named the project TICO. The mechanical parts are available as STL files, to be printed by any 3D printer, and a comprehensive manual explains how to assemble and program the whole thing.
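
As a rough illustration of how small the software side of tic-tac-toe can be, the sketch below checks for a winner and picks a move with a simple win-or-block rule. It is not TICO's actual firmware or strategy; the board representation and the names `winner` and `choose_move` are assumptions made for this example.

```python
# Index layout of the board, a flat list of 9 cells:
# 0 1 2
# 3 4 5
# 6 7 8
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None.

    `board` is a list of 9 cells, each 'X', 'O', or ' ' (empty).
    """
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def choose_move(board, player):
    """Pick a cell index: win if possible, else block, else first empty cell.

    Returns None when the board is full.
    """
    opponent = "O" if player == "X" else "X"
    empty = [i for i, cell in enumerate(board) if cell == " "]
    # First look for a winning move, then for a move that blocks the opponent.
    for mark in (player, opponent):
        for i in empty:
            trial = board[:i] + [mark] + board[i + 1:]
            if winner(trial) == mark:
                return i
    return empty[0] if empty else None


if __name__ == "__main__":
    board = ["X", "X", " ",
             " ", "O", " ",
             " ", " ", "O"]
    print(choose_move(board, "X"))
```

The hard part of a project like this is the hardware: sensing the human's move and drawing the reply, which this sketch does not attempt.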


3D Printed Research Robotics Platform Runs Remotely

The Open Dynamic Robot Initiative Group is a collaboration between five robotics-orientated research groups, based in three countries, with the aim to build an Open Source robotics platform based around the torque-control method. Leveraging 3D printing, a few custom PCBs, and off-the-shelf parts, there is a low barrier to entry and a much lower cost compared to similar robots.

The eagle-eyed will note that this is only a development platform, and all of the higher level control is off-machine, hosted by a separate PC. What’s interesting here is just how low-level the robot actually is. The motion hardware is purely a few BLDC motors driven by field-orientated control (FOC) driver units, a wireless controller and some batteries.


DataOps Highlights the Need for Automated ETL Testing (Part 2)

By Wayne Yaddow

DataOps, which focuses on automated tools throughout the ETL development cycle, responds to a huge challenge for data integration and ETL projects in general. ETL projects are increasingly based on agile processes and automated testing.

ETL (i.e., extract, transform, load) projects are often devoid of automated testing. The lack of automated testing is usually due to 1) critical ETL testing functions that are not available on the market or open source, 2) the complexity of some ETL testing tools, or 3) the high cost of developing tools in-house.

Part 1 in this two-part series described what makes DataOps processes valuable for ETL projects and a driving force for ETL testing automation. …
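
For readers unfamiliar with what an automated ETL test looks like, the following is a minimal sketch in plain Python. The `transform` step, the field names (`id`, `name`, `amount`) and the specific checks are all hypothetical, chosen only to show the shape of such a test; they are not taken from the article or from any particular ETL testing tool.

```python
def transform(rows):
    """Hypothetical transform step: drop rows without an id,
    trim whitespace from names, and cast amounts to float."""
    out = []
    for row in rows:
        if row.get("id") is None:
            continue
        out.append({
            "id": row["id"],
            "name": row["name"].strip(),
            "amount": float(row["amount"]),
        })
    return out


def check_etl(source_rows, target_rows):
    """Compare source and target; return descriptions of failed checks.

    An empty list means every check passed.
    """
    failures = []
    expected = [r for r in source_rows if r.get("id") is not None]

    # Completeness: every valid source row should arrive in the target.
    if len(target_rows) != len(expected):
        failures.append("row count mismatch")

    # Uniqueness: the key column must not contain duplicates.
    ids = [r["id"] for r in target_rows]
    if len(ids) != len(set(ids)):
        failures.append("duplicate ids in target")

    # Transformation rule: names must be trimmed.
    if any(r["name"] != r["name"].strip() for r in target_rows):
        failures.append("untrimmed name")

    # Reconciliation: totals should match between source and target.
    src_total = sum(float(r["amount"]) for r in expected)
    tgt_total = sum(r["amount"] for r in target_rows)
    if abs(src_total - tgt_total) > 1e-9:
        failures.append("amount total mismatch")

    return failures


if __name__ == "__main__":
    source = [
        {"id": 1, "name": " Ann ", "amount": "10.5"},
        {"id": None, "name": "x", "amount": "1"},
        {"id": 2, "name": "Bob", "amount": "2"},
    ]
    print(check_etl(source, transform(source)))
```

Checks like these can run on every pipeline change, which is the kind of automation the DataOps approach calls for.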

Open Source Autopilot For Cheap Trolling Motors

Quiet electric trolling motors are great for gliding into your favorite fishing spot but require constant correction if wind and water currents are at play. As an alternative to expensive commercial GPS-guided trolling motors, [AlexAsplund] created Vanchor, an open source system for adding autopilot to a cheap trolling motor.

To autonomously control an off-the-shelf trolling motor, [Alex] designed a 3D printed steering unit powered by a stepper motor to attach to the original transom mount over the motor’s vertical shaft. A collar screwed to the shaft locks the motor into the steering unit when the motor is lowered. The main controller is a Raspberry Pi, which hosts a WiFi hotspot and web server for control and configuration using a smartphone.